path: root/src/gallium/auxiliary
Commit message, Author, Age, Files, Lines
* gallivm: revert accidentally committed hunkRoland Scheidegger2013-08-151-12/+1
| | | | That magic wasn't meant to be committed, need to work on some proper fix.
* gallivm: do per-sample depth comparison instead of doing it post-filterRoland Scheidegger2013-08-152-106/+195
| | | | | | | | | | | | | | | | | | | | | | | | | Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9) and definitely required by d3d10. This actually doesn't do it pre-filter but more "in-filter" as otherwise need to push the comparisons even further down into fetch code and this also trivially allows using a somewhat cheaper lerp. Doing it pre-filter would actually have some performance advantage for UNORM formats (because the comparisons should be done in texture format, we'd only need to convert the shadow ref coord to texture format once, but in turn would save converting the per-sample texture values to floats) but this gets a bit messy as this has implications for border color handling as well (which needs to be done prior to depth comparisons, hence would also need to convert border color to texture format too or use some other tricks like doing separate border color / shadow ref comparison and simply using that result directly when doing border replacement). Should make no difference for nearest filtering, and performance for linear filtering should be mostly the same too (essentially have one more comparison instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add special "lerp" which all in all shouldn't be much of a difference). v2: get rid of old code completely Reviewed-by: Zack Rusin <[email protected]>
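The "sub/and/and/add" replacement for the lerp mentioned above can be sketched in scalar C. This is only an illustration under assumptions: the helper names (`and_mask`, `cmp_less_mask`, `shadow_lerp`) are hypothetical, the compare function is fixed to LESS, and the real code generates vectorized LLVM IR rather than scalar C. The idea is that each sample's comparison yields an all-ones or all-zeros mask, so the filter weights can be ANDed with the masks and summed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: AND the bit pattern of a float with an integer mask. */
static float and_mask(float v, uint32_t mask)
{
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits &= mask;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Per-sample comparison result as a mask (LESS chosen as an example). */
static uint32_t cmp_less_mask(float ref, float texel)
{
    return (ref < texel) ? 0xffffffffu : 0u;
}

/* Filter two depth samples after comparing each one against ref. */
static float shadow_lerp(float w, float ref, float texel0, float texel1)
{
    assert(w >= 0.0f && w <= 1.0f);
    uint32_t m0 = cmp_less_mask(ref, texel0);
    uint32_t m1 = cmp_less_mask(ref, texel1);
    float w0 = 1.0f - w;                        /* sub */
    return and_mask(w0, m0) + and_mask(w, m1);  /* and, and, add */
}
```

With a weight of 0.25 and only the first sample passing, the result is 0.75, which is what a regular lerp over the 1/0 comparison results would give.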
* tgsi: add info about MSAA samplers to tgsi_shader_infoMarek Olšák2013-08-152-0/+14
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* tgsi: fix the location of sample indexMarek Olšák2013-08-151-1/+3
| | | | | | The sample index is always in W. Reviewed-by: Michel Dänzer <[email protected]>
* gallivm: already pass coords in the right place in the sampler interfaceRoland Scheidegger2013-08-153-99/+90
| | | | | | | | | | | | | | | | | This makes things a bit nicer, and more importantly it fixes an issue where a "downgraded" array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could also be fixed by passing target through the sampler interface much the same way as is done for size queries, might do this eventually anyway.) And if we'd ever want to support (shadow) cube map arrays, we'd need 5 coords in any case. v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex using wrong shadow coord for 2d...). Plus need to project the shadow coord, and just for fun keep projecting the layer coord too. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: change coordinate handling throughout functionsRoland Scheidegger2013-08-153-133/+133
| | | | | | | | | | | | | | | | Instead of passing s,t,r coordinates pass a coord array - the reason is that I need to pass more coords (in particular for shadow "coord", future will also need another one for cube map arrays) so just pass them as an array. Also, to simplify things, use a fixed location for the shadow reference value, as I want to get rid of the silly "where is the right coord value" game. Keep old-style however for aos sampling (which is not going to need shadow coord, though for cube map arrays it still would need fixing). (Next patch will pass those through using the new arrangement directly from sampler interface.) v2: fix up soa split path (unreachable currently but still...) Reviewed-by: Zack Rusin <[email protected]>
* gallivm: fix border color with normalized texture formatsRoland Scheidegger2013-08-151-13/+53
| | | | | | | | | | | | | We need to put border color into texture format color space which essentially means clamping for non-float, normalized formats (not entirely sure if we're also meant to quantize the float but it's probably ok not to do it thankfully). For OpenGL we could do this easily outside generated code due to the 1:1 sampler/texture correspondence but not for d3d10 which is terrible (as we recalculate a constant over and over again per shader invocation). Fortunately border color should be rare enough that we don't care THAT much. Reviewed-by: Zack Rusin <[email protected]>
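As a rough illustration of "putting border color into texture format color space", the clamp for normalized formats could look like the scalar sketch below. The enum and function names are hypothetical, and the actual change emits this per channel inside the generated shader code.

```c
#include <assert.h>

/* Hypothetical classification of a normalized texture format. */
enum norm_kind { NORM_UNSIGNED, NORM_SIGNED };

/* Clamp one border color channel to the range the format can represent:
 * [0, 1] for unsigned normalized, [-1, 1] for signed normalized.
 * Float formats would be left untouched (no quantization is attempted). */
static float clamp_border_channel(float c, enum norm_kind kind)
{
    assert(kind == NORM_UNSIGNED || kind == NORM_SIGNED);
    float lo = (kind == NORM_SIGNED) ? -1.0f : 0.0f;
    float hi = 1.0f;
    if (c < lo)
        return lo;
    if (c > hi)
        return hi;
    return c;
}
```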
* draw: make sure that the stages setup outputsZack Rusin2013-08-145-30/+62
| | | | | | | | | | | | Calling the prepare outputs cleans up the slot assignments for outputs, unfortunately aapoint and aaline didn't have code to reset their slots after the initial setup, this was messing up our slot assignments. The unfilled stage was just missing the initial assignment of the face slot. This fixes all of the reported piglit failures. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* vl: Add support for max level query v2Rico Schüller2013-08-142-0/+21
| | | | | | | | | This patch adds the level query support to the video decoders and uses some more reasonable defaults. v2: (ck) add commit message Reviewed-by: Christian König <[email protected]>
* gallivm: implement new float comparison instructions returning integer masksRoland Scheidegger2013-08-131-2/+79
| | | | | | | | | FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the select. And just for consistency use the same appropriate ordered/unordered comparisons for the old opcodes as well. Reviewed-by: Zack Rusin <[email protected]>
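A scalar sketch of the difference between the old and new opcodes, assuming the semantics described above: SEQ selects 1.0/0.0, while FSEQ/FSGE/FSLT/FSNE return the integer mask directly. The function names are illustrative only. In C, `==`, `>=` and `<` are false when either operand is NaN (ordered), while `!=` is true (unordered), which matches the comparison flavors mentioned in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Old-style opcode: compare, then select a float 1.0 or 0.0. */
static float seq(float a, float b) { return (a == b) ? 1.0f : 0.0f; }

/* New-style opcodes: return the all-ones / all-zeros mask, no select. */
static uint32_t fseq(float a, float b) { return (a == b) ? 0xffffffffu : 0u; }
static uint32_t fsge(float a, float b) { return (a >= b) ? 0xffffffffu : 0u; }
static uint32_t fslt(float a, float b) { return (a <  b) ? 0xffffffffu : 0u; }
static uint32_t fsne(float a, float b) { return (a != b) ? 0xffffffffu : 0u; }
```

Returning a mask lets later integer instructions (and/or/select) consume the result without converting back from a float.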
* tgsi: implement new float comparison instructions returning integer masksRoland Scheidegger2013-08-134-4/+102
| | | | | | | | Also while here add a bunch of other forgotten (integer) instructions to tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing away unused input components), though it may still be incomplete. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: fix exec_mask interaction with geometry shader after end of mainRoland Scheidegger2013-08-122-16/+14
| | | | | | | | | | | | | | | | Because we must maintain an exec_mask even if there's currently nothing on the mask stack, we can still have an exec_mask at the end of the program. Effectively, this mask should be set back to default when returning from main. Without relying on END/RET opcode (I think it's valid to have neither) it is actually difficult to do this, as there doesn't seem to be any reasonable place to do it, so instead let's just say the exec_mask is invalid outside main (which it really is effectively). The problem is that geometry shader called end_primitive outside the shader (in the epilogue), and as a result used a bogus mask, leading to bugs if we had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid the mask combining function when called from outside the shader. Reviewed-by: Zack Rusin <[email protected]>
* draw: simplify prim mask constructionRoland Scheidegger2013-08-121-22/+10
| | | | | | | | | The code was quite weird, the second comparison was in fact a complete no-op and we can also do the comparison with the vector directly instead of scalar, which should not only be faster but also makes it way more obvious what that mask is actually going to look like. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: simplify geometry shader mask handling a bitRoland Scheidegger2013-08-121-36/+28
| | | | | | | | | | | | Instead of reducing masks to 0/1 simply use the mask directly as -1. Also use some signed comparison instead of unsigned (as far as I understand these values have to be (very) small and signed means llvm doesn't have to apply additional logic to do the unsigned comparisons the cpu can't do). Saves a couple of instructions in some test geometry shader here. v2: that was a bit too much optimization, don't skip combining the masks... Reviewed-by: Zack Rusin <[email protected]>
* draw: (trivial) dump tgsi for geometry shaders with GALLIVM_DEBUG_TGSIRoland Scheidegger2013-08-121-0/+5
| | | | | | And dump the variant key too (same as vs does). Just so I can stop wondering why I see the tgsi dump for fs and vs but not gs...
* gallivm: (trivial) fix typo in argument declaration of lp_build_size_query_soaRoland Scheidegger2013-08-121-1/+1
| | | | Was meant to match the name used elsewhere, spotted by Anthony.
* gallivm: set non-existing values really to zero in size queries for d3d10Roland Scheidegger2013-08-093-20/+20
| | | | | | | | | | | My previous attempt at doing so double-failed miserably (minification of zero still gives one, and even if it would not the value was never written anyway). While here also rename the confusingly named int_vec bld as we have int vecs of different sizes, and rename need_nr_mips (as this also changes out-of-bounds behavior) to is_sviewinfo too. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: use texture target from shader instead of static state for size queryRoland Scheidegger2013-08-095-4/+75
| | | | | | | | | | | | | | | | | | | d3d10 has no notion of distinct array resources, either at the resource or at the sampler view level. However, shader dcl of resources certainly has, and d3d10 expects resinfo to return the values according to that - in particular a resource might have been a 1d texture with some array layers, then the sampler view might have only used 1 layer so it can be accessed both as 1d or 1d array texture (I think - the former definitely works). resinfo of a resource declared as array needs to return number of array layers but non-array resource needs to return 0 (and not 1). Hence fix this by passing the target from the shader decl to emit_size_query and use that (in case of OpenGL the target will come from the instruction itself). Could probably do the same for actual sampling, though it may not matter there (as the bogus components will essentially get clamped away), possibly could wreak havoc though if it REALLY doesn't match (which is of course an error but still). Reviewed-by: Zack Rusin <[email protected]>
* gallivm: honor d3d10's wishes of out-of-bounds behavior for texture size queryRoland Scheidegger2013-08-091-8/+27
| | | | | | | Specifically, must return 0 for non-existent mip levels (and non-existent textures which is an unsolved problem) for everything but total mip count. Reviewed-by: Zack Rusin <[email protected]>
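The out-of-bounds rule for size queries, and the reason plain minification cannot produce the required zero, can be sketched as follows. The function names and the `first_level`/`last_level` parameters are assumptions made for illustration; this is not the gallivm code itself.

```c
#include <assert.h>

/* Usual mip minification: never returns less than 1, so even a base
 * size of 0 minifies to 1. */
static unsigned minify(unsigned size, unsigned level)
{
    assert(level < 32);
    unsigned s = size >> level;
    return s ? s : 1;
}

/* Width reported for a given lod; 0 for mip levels that do not exist. */
static unsigned size_query_width(unsigned base_width, unsigned first_level,
                                 unsigned last_level, unsigned lod)
{
    assert(first_level <= last_level && last_level < 32);
    if (lod > last_level - first_level)
        return 0;                       /* non-existent mip level */
    return minify(base_width, first_level + lod);
}
```

For a 256-wide texture with levels 0..8, lod 3 reports 32, lod 8 reports 1, and lod 9 reports 0.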
* util: (trivial) fix asm input/output list for fxsaveRoland Scheidegger2013-08-091-1/+1
| | | | | Otherwise gcc might do very unsafe optimizations, spotted by Uros Bizjak. Hopefully this time it's finally right?
* draw: rewrite primitive assemblerZack Rusin2013-08-089-297/+180
| | | | | | | | | | | | We can't be injecting the primitive id's in the pipeline because by that time the primitives have already been decomposed. To properly number the primitives we need to handle the adjacency primitives by hand. This patch moves the prim id injection into the original primitive assembler and completely removes the useless pipeline stage. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: reset the vertex id when injecting new primitive idZack Rusin2013-08-081-0/+9
| | | | | | | | | | | | Without resetting the vertex id, with primitives where the same vertex is used with different primitives (e.g. tri/lines strips) our vbuf module won't re-emit those vertices with the changed primitive id. So let's reset the vertex id whenever injecting new primitive id to make sure that the vertex data is correctly emitted. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: cleanup the extra attribsZack Rusin2013-08-081-0/+1
| | | | | | | | | | Before inserting new front face and prim id outputs cleanup the old extra outputs, otherwise our cache will use previous output slots which will break as soon as outputs of the current shader don't match the last. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* util: (trivial) fix more compile errors in u_cpu_detect (gcc/x86 this time).Dieter Nützel2013-08-091-1/+1
| | | | Oops. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=67921
* util: (trivial) fix compile error with MSVC on x86Roland Scheidegger2013-08-081-1/+1
* gallivm: honor d3d10 floating point rules for shadow comparisonsRoland Scheidegger2013-08-081-3/+17
| | | | | | | | d3d10 specifies ordered comparisons for everything but not_equal which is unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx). OpenGL probably doesn't care. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: don't clamp reference value for shadow comparison for float formatsRoland Scheidegger2013-08-081-4/+17
| | | | | | | | This is wrong both for OpenGL and d3d. (In fact clamping is a side effect of converting to depth format, so this should really do quantization too at least in d3d10 for the comparisons to be truly correct.) Reviewed-by: Zack Rusin <[email protected]>
* gallivm: propagate scalar_lod to emit_size_query tooRoland Scheidegger2013-08-085-0/+10
| | | | | | | Clearly the returned values need to be per-element if the lod is per element. Does not actually change behavior yet. Reviewed-by: Zack Rusin <[email protected]>
* gallivm: fix out-of-bounds behavior for fetch/ldRoland Scheidegger2013-08-083-30/+88
| | | | | | | | | | | | For d3d10 and ARB_robust_buffer_access_behavior, we are required to return 0 for out-of-bounds coordinates (for which we can just enable the code that was already there but disabled). Additionally, also need to return 0 for out-of-bounds mip level and out-of-bounds layer. This changes the logic so instead of clamping the level/layer, an out-of-bound mask is computed instead in this case (actual clamping then can be omitted just like with coordinates, since we set the fetch offset to zero if that happens anyway). Reviewed-by: Zack Rusin <[email protected]>
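The masking approach described above can be illustrated with a one-dimensional scalar sketch. The function name and the flat `uint32_t` texel array are assumptions for the example; the real code builds the mask across coordinates, mip level and layer, and does so in vectorized LLVM IR.

```c
#include <assert.h>
#include <stdint.h>

/* Fetch texel x from a 1D texture, returning 0 when x is out of bounds.
 * Instead of clamping x, compute an out-of-bounds mask, force the fetch
 * offset to zero so the load itself stays in bounds, then zero the result. */
static uint32_t fetch_texel_1d(const uint32_t *texels, int width, int x)
{
    assert(width > 0);
    uint32_t oob = (x < 0 || x >= width) ? 0xffffffffu : 0u;
    uint32_t offset = (uint32_t)x & ~oob;   /* 0 when out of bounds */
    return texels[offset] & ~oob;           /* 0 when out of bounds */
}
```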
* util: try much harder to set DAZ flagRoland Scheidegger2013-08-083-1/+31
| | | | | | | | | | | | | | | | | | While so far this only causes some harmless test failures, there's lots more cpus with DAZ. All 64bit capable ones can do it (particularly relevant for AMD cpus as they supported sse3 very very late) but if really necessary we can check support for that for real with some more magic. (In fact just about ANY cpu with sse2 can support DAZ, I believe the only exception are first gen P4 (Willamette) and from those only early steppings which can't do it; it's almost like intel forgot to add it... - a real pity though docs say you can't just try to set it as they will throw a GPF.) While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672 it does not fix it. Most likely the tests need fixing as I don't think there's any guarantee about denorm handling in the reference math library functions if the flags aren't set to standard values. Nevertheless enabling DAZ on all cpus which can do it should be the right thing to do. Reviewed-by: Jose Fonseca <[email protected]>
* util: implement table-based + linear interpolation linear-to-srgb conversionRoland Scheidegger2013-08-082-11/+102
| | | | | | | | | | | | | | | | | Should be much faster, seems to work in softpipe. While here (also it's now disabled) fix up the pow factor - the former value is what is in GL core; it is however not actually accurate to fp32 standard (as it is 1.0/2.4), and if someone would do all the accurate math there's no reason to waste 8 mantissa bits or so... v2: use real table generating function instead of just printing the values (might take a bit longer as it does calculations on some 3+ million floats but much more descriptive obviously). Also fix up another inaccurate pow factor (this time in the python code) - wondering where the couple one bit errors came from :-(. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* gallivm: fix comment wrt srgb accuracy.Roland Scheidegger2013-08-081-2/+4
| | | | I think it's actually not good enough now...
* gallivm: Fix build - Remove TargetOptions.RealignStack for llvm>=3.4Laurent Carlier2013-08-061-0/+2
| | | | | | | | Since llvm-3.4svn r187618, TargetOptions doesn't provide RealignStack, so only enable it with llvm<3.4. This option must now be specified using function attributes, see LLVM commit r187618 Reviewed-by: Tom Stellard <[email protected]>
* draw: Change slot from unsigned to int.Vinson Lee2013-08-051-1/+1
| | | | | | | | | unfilled_stage::face_slot is of type int. Fixes "Unsigned compared against 0" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* postprocess: Check ppq is null before calling pp_free_bos.Vinson Lee2013-08-051-1/+3
| | | | | | | | | pp_free_bos dereferences ppq without a null check. Fixes "Dereference before null check" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* draw: add back separate input assemblerZack Rusin2013-08-036-4/+351
| | | | | | | | | | | the issue is that stream output is run before the pipeline, which means that unless we decompose the primitives before the SO, things crash. we could convert the entire stream output code into a pipeline stage but it will take a bit, so for now fix the crashes by simply re-adding the old input assembler which is run before the SO. Signed-off-by: Zack Rusin <[email protected]>
* draw: implement proper primitive assembler as a pipeline stageZack Rusin2013-08-0312-352/+280
| | | | | | | | | | | | | we used to have a face primitive assembler that we ran after if the gs was missing but we had adjacency primitives in the pipeline, let's convert it to a pipeline stage, which allows us to use it to inject outputs (primitive id) into the vertices. it's also a lot cleaner because the decomposition is already handled for us. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* draw: fix front face injectionZack Rusin2013-08-031-9/+15
| | | | | | | | | | | | | Inject front face only if the fragment shader uses it and propagate through all channels because otherwise we'll need to figure out the exact swizzle that the fs expects and it's just simpler to make sure all the components within the front face register are correctly set. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* tgsi: remove unneeded File == TGSI_FILE_INPUT testBrian Paul2013-08-051-1/+0
| | | | We're already in an "if (File == TGSI_FILE_INPUT)" block at that point.
* tgsi: clean up tgsi_scan_shader() functionBrian Paul2013-08-051-41/+42
| | | | | | | | | Replace "fulldecl->Semantic.Name/Index" with semName/semIndex. Simplify if/else logic for TGSI_FILE_OUTPUT code. Remove old comment. Fix indentation. Reviewed-by: Jose Fonseca <[email protected]>
* draw: make sure clipping works with injected outputsZack Rusin2013-08-021-35/+54
| | | | | | | | | | | | | clipping would drop the extra outputs because it always used the number of standard vertex shader outputs, without geometry shader or extra outputs. The commit makes sure that clipping with geometry shaders which have more outputs than the current vertex shader and with extra outputs correctly propagates the entire vertex. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* draw: inject frontface info into wireframe outputsZack Rusin2013-08-024-0/+101
| | | | | | | | | | | | | | Draw module can decompose primitives into wireframe models, which is a fancy word for 'lines', unfortunately that decomposition means that we weren't able to preserve the original front-face info which could be derived from the original primitives (lines don't have a 'face'). To fix it allow draw module to inject a fake face semantic into outputs from which the backends can figure out the original frontfacing info of the primitives. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* draw: stop crashing with extra shader outputsZack Rusin2013-08-029-15/+52
| | | | | | | | | | | | | | | | | | Draw sometimes injects extra shader outputs (aa points, lines or front face), unfortunately most of the pipeline and llvm code didn't handle them at all. It only worked if number of inputs happened to be greater than or equal to the number of shader outputs plus the extra injected outputs. In particular when running the pipeline which depends on the vertex_id in the vertex_header things were completely broken. The patch adjusts the code to correctly use the total number of shader outputs (the standard ones plus the injected ones) to make it all stop crashing and work. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* draw: use the vertex sizeZack Rusin2013-08-021-1/+1
| | | | | | | | | | Instead of using the magical 4 use the above computed vertex size. Doesn't change the behavior, just makes the code a bit cleaner. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* draw/llvm: add some extra debugging outputZack Rusin2013-08-021-0/+6
| | | | | | | | | | when dumping shader outputs it's nice to have the integer values of the outputs, in particular because some values are integers. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: detect prim id and front face usage in fsZack Rusin2013-08-022-2/+8
| | | | | | | | | Adding code to detect the usage of prim id and front face semantics in fragment shaders. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: add ucmp to the list of opcodesZack Rusin2013-08-021-0/+1
| | | | | | | | | we forgot to add ucmp to the list of opcodes, so it was never generated for ureg. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: obey clarified shift behaviorRoland Scheidegger2013-08-022-8/+24
| | | | | | | | | | | | | | | llvm shifts are undefined for shift counts exceeding (or matching) bit width, so need to apply a mask for the tgsi shift instructions. v2: only use mask for the tgsi shift instructions, not for the build shift helpers. None of the internal callers need this behavior, and while llvm can optimize away the masking for constants there are legitimate cases where it might not be able to do so even if we know that shift count must be smaller than type width (currently all such callers do not use the build shift helpers). Reviewed-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* tgsi: obey clarified shift behaviorRoland Scheidegger2013-08-021-12/+27
| | | | | | | | | C shifts are undefined for shift counts exceeding (or matching) bit width, so need to apply a mask (on x86 it actually would usually probably work as shifts do masking on int domain shifts - unless some auto-vectorizer would come along at last as simd domain does not mask the shift count). Reviewed-by: Jose Fonseca <[email protected]>
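A minimal sketch of the masking for 32-bit values, assuming the shift count is simply taken modulo the bit width. The helper names are hypothetical and the arithmetic (signed) right shift is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left with the count masked to the low 5 bits, so counts of 32 or
 * more are well defined in C instead of undefined behavior. */
static uint32_t shl32(uint32_t value, uint32_t count)
{
    return value << (count & 31u);
}

/* Logical shift right with the same masking. */
static uint32_t ushr32(uint32_t value, uint32_t count)
{
    return value >> (count & 31u);
}
```

With this rule a count of 32 behaves like 0 and a count of 33 behaves like 1, regardless of what the hardware would do with an unmasked count.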
* gallivm: use nearest rounding for float->unorm24 conversionRoland Scheidegger2013-07-311-4/+6
| | | | | | | | | | | | | Previously we were using truncation, which gives the correct result only for numbers in [0.5-1.0] range (because there are no mantissa bits to do any rounding there). This is frequently hit (and probably only used there) when converting fragment depth to depth format (d24s8 etc.) or otherwise dealing with depth format. v2: as spotted by Jose, get rid of extra type (src_type is already unsigned). Reviewed-by: Jose Fonseca <[email protected]>
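The general difference between truncating and rounding to nearest in a float to unorm24 conversion can be shown with a scalar sketch. It is computed in double precision for clarity and is not the gallivm code path, which works on float vectors; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define UNORM24_MAX 16777215.0  /* 2^24 - 1 */

/* Truncating conversion: always rounds toward zero. */
static uint32_t unorm24_trunc(float f)
{
    assert(f >= 0.0f && f <= 1.0f);
    return (uint32_t)((double)f * UNORM24_MAX);
}

/* Round-to-nearest conversion: add one half before truncating. */
static uint32_t unorm24_nearest(float f)
{
    assert(f >= 0.0f && f <= 1.0f);
    return (uint32_t)((double)f * UNORM24_MAX + 0.5);
}
```

For an input of 0.25 the exact scaled value is 4194303.75, so truncation yields 4194303 while rounding to nearest yields 4194304.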