summaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/gallivm
Commit message (Collapse)AuthorAgeFilesLines
* llvmpipe: add support for b5g6r5_srgbRoland Scheidegger2014-03-213-5/+25
| | | | | | | | | | | | | The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA (and vice versa), fix this to handle also 16bit 565-style srgb formats. Still not really all that generic, things like r10g10b10a2_srgb or r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not require more work to not crash but near certainly need some higher precision calculation) but not needed right now. The code is not fully optimized for this (could use more direct calculation instead of expanding to 8-bit range first) but should be good enough. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize repeat linear npot code in the aos int pathJeff Muizelaar2014-03-141-12/+62
| | | | | | Similar to the other cases, shift some weight/coord calculations to int space. This should be slightly faster (on x86 sse it should actually safe one instruction, and generally int instructions are cheaper).
* gallivm: use correct rounding for nearest wrap mode (in the aos int path)Roland Scheidegger2014-03-141-29/+9
| | | | | | | | | | | | | The previous code used coords which were calculated as (int) (f_coord * tex_size * 256) >> 8. This is not only unnecessarily complex but can give the wrong texel due to rounding for negative coords (as an example, after denormalization coords from -1.0 to 0.0 should give -1, but this will give -1 for numbers from -1.0-1/256 - 0.0-1/256. Instead, juse use ifloor, dropping the shift stuff. Unfortunately, this will most likely be slower - with arch rounding available it shouldn't be too bad (trades a int shift for a round but also saves an int mul (which is shared by all coords) but otherwise it's a mess.
* gallivm: use correct rounding for linear wrap mode (in the aos int path)Jeff Muizelaar2014-03-141-6/+8
| | | | | | | | | | | | | | | | | | | The previous method for converting coords to ints was sligthly inaccurate (effectively losing 1bit from the 8bit lerp weight). This is probably especially noticeable when trying to draw a pixel-aligned texture. As an example, for a 100x100 texture after dernormalization the texture coords in this case would turn up as 0.5, 1.5, 2.5, 3.5, 4.5, ... After the mul by 256, conversion to int and 128 subtraction, they end up as 0, 256, 512, 768, 1024, ... which gets us the correct coords/weights of 0/0, 1/0, 2/0, 3/0, 4/0, ... But even LSB errors (which are unavoidable) in the input coords may cause these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a coord/weight of 2/255 instead. Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be equally fast on x86 sse though other archs probably suffer a little.
* gallivm: add smallfloat to float conversion not relying on cpu denorm handlingRoland Scheidegger2014-02-201-20/+65
| | | | | | | | | | | | | | | | | | | | | | The previous code relied on cpu denorm support for converting small float formats (such r11g11b10_float and r16_float) to floats, otherwise denorms are flushed to zero. We worked around that in llvmpipe blend code by reenabling denorms, but this did nothing for texture sampling. Now it would be possible to reenable it there too but I'm not really a fan of messing with fpu flags (and it seems we can't actually do it reliably with llvm in any case looking at some bug reports). (Not to mention if you actually have a lot of denorms in there, you can expect some order-of-magnitude slowdown with x86 cpus.) So instead use code which adjusts exponents etc. directly hence not relying on cpu denorm support for the rescaling mul. (We still need the fpu flag handling as we can't do float-to-smallfloat without using cpu denorms at least for now - I actually wanted to keep both the old and new code and using one or the other depending on from where it's called but that didn't work out as the parameter would have to be passed through too many layers than I'd like.) Reviewed-by: Zack Rusin <[email protected]> Reviewed-by: Si Chen <[email protected]>
* gallivm: make sure analysis works with large number of immediatesZack Rusin2014-02-051-8/+9
| | | | | | | | | | | We need to handle a lot more immediates and in order to do that we also switch from allocating this structure on the stack to allocating it on the heap. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: handle huge number of immediatesZack Rusin2014-02-054-44/+86
| | | | | | | | | | | | | | We only supported up to 256 immediates, which isn't enough. We had code which was allocating immediates as an allocated array, but it was always used along a statically backed array for performance reasons. This commit adds code to skip that performance optimization and always use just the dynamically allocated immediates if the number of them is too great. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: allow large numbers of temporariesZack Rusin2014-02-054-5/+20
| | | | | | | | | | | | | | The number of allowed temporaries increases almost with every iteration of an api. We used to support 128, then we started increasing and the newer api's support 4096+. So if we notice that the number of temporaries is larger than our statically allocated storage would allow we just treat them as indexable temporaries and allocate them as an array from the start. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix F2U opcodeRoland Scheidegger2014-02-051-20/+22
| | | | | | | | | | | | | | | | | | | | | Previously, we were really doing F2I. And also move it to generic section. (Note that for llvmpipe the code generated is definitely bad, due to lack of unsigned conversions with sse. I think though what llvm does (using scalar conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit) including lots of domain changes is quite suboptimal, could do something like is_large = arg >= 2^31 half_arg = 0.5 * arg small_c = fptoint(arg) large_c = fptoint(half_arg) << 1 res = select(is_large, large_c, small_c) which should be much less instructions but that's something llvm should do itself.) This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs GL 3.0 version override to run.) Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* gallivm: fix opcode and function nestingZack Rusin2014-02-032-157/+317
| | | | | | | | | | | | | | gallivm soa code supported only a single level of nesting for control flow opcodes (if, switch, loops...) but the d3d10 spec clearly states that those are nested within functions. To support nesting of conditionals inside functions we need to store the nesting data inside function contexts and keep a stack of those. Furthermore we make sure that if nesting for subroutines is deeper than 32 then we simply ignore all subsequent 'call' invocations. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: add a few const qualifiersBrian Paul2014-02-022-4/+4
| | | | Trivial.
* gallivm: Workaround http://llvm.org/PR18600José Fonseca2014-01-281-2/+4
| | | | | | | | | | | | | | | | | | We have code generation paths that carry out swizzles of AoS vectors via bitwise shifts, as these tend to generate more efficient code than straightforward byte shuffles. But when the input is a constant the additional bitwise arithmetic operations somehow don't really get constant propagated properly, evenutally causing assertion failure in InstCombine pass. Therefore avoid the bug by using the trivial shuffles for constant inputs. Although the sample LLVM IR can cause a crash with any LLVM version, this was only seen in practice with LLVM 3.2. Reviewed-by: Matthew McClure <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-174-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: do constant buffer bounds checking in shadersZack Rusin2014-01-162-15/+98
| | | | | | | | | | | | | | | | | | | | It's possible to bind a smaller buffer as a constant buffer, than what the shader actually uses/requires. This could cause nasty crashes. This patch adds the architecture to pass the maximum allowable constant buffer index to the jit to let it make sure that the constant buffer indices are always within bounds. The behavior follows the d3d10 spec, which says the overflow should always return all zeros, and overflow is only defined as access beyond the size of the currently bound buffer. Accesses beyond the declared shader constant register size are not considered an overflow and expected to return garbage but consistent garbage (we follow the behavior which some wlk tests expect which is to return the actual values from the bound buffer). Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: fix pointer type for stmxcsr/ldmxcsrRoland Scheidegger2013-12-141-2/+7
| | | | | | | | The argument is a i8 pointer not a i32 pointer (even though the value actually stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do leading to crashes. Reviewed-by: Zack Rusin <[email protected]>
* llvmpipe: fix blending with half-float formatsZack Rusin2013-12-102-0/+82
| | | | | | | | | | | | | The fact that we flush denorms to zero breaks our half-float conversion and blending. This patches enables denorms for blending. It's a little tricky due to the llvm bug that makes it incorrectly reorder the mxcsr intrinsics: http://llvm.org/bugs/show_bug.cgi?id=6393 Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Zack Rusin <[email protected]>
* gallium: new shader cap bit for the amount of sampler viewsRoland Scheidegger2013-11-281-0/+2
| | | | | | | | | Ever since introducing separate sampler and sampler view max this was really missing. Every driver but llvmpipe reports the same number as number of samplers for now, so nothing should break. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Ignore unknown file type in non-debug builds.Vinson Lee2013-11-201-0/+1
| | | | | | | Fixes "Uninitialized pointer read" defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Fix mask calculation for emit_kill_if.Si Chen2013-11-191-5/+8
| | | | | | | | | | The exec_mask must be taken in consideration, just like emit_kill above. The tgsi_exec module has the same bug and should be fixed in a future change. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallivm: Compile flag to debug TGSI execution through printfs.José Fonseca2013-11-143-47/+192
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag. I had prototyped this for a while while debugging an issue, but finally cleaned this up and added a few more bells and whistles. v2: Use '$' as marker; better output. Thanks to Brian, Zack and Roland reviews. Here is a sample output. CONST[0].x = 0.00625000009 0.00625000009 0.00625000009 0.00625000009 CONST[0].y = -0.00714285718 -0.00714285718 -0.00714285718 -0.00714285718 CONST[0].z = -1 -1 -1 -1 CONST[0].w = 1 1 1 1 IN[0].x = 143.5 175.5 175.5 143.5 IN[0].y = 123.5 123.5 155.5 155.5 IN[0].z = 0 0 0 0 IN[0].w = 1 1 1 1 $ 1: RCP TEMP[0].w, IN[0].wwww TEMP[0].w = 1 1 1 1 $ 2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw TEMP[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 TEMP[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 3: MUL OUT[0].xy, TEMP[0], TEMP[0].wwww OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 4: MUL OUT[0].z, IN[0].zzzz, TEMP[0].wwww OUT[0].z = 0 0 0 0 $ 5: MOV OUT[0].w, TEMP[0] OUT[0].w = 1 1 1 1 $ 6: END OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 OUT[0].z = 0 0 0 0 OUT[0].w = 1 1 1 1
* gallivm,llvmpipe: fix float->srgb conversion to handle NaNsRoland Scheidegger2013-11-144-26/+43
| | | | | | | | | | | | | | | | | | | | | | | | d3d10 requires us to convert NaNs to zero for any float->int conversion. We don't really do that but mostly seems to work. In particular I suspect the very common float->unorm8 path only really passes because it relies on sse2 pack intrinsics which just happen to work by luck for NaNs (float->int conversion in hw gives integer indeterminate value, which just happens to be -0x80000000 hence gets converted to zero in the end after pack intrinsics). However, float->srgb didn't get so lucky, because we need to clamp before blending and clamping resulted in NaN behavior being undefined (and actually got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp with defined nan behavior as we can handle the NaN for free this way. I suspect there's more bugs lurking in this area (e.g. converting floats to snorm) as we don't really use defined NaN behavior everywhere but this seems to be good enough. While here respecify nan behavior modes a bit, in particular the return_second mode didn't really do what we wanted. From the caller's perspective, we really wanted to say we need the non-nan result, but we already know the second arg isn't a NaN. So we use this now instead, which means that cpu architectures which actually implement min/max by always returning non-nan (that is adhering to ieee754-2008 rules) don't need to bend over backwards for nothing. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: deduplicate some indirect register address codeRoland Scheidegger2013-11-081-157/+96
| | | | | | | | | There's only one minor functional change, for immediates the pixel offsets are no longer added since the values are all the same for all elements in any case (it might be better if those weren't stored as soa vectors in the first place maybe). Reviewed-by: Zack Rusin <[email protected]>
* gallivm: fix indirect addressing of inputsRoland Scheidegger2013-11-061-17/+28
| | | | | | | | | | | | We weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first element. (Copied straight from the same fix for temps.) While here fix up a couple of broken comments in the fetch functions, plus don't name a straight float type float4 which is just confusing. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* gallivm: optimize lp_build_minify for sseRoland Scheidegger2013-11-053-13/+54
| | | | | | | | | | | | | | | | | SSE can't handle true vector shifts (with variable shift count), so llvm is turning them into a mess of extracts, scalar shifts and inserts. It is however possible to emulate them in lp_build_minify with float muls, which should be way faster (saves over 20 instructions per 8-wide lp_build_minify). This wouldn't work for "generic" 32bit shifts though since we've got only 24bits of mantissa (actually for left shifts it would work by using sse41 int mul instead of float mul but not for right shifts). Note that this has very limited scope for now, since this is only used with per-pixel lod (otherwise we're avoiding the non-constant shift count by doing per-quad shifts manually), and only 1d textures even then (though the latter should change). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Remove llvm::DisablePrettyStackTrace for LLVM >= 3.4.Vinson Lee2013-11-041-0/+2
| | | | | | | | | | | | | LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the pretty stack trace opt-in rather than opt-out. The default value of DisablePrettyStackTrace has changed to true in LLVM 3.4 and newer. Signed-off-by: Vinson Lee <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929 Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: implement fully accurate corner filtering for seamless cube mapsRoland Scheidegger2013-10-251-13/+151
| | | | | | | | | | | | | | | | | | | | | | d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use different filtering code, since we need per-texel weights which our 2d lerp doesn't (and can't) do. And of course the (now per element) weights need to be adjusted too for it to work. Invoke the new filtering code whenever there's an edge to keep things simpler, as it will work for edges too not just corners but of course it's only needed with corners. More ugly code for not much gain but at least a hacked up cubemap demo shows very nice corners now... Not sure yet if and how this should be configurable... v2: incorporate feedback from Jose, only use special corner filtering code when there's a corner not when there's only an edge (as corner filtering code is slower, though a perf difference was only measureable when always forcing edge code). Plus some minor style fixes. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: implement seamless cube filteringRoland Scheidegger2013-10-213-40/+368
| | | | | | | | | | | | | | | | | | | | | | | | | For seamless cube filtering it is necessary to determine new faces and new coords per sample. The logic for this is _seriously_ complex (what needs to happen is very "asymmetric" wrt face, x/y under/overflow), further complicated by the fact that if the 4 samples are in a corner (meaning we only have actually 3 samples, and all 3 are on different faces) then falling off the edge is happening _both_ on x and y axis simultaneously. There was a noticeable performance hit in mesa's cubemap demo when seamless filtering was forced on (just below 10 percent or so in a debug build, when disabling all filtering hacks, otherwise it would probably be a bit more) and when always doing the logic, hence use a branch which it only does it if any of the pixels in a quad (or in two quads) actually hit this. With that there was no measurable performance hit in the cubemap demo (neither in a debug nor release buidl), but this will vary (cubemap demo very rarely hits edges). Might also be different on other cpus, as this forces SoA sampling path which potentially can be quite a bit slower. Note that as for corners, this code gets all the 3 samples which actually exist right, and the 4th texel will simply be the same as one of the others, meaning that filter weights will be a bit wrong. This however should be enough for full OpenGL (but not d3d10) compliance. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: kill old per-quad face selection codeRoland Scheidegger2013-10-101-475/+286
| | | | | | | | | | Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). v2: also get rid of 3 helper functions no longer used. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: handle explicit derivatives for cubemapsRoland Scheidegger2013-10-103-56/+235
| | | | | | | | | | | | | | | | | | | | They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some test which insists that in case of cubemaps the shared lod value MUST come from a pixel inside the primitive (due to the derivatives becoming different if a different larger major axis is chosen). v2: based on Brian's feedback, clean up code a bit. And use sign bit of major axis instead of pre-select s/t/r sign for coord mirroring (which should be the same in the end, saves 2 ands). Also fix two bugs with select/mirror of derivatives, the minor axes need to use major axis sign as well (instead of major derivative axis sign), and don't mistakenly use absolute values of major derivative and inverse major values. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: ignore rho approximation for cube mapsRoland Scheidegger2013-10-101-30/+20
| | | | | | | | | | | | | | There's two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps when doing rho approximations (so the error actually goes from factor 2 at edges and sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face). 2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap derivatives (so don't need yet another debug var). Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: implement 64 bit mul opcodes in llvmpipeZack Rusin2013-10-091-0/+60
| | | | | | | | | Both the imul_hi and umul_hi are working with this patch. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: support printing of 64 bit integersZack Rusin2013-10-091-1/+6
| | | | | | | only 8 and 32 bit integers were supported before. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallivm: adjust wrap mode to CLAMP_TO_EDGE always for cube maps.Roland Scheidegger2013-09-191-3/+7
| | | | | | | | | | | | Technically without seamless filtering enabled GL allows any wrap mode, which made sense when supporting true borders (can get seamless effect with border and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires wrap modes to be ignored and it's a pain to fix up the sampler state (as it makes it texture dependent). It is difficult to imagine a situation where an app really wants another behavior so just cheat here. (It looks like some graphics hw (intel) actually requires this too hence it should be safe.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: some bits of seamless cube filtering implementationRoland Scheidegger2013-09-183-14/+29
| | | | | | | | | | | | Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a correct implementation for nearest filtering, and it's way better than using repeat wrap for instance for linear filtering (though obviously this doesn't actually do seamless filtering). v2: fix s/t wrap not r/s... Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: Remove the special path for TGSI_OPCODE_EXP.José Fonseca2013-09-123-72/+30
| | | | | | It was wrong for EXP.y, as we clamped the source before computing the fractional part, and this opcode should be rarely used, so it's not worth the hassle.
* gallivm: support indirect registers on both dimensionsZack Rusin2013-09-062-2/+5
| | | | | | | | | | | We support indirect addressing only on the vertex index, but some shaders also use indirect addressing on attributes. This patch adds support for indirect addressing on both dimensions inside gs arrays. Signed-off-by: Zack Rusin <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* gallivm: handle unbound textures in texture sampling / texture queriesRoland Scheidegger2013-08-301-0/+26
| | | | | | | | | | | | | | Turns out we don't need to do much extra work for detecting this case, since we are guaranteed to get a empty static texture state in this case, hence just rely on format being 0 and return all zero then. Previously needed dummy textures (would just have crashed on format being 0 otherwise) which cannot return the correct result for size queries and when sampling textures with wrap modes using border. As a bonus should hugely increase performance when sampling unbound textures - too bad it isn't a useful feature :-). Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* gallivm: (trivial) don't pass sampler_unit variable down to filtering funcsRoland Scheidegger2013-08-301-36/+21
| | | | | | | The only reason this was needed was because the fetch texel function had to get the (dynamic) border color, but this is now done much earlier. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't use AoS path if min/mag filter are different with multiple lodsRoland Scheidegger2013-08-301-1/+6
| | | | | | | | Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing AoS path wouldn't be all that difficult (use all the same logic as SoA) but considered not worth it for now. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: support per-pixel min/mag filter in SoA pathRoland Scheidegger2013-08-301-43/+243
| | | | | | | | | | | | | | | | | | | | | | | Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad neither in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher weight is the one which should have been chosen with nearest filtering hence adjust filter weight to 1.0/0.0 based on that). If all pixels use nearest filter (either min and mag) then still run just a nearest filter as this is way cheaper (probably around 4 times faster for 2d, more for 3d case) and it should be relatively rare that pixels really need different filtering. OTOH if all pixels would require linear don't do anything special since the linear path with filter adjustments shouldn't really be all that much more expensive than ordinary linear, and we think it's rare that min/mag filters are configured differently so there doesn't seem much value in trying to optimize this further. This does not yet fix the AoS path (though currently AoS is only used for single quads hence it could be considered less broken, just never honoring per-pixel filter decision but doing it per quad). v2: simplify code a bit (unify min linear and min nearest cases) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: don't calculate square root of rho if we use accurate rho methodRoland Scheidegger2013-08-301-39/+74
| | | | | | | | | | | | | | | | | | | | While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means we get a simple mul instead of sqrt (in case of nearest mip filter in fact we don't need to replace the sqrt with something else at all), only in some not very useful path this doesn't work (combined brilinear calculation of int level and fractional lod, accurate rho calc but brilinear filtering seems odd). Apart from being faster as an added bonus this should increase our crappy fractional accuracy of lod, since fast_log2 is only good for ~3bits and this should increase accuracy by one bit (though not used if dimension is just one as we'd need an extra mul there as we never had the squared rho in the first place). v2: use separate ilog2_sqrt function if we have squared rho. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: refactor num_lods handlingRoland Scheidegger2013-08-304-131/+169
| | | | | | | | | | | | | This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer holds true. This change does not change behavior yet (though theoretically when forcing per-element path it might be slower with different min/mag filter since the code will respect this setting even when there's no mip maps now in this case, so some lod calcs will be done per-element just ultimately still the same filter used for all pixels). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix min/mag switchover point for nearest/none mip filterRoland Scheidegger2013-08-235-66/+81
| | | | | | | | | | | | | | Previously, the min/mag switchover point when using nearest/none mip filter was effectively -0.5 which can't be right. Looks like new OpenGL thinks it's ok if it's always 0.0 (older versions required 0.5 in some cases), let's hope everybody else thinks that's fine too. Refactor this slightly and get the per-quad/per-pixel min/mag decision values further down to sampling, though still only the first component is used yet. While here also fix code trying to skip lod bias application etc. when mipfilter is none, as this is still needed for determining min/mag filter. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: do per-element lod for lod bias and explicit derivs tooRoland Scheidegger2013-08-222-31/+74
| | | | | | | | | | | | Except for explicit derivs with cube maps which are very bogus anyway. Just like explicit lod this is only used if no_quad_lod is set in GALLIVM_DEBUG env var. Minification is terrible on cpus which don't support true vector shifts (but should work correctly). Cannot do the min/mag filter decision (if they are different) per pixel though, only selecting different mip levels works. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix int/uint border color clampingRoland Scheidegger2013-08-221-2/+2
| | | | | | | | | Just a copy & paste error. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=68409. Note that the test passing before probably simply means it doesn't verify clamping of the border color itself as required by the OpenGL spec. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) fix linear aos sampling of 3d compressed formatsRoland Scheidegger2013-08-221-2/+2
| | | | | | | | block size depth is always 1 even for compressed formats (unless someone invents true 3d compressed formats at least which we can't represent). Nearest (and soa) path had it right. Reviewed-by: Jose Fonseca <[email protected]>
* gallium: Support PIPE_FORMAT_R10G10B10A2_UINT.José Fonseca2013-08-221-0/+1
| | | | | | Same as PIPE_FORMAT_B10G10R10A2_UINT but without the swizzling. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: unify sin and cos implementationRoland Scheidegger2013-08-212-255/+53
| | | | | | | | | | | The (complicated!) math is all identical, there's just minimal differences how sign bit is calculated plus there's an additional subtraction for the argument going into the polynomial for cos. The logic stays 100% the same (with a small exception, sign bit calculation for sin is minimally simplified, applying sign mask after xoring the arguments instead of applying it to each argument). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add comment for bogus min/mag filter selection with nearest mip filterRoland Scheidegger2013-08-213-2/+10
| | | | | | | Detected this hunting some other bug, not sure if it really needs fixing but it is definitely wrong. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: fix rho calculation for 1d caseRoland Scheidegger2013-08-211-1/+1
| | | | | | | Was using wrong (undefined) vector element (the elements are at 0/2 position, not 0/1). Reviewed-by: Jose Fonseca <[email protected]>