path: root/src/mesa/main/texcompress_cpal.h
author Roland Scheidegger <[email protected]> 2016-11-12 22:46:32 +0100
committer Roland Scheidegger <[email protected]> 2016-11-18 01:25:21 +0100
commit b16f06fd0593099aad74775a41cf74d4c09c3f6a
tree a9b090b2d9bc5f45b907a8fec1b08771016fd359 /src/mesa/main/texcompress_cpal.h
parent 0cee3fd5c73acf7e3841a7d790e3ec3031b0fe41
draw: use vectorized calculations for fetch (v2)
Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, although that is only really problematic for the stride/index mul; the rest has been pretty much moved outside the shader loop, where things are still scalar (the mul could actually be optimized away too). To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch).

Still uses aos fetch in the end though, mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something, for instance).

Instanced fetch, however, stays roughly the same as before, except that the same element is no longer fetched multiple times (I've seen a reduction of ~3 times in main shader loop size, because llvm did not recognize that it's all the same fetch: some of the fetches could have been replaced with zeros in case the vector size exceeds the remaining fetch count, although the values of such fetches don't matter at all).

Also, for elts gathering, use vectorized code as well.

The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but unless there are just single vertices to handle I would generally expect it to be faster; there are more opportunities for future improvements by using soa fetch).

v3: skip the fake index buffer, not needed because the jit code never sees the real index buffer in the first place. Fix a bug with mask expansion (needs SExt, not ZExt). Also, be really really careful to keep the behavior the same, even in cases where it looks wrong, and add comments explaining why the code is doing the seemingly wrong stuff... Fortunately it's not actually more complex in the end... Also change function order slightly just to make the diff more readable.

No piglit change. Passes some internal testing with another api too...

Reviewed-by: Jose Fonseca <[email protected]>
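
The commit message above mentions three ideas: doing the stride/index overflow check manually, redirecting bad fetches to index 0 instead of branching, and expanding a mask with SExt rather than ZExt. They can be illustrated with a minimal scalar C sketch. This is not the code from the patch, which emits LLVM IR. `LANES`, `mask_sext`, `mask_zext` and `lane_offsets` are hypothetical names, and the assumption that an out-of-range lane falls back to offset 0 is only one reading of the "fake buffers" remark.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* hypothetical vector width, for illustration only */

/* Sign-extend a 1-bit condition to a full 32-bit lane mask:
 * 1 becomes 0xFFFFFFFF, 0 stays 0. This is what SExt gives. */
static uint32_t mask_sext(unsigned bit)
{
    return 0u - (uint32_t)(bit & 1u);
}

/* Zero-extend the same condition: 1 becomes 0x00000001.
 * Using this as an AND mask would keep only the lowest bit. */
static uint32_t mask_zext(unsigned bit)
{
    return (uint32_t)(bit & 1u);
}

/* Compute a byte offset for every lane without branching on validity.
 * The index * stride product is done in 64 bits so that a 32-bit
 * overflow cannot wrap around into a seemingly valid offset.
 * Lanes that would read past the buffer get offset 0 (assumption:
 * offset 0 is always safe to fetch from). */
static void lane_offsets(const uint32_t index[LANES], uint32_t stride,
                         uint32_t buffer_size, uint32_t elem_size,
                         uint32_t offset_out[LANES])
{
    assert(elem_size > 0);
    for (int lane = 0; lane < LANES; lane++) {
        uint64_t wide = (uint64_t)index[lane] * stride;
        /* wide + elem_size cannot overflow 64 bits for 32-bit inputs. */
        unsigned in_range = (wide + elem_size) <= (uint64_t)buffer_size;
        offset_out[lane] = (uint32_t)wide & mask_sext(in_range);
    }
}
```

With `stride = 16` and a 64-byte buffer, index `0x80000000` multiplies to a value whose low 32 bits are zero, so a plain 32-bit multiply would silently report offset 0 as if it were a legitimate result. The widened compare flags the lane as invalid first, and the sign-extended mask then clears it deliberately.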
Diffstat (limited to 'src/mesa/main/texcompress_cpal.h')
0 files changed, 0 insertions, 0 deletions