diff options
author | José Fonseca <[email protected]> | 2013-04-21 22:23:31 +0100 |
---|---|---|
committer | José Fonseca <[email protected]> | 2013-05-17 20:23:00 +0100 |
commit | 6166ffeaf70e96e3f94417f8db79ba2440462178 (patch) | |
tree | 896e210baedd519f8dac7fd9110ad79ff760034f /m4 | |
parent | 5aaa4bafe04e601c2e42da76447f2b9297dc3a93 (diff) |
gallivm: Eliminate 8.8 fixed point intermediates from AoS sampling path.
This change was meant as a stepping stone to use PMADDUBSW SSSE3
instruction, but actually this refactoring by itself yields a 10%
speedup on texture intensive shaders (e.g, Google Earth's ocean water
w/o S3TC on a Ivy Bridge machine), while giving yielding exactly the
same results, whereas PMADDUBSW only gave an extra 5%, at the expense of
2bits of precision in the interpolation.
I belive that the speedup of this change comes from the reduced register
pressure (as 8.8 fixed point intermediates take twice the space of 8bit
unorm).
Also, not dealing with 8.8 simplifies lp_bld_sample_aos.c code
substantially -- it's no longer necessary to have code duplicated for
low and high register halfs.
Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never
executed (as it is faster on AVX to split the 256bit wide texture
computation into two 128bit chunks, in order to leverage integer
opcodes). This path might be useful in the future, so in order to
verify this change did not break that path I had to apply this change:
@@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
/*
* we only try 8-wide sampling with soa as it appears to
* be a loss with aos with AVX (but it should work).
* (It should be faster if we'd support avx2)
*/
- if (num_quads == 1 || !use_aos) {
+ if (/* num_quads == 1 || ! */ use_aos) {
if (num_quads > 1) {
if (mip_filter == PIPE_TEX_MIPFILTER_NONE) {
LLVMValueRef index0 = lp_build_const_int32(gallivm, 0);
/*
and then run texfilt mesademo:
LP_NATIVE_VECTOR_WIDTH=256 ./texfilt
Ran whole piglit without regressions.
Reviewed-by: Roland Scheidegger <[email protected]>
Diffstat (limited to 'm4')
0 files changed, 0 insertions, 0 deletions