author José Fonseca <[email protected]> 2013-04-21 22:23:31 +0100
committer José Fonseca <[email protected]> 2013-05-17 20:23:00 +0100
commit 6166ffeaf70e96e3f94417f8db79ba2440462178
tree 896e210baedd519f8dac7fd9110ad79ff760034f /m4
parent 5aaa4bafe04e601c2e42da76447f2b9297dc3a93
gallivm: Eliminate 8.8 fixed point intermediates from AoS sampling path.
This change was meant as a stepping stone to using the PMADDUBSW SSSE3
instruction, but this refactoring by itself yields a 10% speedup on
texture-intensive shaders (e.g., Google Earth's ocean water w/o S3TC on an
Ivy Bridge machine) while yielding exactly the same results, whereas
PMADDUBSW only gave an extra 5%, at the expense of 2 bits of precision in
the interpolation.

I believe that the speedup of this change comes from the reduced register
pressure (as 8.8 fixed point intermediates take twice the space of 8-bit
unorm).

Also, not dealing with 8.8 simplifies the lp_bld_sample_aos.c code
substantially: it is no longer necessary to have code duplicated for the
low and high register halves.

Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never
executed (as it is faster on AVX to split the 256-bit wide texture
computation into two 128-bit chunks, in order to leverage integer opcodes).
This path might be useful in the future, so in order to verify that this
change did not break that path I had to apply this change:

    @@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
         /*
          * we only try 8-wide sampling with soa as it appears to
          * be a loss with aos with AVX (but it should work).
          * (It should be faster if we'd support avx2)
          */
    -    if (num_quads == 1 || !use_aos) {
    +    if (/* num_quads == 1 || ! */ use_aos) {

            if (num_quads > 1) {
               if (mip_filter == PIPE_TEX_MIPFILTER_NONE) {
                  LLVMValueRef index0 = lp_build_const_int32(gallivm, 0);
                  /*

and then run the texfilt mesademo:

    LP_NATIVE_VECTOR_WIDTH=256 ./texfilt

Ran whole piglit without regressions.

Reviewed-by: Roland Scheidegger <[email protected]>
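To make the register-pressure argument concrete, here is a minimal scalar sketch in C. It is not the gallivm code: the helper names (lerp_unorm8, bilinear_unorm8) and the 8-bit weight convention (a fraction w/256) are assumptions chosen for illustration. It shows a bilinear filter whose values between lerp stages stay 8-bit unorm, widening only inside each multiply, and notes how many lanes of each format fit in a 128-bit register.

```c
#include <stdint.h>

/*
 * Illustrative only: these helpers are hypothetical and are not the
 * gallivm code generators. Weights are assumed to be 8-bit fractions
 * (w / 256), so w = 0 returns a exactly and w = 255 lands close to b.
 */

/* Lerp two 8-bit unorm values; the result is truncated back to 8 bits. */
static uint8_t lerp_unorm8(uint8_t a, uint8_t b, uint8_t w)
{
    /* Widen only inside the operation: the sum is at most 255 * 256. */
    uint32_t sum = (uint32_t)a * (256u - w) + (uint32_t)b * w;
    return (uint8_t)(sum >> 8);
}

/* Bilinear filter of a 2x2 footprint with 8-bit values between stages. */
static uint8_t bilinear_unorm8(uint8_t t00, uint8_t t10,
                               uint8_t t01, uint8_t t11,
                               uint8_t wx, uint8_t wy)
{
    uint8_t top = lerp_unorm8(t00, t10, wx);
    uint8_t bottom = lerp_unorm8(t01, t11, wx);
    return lerp_unorm8(top, bottom, wy);
}

/* Lanes per 128-bit register for each intermediate format. */
enum {
    LANES_UNORM8 = 128 / 8,     /* 8-bit unorm intermediates */
    LANES_FIXED_8_8 = 128 / 16  /* 8.8 fixed point intermediates */
};
```

In this sketch a 128-bit register holds twice as many 8-bit lanes as 8.8 lanes, which is the space difference the commit message refers to. Whether a vectorized version matches the previous 8.8 results bit for bit depends on the rounding used, which this sketch does not attempt to reproduce.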
Diffstat (limited to 'm4')
0 files changed, 0 insertions, 0 deletions