summaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/draw/draw_llvm.c
diff options
context:
space:
mode:
authorRoland Scheidegger <[email protected]>2016-12-11 23:39:22 +0100
committerRoland Scheidegger <[email protected]>2016-12-21 04:48:24 +0100
commit8bd67a35c50e68c21aed043de11e095c284d151a (patch)
tree990f5b573013944b86fca5fe923a9fcc79c88243 /src/gallium/auxiliary/draw/draw_llvm.c
parent5b950319ced820ee112f38f69b5694179c15815d (diff)
gallivm: optimize gather a bit, by using supplied destination type
By using a dst_type in the the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit as 3x32bit vector (this is still going to be 2 loads of course, but the loads can be done directly to simd vector that way). Also, we can now do some try to use the right int/float type. This should make no difference really since there's typically no domain transition penalties for such simd loads, however it actually makes a difference since llvm will use different shuffle lowering afterwards so the caller can use this to trick llvm into using sane shuffle afterwards (and yes llvm is really stupid there - nothing against using the shuffle instruction from the correct domain, but not at the cost of doing 3 times more shuffles, the case which actually matters is refusal to use shufps for integer values). Also do some attempt to avoid things which look great on paper but llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors which is simply disastrous - I suspect type legalizer is to blame trying to extend these vectors to 128bit types somehow, so fetching these with scalars like before which is suboptimal due to the ZExt). Remove the ability for truncation (no point, this is gather, not conversion) as it is complex enough already. While here also implement not just the float, but also the 64bit avx2 gathers (disabled though since based on the theoretical numbers the benefit just isn't there at all until Skylake at least). Reviewed-by: Jose Fonseca <[email protected]>
Diffstat (limited to 'src/gallium/auxiliary/draw/draw_llvm.c')
-rw-r--r--src/gallium/auxiliary/draw/draw_llvm.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index c5485728e42..19b75a5003b 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1864,7 +1864,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant)
LLVMPointerType(LLVMInt8TypeInContext(context),
0), "");
tmp = lp_build_gather(gallivm, vs_type.length,
- 32, 32, TRUE,
+ 32, bld.type, TRUE,
fetch_elts, tmp, FALSE);
LLVMBuildStore(builder, tmp, index_store);
}