author | Roland Scheidegger <[email protected]> | 2016-12-11 23:39:22 +0100 |
---|---|---|
committer | Roland Scheidegger <[email protected]> | 2016-12-21 04:48:24 +0100 |
commit | 8bd67a35c50e68c21aed043de11e095c284d151a (patch) | |
tree | 990f5b573013944b86fca5fe923a9fcc79c88243 /src/gallium/auxiliary/draw | |
parent | 5b950319ced820ee112f38f69b5694179c15815d (diff) |
gallivm: optimize gather a bit, by using supplied destination type
By using a dst_type in the gather interface, gather has some more
knowledge about how values should be fetched.
E.g. if this is a 3x32bit fetch and dst_type is a 4x32bit vector, gather
will no longer do a ZExt of a 96bit scalar value to 128bit, but will
just fetch the 96 bits as a 3x32bit vector (this is still going to be
2 loads of course, but the loads can be done directly to a simd vector
that way).
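As an illustration only, the two-load pattern described above can be sketched in plain C. This is not the gallivm implementation: `vec4x32` and `fetch_3x32_as_vector` are hypothetical names, the struct merely stands in for a 128bit simd register, and zeroing the unused fourth lane is an assumption made so the sketch has defined behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128bit simd register holding 4x32bit lanes. */
typedef struct {
    uint32_t lane[4];
} vec4x32;

/*
 * Fetch 3x32 bits (96 bits) straight into vector lanes using two loads:
 * one 64bit load for lanes 0-1 and one 32bit load for lane 2.
 * No 96bit scalar and no zero-extension to 128 bits is involved.
 * Lane 3 is zeroed here (an assumption of this sketch).
 */
static vec4x32 fetch_3x32_as_vector(const void *src)
{
    vec4x32 v = { { 0, 0, 0, 0 } };

    assert(src != NULL);
    memcpy(&v.lane[0], src, 8);                       /* first load: 64 bits */
    memcpy(&v.lane[2], (const uint8_t *)src + 8, 4);  /* second load: 32 bits */
    return v;
}
```

Only the first 12 bytes of the source are read, so the element following the 3x32bit value is never touched.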
Also, we can now try to use the right int/float type. This should
make no difference really, since there are typically no domain transition
penalties for such simd loads. However, it actually does make a difference,
since llvm will use different shuffle lowering afterwards, so the caller
can use this to trick llvm into using sane shuffles (and yes,
llvm is really stupid there - nothing against using the shuffle
instruction from the correct domain, but not at the cost of doing 3 times
more shuffles; the case which actually matters is the refusal to use shufps
for integer values).
Also make some attempt to avoid things which look great on paper but which
llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors,
which is simply disastrous - I suspect the type legalizer is to blame, trying
to extend these vectors to 128bit types somehow; these are therefore still
fetched with scalars like before, which is suboptimal due to the ZExt).
Remove the ability for truncation (no point, this is gather, not conversion)
as it is complex enough already.
While here also implement not just the float, but also the 64bit avx2
gathers (disabled though since based on the theoretical numbers the benefit
just isn't there at all until Skylake at least).
Reviewed-by: Jose Fonseca <[email protected]>
Diffstat (limited to 'src/gallium/auxiliary/draw')
-rw-r--r-- | src/gallium/auxiliary/draw/draw_llvm.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index c5485728e42..19b75a5003b 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1864,7 +1864,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant)
                                  LLVMPointerType(LLVMInt8TypeInContext(context), 0), "");
          tmp = lp_build_gather(gallivm, vs_type.length,
-                               32, 32, TRUE,
+                               32, bld.type, TRUE,
                                fetch_elts, tmp, FALSE);
          LLVMBuildStore(builder, tmp, index_store);
       }