path: root/CleanSpec.mk
author: Roland Scheidegger <[email protected]> 2016-12-11 23:41:07 +0100
committer: Roland Scheidegger <[email protected]> 2016-12-21 04:48:24 +0100
commit: 3c98e3cd63012246346e6054c5c16d368f899062 (patch)
tree: e12fffae96933588bebe82bb1fda9b160cb77d1e /CleanSpec.mk
parent: 8bd67a35c50e68c21aed043de11e095c284d151a (diff)
gallivm: provide soa fetch path handling formats with more than 32bit
This previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only be partly possible for some formats (unless there's AVX2 support), at least the transpose can be done with half the unpacks (and before using the transpose for AoS fallbacks, it was worse still). With less than 4 channels, things got way worse with the AoS fallback quickly even with 128bit vectors.

The strategy is pretty much the same as the existing one for formats which fit into 32 bits, except there's now multiple vectors to be fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4 vectors, this amounts to a transpose, for 2 it's a bit different), then the unpack is done the same (with the exception that the shift of the channels is now modulo 32, and we need to select the right vector). In fact the most complex part about it is to get the shuffles right for separating into lo/hi parts for AVX/AVX2...

This also makes use of the new ability of gather to use provided type information, which we abuse to outsmart llvm so we get decent shuffles, and to fetch 3x32bit vectors without having to ZExt the scalar.

And just because we can, we handle double formats too, albeit they are a bit different (draw sometimes needs to handle that).

v2: fix typo float/int bug (generating inefficient code).

Reviewed-by: Jose Fonseca <[email protected]>
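The unpack step described above (shift modulo 32, plus selecting the right vector) and the 4-vector transpose can be illustrated with a scalar sketch. This is not the gallivm code, which emits LLVM IR over SIMD vectors. The names `chan_desc`, `extract_channel` and `transpose4x4` are hypothetical, and the sketch assumes each channel is at most 32 bits wide and does not straddle a 32-bit boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical channel description (not a gallivm type). */
struct chan_desc {
    unsigned start; /* bit offset of the channel within the texel */
    unsigned bits;  /* channel width in bits, assumed to be 1..32 */
};

/*
 * Extract one channel from a texel wider than 32 bits.
 * words[] holds the texel as 32-bit words, lowest bits first
 * (2 words for a 64-bit format, 4 words for a 128-bit format).
 */
static uint32_t
extract_channel(const uint32_t *words, struct chan_desc c)
{
    unsigned word = c.start / 32;  /* select the right 32-bit word */
    unsigned shift = c.start % 32; /* the shift is taken modulo 32 */
    uint32_t mask;

    assert(c.bits >= 1 && c.bits <= 32);
    assert(shift + c.bits <= 32);  /* assumption: no straddling */

    mask = (c.bits == 32) ? 0xffffffffu : ((1u << c.bits) - 1u);
    return (words[word] >> shift) & mask;
}

/*
 * Scalar stand-in for the 4-vector shuffle: in[i] is texel i with
 * its four 32-bit words (AoS); out[j] collects word j of all four
 * texels (SoA).
 */
static void
transpose4x4(uint32_t in[4][4], uint32_t out[4][4])
{
    unsigned i, j;

    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            out[j][i] = in[i][j];
        }
    }
}
```

For a 64-bit texel with four 16-bit channels, stored as the two words `{0x22221111, 0x44443333}`, the channel starting at bit 32 is read from word 1 with shift 0, and the channel starting at bit 16 is read from word 0 with shift 16.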
Diffstat (limited to 'CleanSpec.mk')
0 files changed, 0 insertions, 0 deletions