diff options
author | Roland Scheidegger <[email protected]> | 2013-06-07 21:20:01 +0200 |
---|---|---|
committer | Roland Scheidegger <[email protected]> | 2013-06-08 17:33:51 +0200 |
commit | 213c207b3ac40ae769afe01b8578f566b5e7840d (patch) | |
tree | a2c21acecead6f1e470a04f3afe73260fcb56d68 /src/gallium | |
parent | 0aca2c6b608b80661cb8fd35eb08195ab95743f5 (diff) |
gallivm: work around slow code generated for interleaving 128bit vectors
We use 128bit vector interleave for untwiddling in the blend code (with
256bit vectors). llvm generates terrible code for this for some reason,
so instead of generating a shuffle for 2 128bit vectors use a
extract/insert shuffle instead (it only seems to matter we're not using
128bit wide vectors for the shuffle). This decreases instruction count of
the blend code generated for a rgba8 render target without blending from
169 to 113 with llvm 3.1 and from 136 to 114 in llvm 3.2/3.3, and I got
a ~8% (llvm 3.1) and ~5% (3.2/3.3) performance improvement in gears.
(The generated code is still not terribly good as we could actually avoid
the interleaving completely but llvm can't know this.)
Reviewed-by: Jose Fonseca <[email protected]>
Diffstat (limited to 'src/gallium')
-rw-r--r-- | src/gallium/auxiliary/gallivm/lp_bld_pack.c | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c index 14fcd385798..22a4f5a8fdd 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c @@ -271,6 +271,28 @@ lp_build_interleave2(struct gallivm_state *gallivm, { LLVMValueRef shuffle; + if (type.length == 2 && type.width == 128 && util_cpu_caps.has_avx) { + /* + * XXX: This is a workaround for llvm code generation deficiency. Strangely + * enough, while this needs vinsertf128/vextractf128 instructions (hence + * a natural match when using 2x128bit vectors) the "normal" unpack shuffle + * generates code ranging from atrocious (llvm 3.1) to terrible (llvm 3.2, 3.3). + * So use some different shuffles instead (the exact shuffles don't seem to + * matter, as long as not using 128bit wide vectors, works with 8x32 or 4x64). + */ + struct lp_type tmp_type = type; + LLVMValueRef srchalf[2], tmpdst; + tmp_type.length = 4; + tmp_type.width = 64; + a = LLVMBuildBitCast(gallivm->builder, a, lp_build_vec_type(gallivm, tmp_type), ""); + b = LLVMBuildBitCast(gallivm->builder, b, lp_build_vec_type(gallivm, tmp_type), ""); + srchalf[0] = lp_build_extract_range(gallivm, a, lo_hi * 2, 2); + srchalf[1] = lp_build_extract_range(gallivm, b, lo_hi * 2, 2); + tmp_type.length = 2; + tmpdst = lp_build_concat(gallivm, srchalf, tmp_type, 2); + return LLVMBuildBitCast(gallivm->builder, tmpdst, lp_build_vec_type(gallivm, type), ""); + } + shuffle = lp_build_const_unpack_shuffle(gallivm, type.length, lo_hi); return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, ""); |