| Field | Value | Date |
|---|---|---|
| author | José Fonseca <[email protected]> | 2012-08-31 17:01:50 +0100 |
| committer | José Fonseca <[email protected]> | 2012-09-04 08:49:00 +0100 |
| commit | 7eb504019731368fd55f01e0264b195d4f99ae93 | |
| tree | 5e64eeb49422420d7b82356462bcc759bc27d270 | |
| parent | 9a31e090efb15ec34e7a1a5e707d600a11d74925 | |
gallivm,llvmpipe: Use 4-wide vectors on AMD Bulldozer.
8-wide vectors are slower.
Reviewed-by: Roland Scheidegger <[email protected]>
Diffstat (limited to 'src/gallium/auxiliary/gallivm')
-rw-r--r-- | src/gallium/auxiliary/gallivm/lp_bld_init.c | 10 |
1 file changed, 9 insertions, 1 deletion
```diff
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index 068a2cd7915..ffbe3eaed2c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -434,8 +434,16 @@ lp_build_init(void)
 
    util_cpu_detect();
 
+   /* AMD Bulldozer AVX's throughput is the same as SSE2; and because using
+    * 8-wide vector needs more floating ops than 4-wide (due to padding), it is
+    * actually more efficient to use 4-wide vectors on this processor.
+    *
+    * See also:
+    * - http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/2
+    */
    if (HAVE_AVX &&
-       util_cpu_caps.has_avx) {
+       util_cpu_caps.has_avx &&
+       util_cpu_caps.has_intel) {
       lp_native_vector_width = 256;
    } else {
       /* Leave it at 128, even when no SIMD extensions are available.
```