gallivm,llvmpipe: Use 4-wide vectors on AMD Bulldozer.

8-wide vectors are slower.

Reviewed-by: Roland Scheidegger <[email protected]>
author: José Fonseca <[email protected]> 2012-08-31 17:01:50 +0100
committer: José Fonseca <[email protected]> 2012-09-04 08:49:00 +0100
commit: 7eb504019731368fd55f01e0264b195d4f99ae93 (patch)
tree: 5e64eeb49422420d7b82356462bcc759bc27d270 /src/gallium/auxiliary/gallivm
parent: 9a31e090efb15ec34e7a1a5e707d600a11d74925 (diff)
1 files changed, 9 insertions, 1 deletions
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index 068a2cd7915..ffbe3eaed2c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -434,8 +434,16 @@ lp_build_init(void)
 
    util_cpu_detect();
 
+   /* AMD Bulldozer AVX's throughput is the same as SSE2; and because using
+    * 8-wide vector needs more floating ops than 4-wide (due to padding), it is
+    * actually more efficient to use 4-wide vectors on this processor.
+    *
+    * See also:
+    * - http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/2
+    */
    if (HAVE_AVX &&
-       util_cpu_caps.has_avx) {
+       util_cpu_caps.has_avx &&
+       util_cpu_caps.has_intel) {
       lp_native_vector_width = 256;
    } else {
       /* Leave it at 128, even when no SIMD extensions are available.
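
The condition in the hunk above can be sketched in isolation as follows. This is a minimal illustration, not the Mesa code: `cpu_caps_sketch`, `choose_native_vector_width` and `padded_lanes` are hypothetical names, and the struct only mimics the two `util_cpu_caps` fields the patch reads. Note that the patched test keeps 256-bit vectors for Intel CPUs only, so every non-Intel AVX processor (not just Bulldozer) falls back to 128. The `padded_lanes` helper assumes that "padding" means rounding an element count up to a whole number of 32-bit lanes, which is one reading of the comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two util_cpu_caps fields read by the patch. */
struct cpu_caps_sketch {
   bool has_avx;
   bool has_intel;
};

/* Mirrors the patched condition: pick 256 bits only when AVX support was
 * compiled in and the CPU is an Intel part reporting AVX; otherwise 128.
 */
static unsigned
choose_native_vector_width(bool have_avx_build,
                           const struct cpu_caps_sketch *caps)
{
   if (have_avx_build && caps->has_avx && caps->has_intel)
      return 256;
   return 128;
}

/* Assumption: "padding" means rounding the element count up to a whole
 * number of 32-bit lanes. Returns the lanes processed, padding included.
 */
static unsigned
padded_lanes(unsigned count, unsigned vector_width_bits)
{
   unsigned lanes = vector_width_bits / 32;
   assert(lanes > 0);
   return (count + lanes - 1) / lanes * lanes;
}
```

Under these assumptions, four floats occupy exactly one 128-bit vector but are padded to eight lanes in a 256-bit vector, so the wider path performs twice the lane work for the same data. If AVX throughput only matches SSE2, as the comment says of Bulldozer, that extra work is not recovered.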