Add an accelerated version of F_TO_I for x86_64

According to a quick micro-benchmark, this new version is 20% faster on my
Haswell laptop.

v2: Removed the XXX note about x86_64 from the comment
v3: Use an intrinsic instead of an __asm__ block. This should give us MSVC
    support for free.
v4: Enable it for all x86_64 builds, not just with USE_X86_64_ASM

Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
author: Jason Ekstrand <[email protected]> 2014-07-21 16:46:39 -0700
committer: Jason Ekstrand <[email protected]> 2014-07-24 12:44:56 -0700
commit: 989d2e370993c87d1bbda4950657bfcc5b0a58dd (patch)
tree: 1b9a048a63e291fb68d7c1a2bf00e08d69d7e672
parent: 2a33510f1649f2ef5c5b2d693aa89ef0efc5dcfb (diff)
1 files changed, 5 insertions, 1 deletions
diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
index af780b2498f..09e55ebf0ff 100644
--- a/src/mesa/main/imports.h
+++ b/src/mesa/main/imports.h
@@ -274,10 +274,12 @@ static inline int IROUND_POS(float f)
    return (int) (f + 0.5F);
 }
 
+#ifdef __x86_64__
+#  include <xmmintrin.h>
+#endif
 
 /**
  * Convert float to int using a fast method.  The rounding mode may vary.
- * XXX We could use an x86-64/SSE2 version here.
  */
 static inline int F_TO_I(float f)
 {
@@ -292,6 +294,8 @@ static inline int F_TO_I(float f)
 	 fistp r
 	}
    return r;
+#elif defined(__x86_64__)
+   return _mm_cvt_ss2si(_mm_load_ss(&f));
 #else
    return IROUND(f);
 #endif
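
For readers who want to try the new x86_64 path outside of Mesa, the following is a minimal stand-alone sketch of the same idea. It is not the contents of imports.h: the function name f_to_i_sketch is hypothetical, the _M_X64 check is an added assumption for MSVC (the patch itself tests only __x86_64__), and the fallback branch is a simplified round-half-away-from-zero expression standing in for Mesa's IROUND.

```c
#if defined(__x86_64__) || defined(_M_X64)
#  include <xmmintrin.h>   /* SSE intrinsics: _mm_load_ss, _mm_cvt_ss2si */
#endif

/* Hypothetical stand-alone variant of the patched F_TO_I (illustration only). */
static inline int f_to_i_sketch(float f)
{
#if defined(__x86_64__) || defined(_M_X64)
   /* Load f into the low lane of an XMM register and convert it with
    * CVTSS2SI.  The result follows the rounding mode currently set in
    * MXCSR, which is round-to-nearest-even by default. */
   return _mm_cvt_ss2si(_mm_load_ss(&f));
#else
   /* Assumed portable fallback: round half away from zero. */
   return (int) (f >= 0.0f ? f + 0.5f : f - 0.5f);
#endif
}
```

Because the intrinsic honours the current rounding mode while the fallback always rounds halves away from zero, the two branches can disagree on exact ties such as 2.5f. That matches the caveat in the function's comment that the rounding mode may vary, so callers that need a specific tie-breaking rule should not rely on this helper.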