author: Gvozden Neskovic <[email protected]> 2016-07-06 13:42:04 +0200
committer: Brian Behlendorf <[email protected]> 2016-08-16 14:11:14 -0700
commit: 70b258fc962fd40673b9a47574cb83d8438e7d94
tree: 6e45c08b144622dc78f1106681ce5566c77b588d /include
parent: 32ffaa3de58981814342fe6d3556c03d41d121f8
Fletcher4 implementation using avx512f instruction set

The algorithm runs 8 parallel sums, consuming 8 uint32_t elements per loop
iteration. The size alignment of the main fletcher4 methods is adjusted
accordingly. The new implementation is called 'avx512f'.

Note: the byteswap method can be implemented more efficiently once avx512bw
hardware becomes available. Currently, it is ~2x slower than the native method.

The table shows the results of the full (native) fletcher4 calculation for
different buffer sizes:

fletcher4    4KB     16KB    64KB    128KB   256KB   1MB     16MB
--------------------------------------------------------------------
[scalar]     1213    1228    1231    1231    1225    1200    1160
[sse2]       2374    2442    2459    2456    2462    2250    2220
[avx2]       4288    4753    4871    4893    4900    4050    3882
[avx512f]    5975    8445    9196    9221    9262    6307    5620

Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4952
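The "8 parallel sums" idea from the commit message can be illustrated with a portable sketch. This is not the AVX-512 code from the commit: it emulates the eight lanes with plain arrays, and the names `fletcher4_sums_t`, `fletcher4_scalar` and `fletcher4_lanes` are hypothetical. The recombination coefficients are derived for this sketch from the Fletcher-4 recurrence and are an assumption, not copied from the ZFS source. The sketch also shows why the size alignment matters: the lane loop only handles whole groups of 8 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* one 64-bit accumulator lane per uint32_t consumed */

/* Hypothetical result type: the four running sums of Fletcher-4. */
typedef struct { uint64_t a, b, c, d; } fletcher4_sums_t;

/* Scalar reference: a single set of four sums over every 32-bit word. */
static fletcher4_sums_t
fletcher4_scalar(const uint32_t *w, size_t nwords)
{
    fletcher4_sums_t s = {0, 0, 0, 0};

    for (size_t k = 0; k < nwords; k++) {
        s.a += w[k];
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return (s);
}

/*
 * Lane emulation: lane i sees words i, i + 8, i + 16, ... and runs the
 * same recurrence independently. The per-lane sums are then recombined
 * into the scalar result. All arithmetic wraps modulo 2^64.
 */
static fletcher4_sums_t
fletcher4_lanes(const uint32_t *w, size_t nwords)
{
    uint64_t a[LANES] = {0}, b[LANES] = {0}, c[LANES] = {0}, d[LANES] = {0};
    fletcher4_sums_t s = {0, 0, 0, 0};
    const int64_t n = LANES;

    /* The lane loop consumes 8 words per iteration, hence the alignment. */
    assert(nwords % LANES == 0);

    for (size_t k = 0; k < nwords; k += LANES) {
        for (int i = 0; i < LANES; i++) {
            a[i] += w[k + i];
            b[i] += a[i];
            c[i] += b[i];
            d[i] += c[i];
        }
    }

    /* Recombination (coefficients derived for this sketch). */
    for (int64_t i = 0; i < LANES; i++) {
        int64_t cb = n * (n + 2 * i - 1) / 2;
        int64_t ca = i * (i - 1) / 2;
        int64_t dc = n * n * (n + i - 1);
        int64_t db = (3 * n * i * i - 6 * n * i + 2 * n +
            n * n * n + 3 * n * n * i - 3 * n * n) / 6;
        int64_t da = i * (i - 1) * (i - 2) / 6;

        s.a += a[i];
        s.b += (uint64_t)n * b[i] - (uint64_t)i * a[i];
        s.c += (uint64_t)(n * n) * c[i] - (uint64_t)cb * b[i] +
            (uint64_t)ca * a[i];
        s.d += (uint64_t)(n * n * n) * d[i] - (uint64_t)dc * c[i] +
            (uint64_t)db * b[i] - (uint64_t)da * a[i];
    }
    return (s);
}
```

Calling both functions on the same buffer whose length is a multiple of 8 words should yield identical sums. A real SIMD version would keep each of the four arrays in one 512-bit register and perform the inner loop as a handful of vector additions.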
Diffstat (limited to 'include')
-rw-r--r-- include/zfs_fletcher.h | 4
1 files changed, 4 insertions, 0 deletions
diff --git a/include/zfs_fletcher.h b/include/zfs_fletcher.h
index afc3936c0..ecba4ada7 100644
--- a/include/zfs_fletcher.h
+++ b/include/zfs_fletcher.h
@@ -73,6 +73,10 @@ extern const fletcher_4_ops_t fletcher_4_ssse3_ops;
 extern const fletcher_4_ops_t fletcher_4_avx2_ops;
 #endif
+#if defined(__x86_64) && defined(HAVE_AVX512F)
+extern const fletcher_4_ops_t fletcher_4_avx512f_ops;
+#endif
+
 #ifdef __cplusplus
 }
 #endif