From 70b258fc962fd40673b9a47574cb83d8438e7d94 Mon Sep 17 00:00:00 2001
From: Gvozden Neskovic
Date: Wed, 6 Jul 2016 13:42:04 +0200
Subject: Fletcher4 implementation using avx512f instruction set

Algorithm runs 8 parallel sums, consuming 8x uint32_t elements per loop
iteration. Size alignment of main fletcher4 methods is adjusted accordingly.
New implementation is called 'avx512f'.

Note: byteswap method can be implemented more efficiently when avx512bw
hardware becomes available. Currently, it is ~ 2x slower than native method.

Table shows result of full (native) fletcher4 calculation for different
buffer size:

fletcher4       4KB     16KB    64KB    128KB   256KB   1MB     16MB
--------------------------------------------------------------------
[scalar]        1213    1228    1231    1231    1225    1200    1160
[sse2]          2374    2442    2459    2456    2462    2250    2220
[avx2]          4288    4753    4871    4893    4900    4050    3882
[avx512f]       5975    8445    9196    9221    9262    6307    5620

Signed-off-by: Gvozden Neskovic
Signed-off-by: Brian Behlendorf
Issue #4952
---
 module/zcommon/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

(limited to 'module/zcommon/Makefile.in')

diff --git a/module/zcommon/Makefile.in b/module/zcommon/Makefile.in
index 958835edf..7dffd5228 100644
--- a/module/zcommon/Makefile.in
+++ b/module/zcommon/Makefile.in
@@ -18,3 +18,4 @@ $(MODULE)-objs += zpool_prop.o
 
 $(MODULE)-$(CONFIG_X86) += zfs_fletcher_intel.o
 $(MODULE)-$(CONFIG_X86) += zfs_fletcher_sse.o
+$(MODULE)-$(CONFIG_X86) += zfs_fletcher_avx512.o
-- 
cgit v1.2.3