From 70b258fc962fd40673b9a47574cb83d8438e7d94 Mon Sep 17 00:00:00 2001
From: Gvozden Neskovic
Date: Wed, 6 Jul 2016 13:42:04 +0200
Subject: Fletcher4 implementation using avx512f instruction set

Algorithm runs 8 parallel sums, consuming 8x uint32_t elements per loop
iteration. Size alignment of main fletcher4 methods is adjusted accordingly.
New implementation is called 'avx512f'.

Note: byteswap method can be implemented more efficiently when avx512bw
hardware becomes available. Currently, it is ~ 2x slower than native method.

Table shows result of full (native) fletcher4 calculation for different buffer
size:

fletcher4       4KB     16KB    64KB    128KB   256KB   1MB     16MB
--------------------------------------------------------------------
[scalar]        1213    1228    1231    1231    1225    1200    1160
[sse2]          2374    2442    2459    2456    2462    2250    2220
[avx2]          4288    4753    4871    4893    4900    4050    3882
[avx512f]       5975    8445    9196    9221    9262    6307    5620

Signed-off-by: Gvozden Neskovic
Signed-off-by: Brian Behlendorf
Issue #4952
---
 man/man5/zfs-module-parameters.5 | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

(limited to 'man/man5')

diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5
index 3e62a4436..b4ad3700f 100644
--- a/man/man5/zfs-module-parameters.5
+++ b/man/man5/zfs-module-parameters.5
@@ -883,14 +883,14 @@ Default value: \fB67,108,864\fR.
 Select a fletcher 4 implementation.
 .sp
 Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
-and \fBavx2\fR. All of the selectors except \fBfastest\fR and \fBscalar\fR
-require instruction set extensions to be available and will only appear if ZFS
-detects that they are present at runtime. If multiple implementations of
-fletcher 4 are available, the \fBfastest\fR will be chosen using a micro
-benchmark. Selecting \fBscalar\fR results in the original CPU based calculation
-being used. Selecting any option other than \fBfastest\fR and \fBscalar\fR
-results in vector instructions from the respective CPU instruction set being
-used.
+\fBavx2\fR, and \fBavx512f\fR.
+All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
+set extensions to be available and will only appear if ZFS detects that they are
+present at runtime. If multiple implementations of fletcher 4 are available,
+the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
+results in the original, CPU based calculation, being used. Selecting any option
+other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
+the respective CPU instruction set being used.
 .sp
 Default value: \fBfastest\fR.
 .RE
-- 
cgit v1.2.3
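
Editorial note, not part of the patch: for readers unfamiliar with the
checksum, the following is a minimal sketch of what the "original CPU based
calculation" behind the scalar selector amounts to, assuming the usual
Fletcher-4 definition of four 64-bit running sums over 32-bit words in native
byte order. The type and function names (fletcher4_sums,
fletcher4_scalar_sketch) are hypothetical and do not come from the ZFS sources.
The avx512f variant described in the commit message keeps 8 sets of such sums
in parallel; how those lanes are combined into the final result is not shown
here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the four running sums of Fletcher-4. */
typedef struct {
	uint64_t a, b, c, d;
} fletcher4_sums;

/*
 * Scalar sketch: consume one uint32_t word per iteration.
 * Each sum accumulates the one before it, so 'a' is a plain sum of the
 * words, 'b' is a sum of the prefix sums, and so on. Overflow wraps
 * modulo 2^64, which is well defined for unsigned arithmetic in C.
 */
static fletcher4_sums
fletcher4_scalar_sketch(const uint32_t *words, size_t nwords)
{
	fletcher4_sums s = { 0, 0, 0, 0 };

	for (size_t i = 0; i < nwords; i++) {
		s.a += words[i];
		s.b += s.a;
		s.c += s.b;
		s.d += s.c;
	}
	return (s);
}
```

For the four input words 1, 2, 3, 4 this sketch yields a = 10, b = 20, c = 35
and d = 56. Since each word feeds all four sums in sequence, a vectorized
version cannot simply split the buffer; it has to correct the per-lane sums
when merging them, which is why the sketch stays with the scalar form.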