diff options
author | Gvozden Neskovic <[email protected]> | 2016-07-12 17:50:54 +0200 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2016-08-16 14:11:55 -0700 |
commit | fc897b24b2efafccb5c9e915b81dc5f797673e72 (patch) | |
tree | c09331359acb44530271bc21d49deb6a8020a96e /lib | |
parent | 70b258fc962fd40673b9a47574cb83d8438e7d94 (diff) |
Rework of fletcher_4 module
- Benchmark memory block is increased to 128kiB to reflect real block sizes more
accurately. Measurements include all three stages needed for checksum generation,
i.e. `init()/compute()/fini()`. The inner loop is repeated multiple times to offset
overhead of time function.
- Fastest implementation selects native and byteswap methods independently in
benchmark. To support this new function pointers `init_byteswap()/fini_byteswap()`
are introduced.
- Implementation mutex lock is replaced by atomic variable.
- To save time, benchmark is not executed in userspace. Instead, highest supported
implementation is used for fastest. Default userspace selector is still 'cycle'.
- `fletcher_4_native/byteswap()` methods use incremental methods to finish
calculation if data size is not multiple of vector stride (currently 64B).
- Added `fletcher_4_native_varsize()` special purpose method for use when buffer size
is not known in advance. The method does not enforce 4B alignment on buffer size, and
will ignore last (size % 4) bytes of the data buffer.
- Benchmark `kstat` is changed to match the one of vdev_raidz. It now shows
throughput for all supported implementations (in B/s), native and byteswap,
as well as the code [fastest] is running.
Example of `fletcher_4_bench` running on `Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`:
implementation native byteswap
scalar 4768120823 3426105750
sse2 7947841777 4318964249
ssse3 7951922722 6112191941
avx2 13269714358 11043200912
fastest avx2 avx2
Example of `fletcher_4_bench` running on `Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`:
implementation native byteswap
scalar 1291115967 1031555336
sse2 2539571138 1280970926
ssse3 2537778746 1080016762
avx2 4950749767 1078493449
avx512f 9581379998 4010029046
fastest avx512f avx512f
Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4952
Diffstat (limited to 'lib')
-rw-r--r-- | lib/libzfs/libzfs_sendrecv.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/lib/libzfs/libzfs_sendrecv.c b/lib/libzfs/libzfs_sendrecv.c index 6adcc7a54..0d19c3bf4 100644 --- a/lib/libzfs/libzfs_sendrecv.c +++ b/lib/libzfs/libzfs_sendrecv.c @@ -1442,7 +1442,7 @@ zfs_send_resume_token_to_nvlist(libzfs_handle_t *hdl, const char *token) /* verify checksum */ zio_cksum_t cksum; - fletcher_4_native(compressed, len, &cksum); + fletcher_4_native_varsize(compressed, len, &cksum); if (cksum.zc_word[0] != checksum) { free(compressed); zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, |