author | Gvozden Neskovic <[email protected]> | 2016-07-12 17:50:54 +0200 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2016-08-16 14:11:55 -0700 |
commit | fc897b24b2efafccb5c9e915b81dc5f797673e72 (patch) | |
tree | c09331359acb44530271bc21d49deb6a8020a96e /module/zfs | |
parent | 70b258fc962fd40673b9a47574cb83d8438e7d94 (diff) |
Rework of fletcher_4 module
- Benchmark memory block is increased to 128 KiB to reflect real block sizes more
accurately. Measurements include all three stages needed for checksum generation,
i.e. `init()/compute()/fini()`. The inner loop is repeated multiple times to offset
the overhead of the time function.
- The fastest implementation selects native and byteswap methods independently in
the benchmark. To support this, new function pointers `init_byteswap()/fini_byteswap()`
are introduced.
- Implementation mutex lock is replaced by atomic variable.
- To save time, benchmark is not executed in userspace. Instead, highest supported
implementation is used for fastest. Default userspace selector is still 'cycle'.
- `fletcher_4_native/byteswap()` methods use incremental methods to finish the
calculation if the data size is not a multiple of the vector stride (currently 64B).
- Added `fletcher_4_native_varsize()` special purpose method for use when buffer size
is not known in advance. The method does not enforce 4B alignment on buffer size, and
will ignore last (size % 4) bytes of the data buffer.
- Benchmark `kstat` is changed to match that of vdev_raidz. It now shows
throughput for all supported implementations (in B/s), native and byteswap,
as well as which implementation [fastest] is running.
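The tail handling described in the `fletcher_4_native/byteswap()` bullet can be sketched as follows. This is a minimal illustration, not the code from this commit: `cksum_t`, `fletcher4_incremental()`, `fletcher4_bulk()` and `fletcher4_native_sketch()` are hypothetical names, the "vector" routine is a scalar stand-in, and the recurrence is the standard Fletcher-4 one over 32-bit words with 64-bit accumulators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIDE 64	/* vector stride quoted in the commit message */

/* Hypothetical stand-in for the four-word checksum state. */
typedef struct { uint64_t a, b, c, d; } cksum_t;

/*
 * Incremental scalar Fletcher-4: continues from the state already in *zc.
 * size must be a multiple of 4 bytes.
 */
static void
fletcher4_incremental(const void *buf, size_t size, cksum_t *zc)
{
	const uint8_t *p = buf;

	assert(size % sizeof (uint32_t) == 0);
	for (size_t i = 0; i < size; i += sizeof (uint32_t)) {
		uint32_t w;

		memcpy(&w, p + i, sizeof (w));	/* native byte order */
		zc->a += w;
		zc->b += zc->a;
		zc->c += zc->b;
		zc->d += zc->c;
	}
}

/*
 * Stand-in for a vectorized implementation that only accepts whole strides.
 * Assumption: a real SIMD routine would produce the same state.
 */
static void
fletcher4_bulk(const void *buf, size_t size, cksum_t *zc)
{
	assert(size % STRIDE == 0);
	fletcher4_incremental(buf, size, zc);
}

/* Checksum the whole-stride prefix in bulk, then finish the tail. */
static void
fletcher4_native_sketch(const void *buf, size_t size, cksum_t *zc)
{
	size_t bulk = size - (size % STRIDE);

	memset(zc, 0, sizeof (*zc));
	if (bulk > 0)
		fletcher4_bulk(buf, bulk, zc);
	if (size > bulk)
		fletcher4_incremental((const uint8_t *)buf + bulk,
		    size - bulk, zc);
}
```

Under these assumptions, a 100-byte buffer is handled as one 64-byte stride plus a 36-byte tail, and the result matches a single scalar pass over all 100 bytes.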
Example of `fletcher_4_bench` running on `Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`:

implementation | native | byteswap |
---|---|---|
scalar | 4768120823 | 3426105750 |
sse2 | 7947841777 | 4318964249 |
ssse3 | 7951922722 | 6112191941 |
avx2 | 13269714358 | 11043200912 |
fastest | avx2 | avx2 |

Example of `fletcher_4_bench` running on `Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`:

implementation | native | byteswap |
---|---|---|
scalar | 1291115967 | 1031555336 |
sse2 | 2539571138 | 1280970926 |
ssse3 | 2537778746 | 1080016762 |
avx2 | 4950749767 | 1078493449 |
avx512f | 9581379998 | 4010029046 |
fastest | avx512f | avx512f |

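Choosing the `fastest` row independently for each column can be sketched as below. This is an illustration only: `struct bench_result` and `pick_fastest()` are hypothetical names, not the kstat or selector code from this commit, and the numbers are copied from the Xeon Phi example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one benchmark row (throughput in B/s). */
struct bench_result {
	const char *name;
	uint64_t native_bps;
	uint64_t byteswap_bps;
};

/* Values from the Xeon Phi 7210 example in the commit message. */
static const struct bench_result phi_7210[] = {
	{ "scalar",  1291115967ULL, 1031555336ULL },
	{ "sse2",    2539571138ULL, 1280970926ULL },
	{ "ssse3",   2537778746ULL, 1080016762ULL },
	{ "avx2",    4950749767ULL, 1078493449ULL },
	{ "avx512f", 9581379998ULL, 4010029046ULL },
};

/*
 * Pick the highest-throughput row separately for the native and the
 * byteswap column; the two winners need not be the same implementation.
 */
static void
pick_fastest(const struct bench_result *r, size_t n,
    size_t *native_idx, size_t *byteswap_idx)
{
	assert(n > 0);
	*native_idx = 0;
	*byteswap_idx = 0;
	for (size_t i = 1; i < n; i++) {
		if (r[i].native_bps > r[*native_idx].native_bps)
			*native_idx = i;
		if (r[i].byteswap_bps > r[*byteswap_idx].byteswap_bps)
			*byteswap_idx = i;
	}
}
```

With all five rows, both columns resolve to `avx512f`, matching the table. If only the first four rows are considered, the native winner is `avx2` while the byteswap winner is `sse2`, which shows why the two methods are selected independently.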
Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4952
Diffstat (limited to 'module/zfs')
-rw-r--r-- | module/zfs/dsl_dataset.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/module/zfs/dsl_dataset.c b/module/zfs/dsl_dataset.c
index 5d7847d46..dd390d49a 100644
--- a/module/zfs/dsl_dataset.c
+++ b/module/zfs/dsl_dataset.c
@@ -1770,7 +1770,7 @@ get_receive_resume_stats(dsl_dataset_t *ds, nvlist_t *nv)
 		compressed_size = gzip_compress(packed, compressed,
 		    packed_size, packed_size, 6);
 
-		fletcher_4_native(compressed, compressed_size, &cksum);
+		fletcher_4_native_varsize(compressed, compressed_size, &cksum);
 
 		str = kmem_alloc(compressed_size * 2 + 1, KM_SLEEP);
 		for (i = 0; i < compressed_size; i++) {
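The call site switches to the variable-size method, presumably because the length returned by `gzip_compress()` is not guaranteed to be a multiple of 4. The behaviour the commit message describes for `fletcher_4_native_varsize()` (no 4B alignment requirement, trailing `size % 4` bytes ignored) can be sketched as below. `cksum_t` and `fletcher4_varsize_sketch()` are hypothetical names used for illustration; this is not the function body from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the four-word checksum state. */
typedef struct { uint64_t a, b, c, d; } cksum_t;

/*
 * Scalar Fletcher-4 over the leading whole 32-bit words of buf.
 * The last (size % 4) bytes are ignored rather than rejected.
 */
static void
fletcher4_varsize_sketch(const void *buf, size_t size, cksum_t *zc)
{
	const uint8_t *p = buf;
	size_t usable = size - (size % sizeof (uint32_t));
	uint64_t a = 0, b = 0, c = 0, d = 0;

	assert(buf != NULL || size == 0);
	for (size_t i = 0; i < usable; i += sizeof (uint32_t)) {
		uint32_t w;

		memcpy(&w, p + i, sizeof (w));	/* native byte order */
		a += w;
		b += a;
		c += b;
		d += c;
	}
	zc->a = a;
	zc->b = b;
	zc->c = c;
	zc->d = d;
}
```

In this sketch a 15-byte buffer yields the same checksum as its first 12 bytes, so callers that need every byte covered would have to pad the buffer themselves.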