author | Gvozden Neskovic <[email protected]> | 2016-07-12 17:50:54 +0200 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2016-08-16 14:11:55 -0700 |
commit | fc897b24b2efafccb5c9e915b81dc5f797673e72 (patch) | |
tree | c09331359acb44530271bc21d49deb6a8020a96e /include | |
parent | 70b258fc962fd40673b9a47574cb83d8438e7d94 (diff) |
Rework of fletcher_4 module
- The benchmark memory block is increased to 128 KiB to reflect real block sizes more
accurately. Measurements include all three stages needed for checksum generation,
i.e. `init()/compute()/fini()`. The inner loop is repeated multiple times to offset
the overhead of the timing function.
- The benchmark selects the fastest implementation independently for the native and
byteswap methods. To support this, new function pointers `init_byteswap()/fini_byteswap()`
are introduced.
- The implementation mutex lock is replaced by an atomic variable.
- To save time, the benchmark is not executed in userspace. Instead, the highest supported
implementation is used for fastest. The default userspace selector is still 'cycle'.
- The `fletcher_4_native()` and `fletcher_4_byteswap()` methods use the incremental methods
to finish the calculation if the data size is not a multiple of the vector stride (currently 64B).
- Added the special-purpose method `fletcher_4_native_varsize()` for use when the buffer size
is not known in advance. The method does not enforce 4B alignment on the buffer size and
ignores the last (size % 4) bytes of the data buffer.
- The benchmark `kstat` is changed to match that of vdev_raidz. It now shows
throughput for all supported implementations (in B/s), native and byteswap,
as well as the implementation that [fastest] is running.
Example of `fletcher_4_bench` running on `Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`:
implementation   native         byteswap
scalar           4768120823     3426105750
sse2             7947841777     4318964249
ssse3            7951922722     6112191941
avx2             13269714358    11043200912
fastest          avx2           avx2
Example of `fletcher_4_bench` running on `Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`:
implementation   native         byteswap
scalar           1291115967     1031555336
sse2             2539571138     1280970926
ssse3            2537778746     1080016762
avx2             4950749767     1078493449
avx512f          9581379998     4010029046
fastest          avx512f        avx512f
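The independent selection described in the commit message can be illustrated with the Xeon Phi figures above. This is a minimal sketch, not the module's code: `bench_row_t` and `pick_fastest()` are hypothetical names, and the real kstat layout is not part of this include-only diff. The sketch simply takes the maximum of each throughput column separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type for one benchmark row (not the real kstat). */
typedef struct bench_row {
	const char *name;
	uint64_t native_bps;	/* native throughput, B/s */
	uint64_t byteswap_bps;	/* byteswap throughput, B/s */
} bench_row_t;

/* Xeon Phi 7210 figures copied from the benchmark output above. */
static const bench_row_t phi_rows[] = {
	{ "scalar",  1291115967ULL, 1031555336ULL },
	{ "sse2",    2539571138ULL, 1280970926ULL },
	{ "ssse3",   2537778746ULL, 1080016762ULL },
	{ "avx2",    4950749767ULL, 1078493449ULL },
	{ "avx512f", 9581379998ULL, 4010029046ULL },
};

/* Throughput of one row in the requested column. */
static uint64_t
row_bps(const bench_row_t *row, int byteswap)
{
	return (byteswap ? row->byteswap_bps : row->native_bps);
}

/*
 * Return the index of the row with the highest throughput in one column.
 * Each column is scanned on its own, so native and byteswap may differ.
 */
static size_t
pick_fastest(const bench_row_t *rows, size_t n, int byteswap)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (row_bps(&rows[i], byteswap) > row_bps(&rows[best], byteswap))
			best = i;
	}
	return (best);
}
```

With all five rows, both columns pick `avx512f`, matching the `fastest` row. If only the first four rows are considered (as an illustration of a CPU without AVX-512F, which is an assumption here), `pick_fastest(phi_rows, 4, 0)` picks `avx2` while `pick_fastest(phi_rows, 4, 1)` picks `sse2`, a result a single shared selection could not express.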
Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4952
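The tail handling mentioned in the commit message (finishing with the incremental method when the size is not a multiple of the 64B vector stride) can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: every `sketch_*` name is hypothetical, and a plain scalar loop stands in for the vectorized path, which this diff does not show. The accumulation step is the usual Fletcher-4 recurrence over 32-bit words with four 64-bit sums.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	SKETCH_STRIDE	64	/* vector stride named in the commit message */

/* Hypothetical stand-in for the four-word checksum. */
typedef struct sketch_cksum {
	uint64_t word[4];
} sketch_cksum_t;

/*
 * Continue a Fletcher-4 sum over `size` bytes in host byte order.
 * `size` must be a multiple of 4; the running state lives in `ck`.
 */
static void
sketch_f4_incremental(const void *buf, uint64_t size, sketch_cksum_t *ck)
{
	const unsigned char *p = buf;
	uint64_t a = ck->word[0], b = ck->word[1];
	uint64_t c = ck->word[2], d = ck->word[3];

	assert(size % 4 == 0);
	for (uint64_t off = 0; off < size; off += 4) {
		uint32_t w;

		memcpy(&w, p + off, sizeof (w));	/* alignment-safe load */
		a += w;
		b += a;
		c += b;
		d += c;
	}
	ck->word[0] = a;
	ck->word[1] = b;
	ck->word[2] = c;
	ck->word[3] = d;
}

/*
 * Checksum a whole buffer: the stride-aligned head would go to the
 * vectorized code (a scalar stand-in here), the remainder to the
 * incremental method.
 */
static void
sketch_f4_native(const void *buf, uint64_t size, sketch_cksum_t *ck)
{
	uint64_t head = size - (size % SKETCH_STRIDE);

	memset(ck, 0, sizeof (*ck));
	sketch_f4_incremental(buf, head, ck);
	if (head < size) {
		sketch_f4_incremental((const unsigned char *)buf + head,
		    size - head, ck);
	}
}
```

Because the running sums carry over between calls, splitting a 100-byte buffer into a 64-byte head and a 36-byte tail gives the same four words as a single pass over all 100 bytes.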
Diffstat (limited to 'include')
-rw-r--r-- | include/zfs_fletcher.h | 26 |
1 files changed, 21 insertions, 5 deletions
diff --git a/include/zfs_fletcher.h b/include/zfs_fletcher.h
index ecba4ada7..f0cfbd573 100644
--- a/include/zfs_fletcher.h
+++ b/include/zfs_fletcher.h
@@ -35,11 +35,20 @@ extern "C" {
 
 /*
  * fletcher checksum functions
+ *
+ * Note: Fletcher checksum methods expect buffer size to be 4B aligned. This
+ * limitation stems from the algorithm design. Performing incremental checksum
+ * without said alignment would yield different results. Therefore, the code
+ * includes assertions for the size alignment.
+ * For compatibility, it is required that some code paths calculate checksum of
+ * non-aligned buffer sizes. For this purpose, `fletcher_4_native_varsize()`
+ * checksum method is added. This method will ignore last (size % 4) bytes of
+ * the data buffer.
  */
-
 void fletcher_2_native(const void *, uint64_t, zio_cksum_t *);
 void fletcher_2_byteswap(const void *, uint64_t, zio_cksum_t *);
 void fletcher_4_native(const void *, uint64_t, zio_cksum_t *);
+void fletcher_4_native_varsize(const void *, uint64_t, zio_cksum_t *);
 void fletcher_4_byteswap(const void *, uint64_t, zio_cksum_t *);
 void fletcher_4_incremental_native(const void *, uint64_t,
     zio_cksum_t *);
@@ -49,14 +58,21 @@ int fletcher_4_impl_set(const char *selector);
 void fletcher_4_init(void);
 void fletcher_4_fini(void);
 
+
 /*
  * fletcher checksum struct
  */
+typedef void (*fletcher_4_init_f)(zio_cksum_t *);
+typedef void (*fletcher_4_fini_f)(zio_cksum_t *);
+typedef void (*fletcher_4_compute_f)(const void *, uint64_t, zio_cksum_t *);
+
 typedef struct fletcher_4_func {
-	void (*init)(zio_cksum_t *);
-	void (*fini)(zio_cksum_t *);
-	void (*compute)(const void *, uint64_t, zio_cksum_t *);
-	void (*compute_byteswap)(const void *, uint64_t, zio_cksum_t *);
+	fletcher_4_init_f init_native;
+	fletcher_4_fini_f fini_native;
+	fletcher_4_compute_f compute_native;
+	fletcher_4_init_f init_byteswap;
+	fletcher_4_fini_f fini_byteswap;
+	fletcher_4_compute_f compute_byteswap;
 	boolean_t (*valid)(void);
 	const char *name;
 } fletcher_4_ops_t;
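To show how the reworked `fletcher_4_ops_t` might be filled in and driven, here is a minimal sketch that builds outside the ZFS tree. The typedefs and struct members mirror the header in this diff; everything prefixed `demo_` is hypothetical, and `zio_cksum_t` and `boolean_t` are local stand-ins for the real ZFS definitions. The varsize wrapper only illustrates the documented behaviour of ignoring the last (size % 4) bytes; how `fletcher_4_native_varsize()` is actually implemented is not visible in this include-only diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for ZFS types so the sketch compiles on its own. */
typedef struct zio_cksum { uint64_t zc_word[4]; } zio_cksum_t;
typedef int boolean_t;

typedef void (*fletcher_4_init_f)(zio_cksum_t *);
typedef void (*fletcher_4_fini_f)(zio_cksum_t *);
typedef void (*fletcher_4_compute_f)(const void *, uint64_t, zio_cksum_t *);

typedef struct fletcher_4_func {
	fletcher_4_init_f init_native;
	fletcher_4_fini_f fini_native;
	fletcher_4_compute_f compute_native;
	fletcher_4_init_f init_byteswap;
	fletcher_4_fini_f fini_byteswap;
	fletcher_4_compute_f compute_byteswap;
	boolean_t (*valid)(void);
	const char *name;
} fletcher_4_ops_t;

static void demo_init(zio_cksum_t *zcp) { memset(zcp, 0, sizeof (*zcp)); }
static void demo_fini(zio_cksum_t *zcp) { (void) zcp; /* nothing to fold */ }
static boolean_t demo_valid(void) { return (1); }

static uint32_t
demo_bswap32(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0xff00U) |
	    ((x << 8) & 0xff0000U) | (x << 24));
}

/* Scalar Fletcher-4 accumulation, optionally byte-swapping each word. */
static void
demo_accumulate(const void *buf, uint64_t size, zio_cksum_t *zcp, int swap)
{
	const unsigned char *p = buf;

	assert(size % 4 == 0);	/* the aligned-size rule from the header */
	for (uint64_t off = 0; off < size; off += 4) {
		uint32_t w;

		memcpy(&w, p + off, sizeof (w));
		if (swap)
			w = demo_bswap32(w);
		zcp->zc_word[0] += w;
		zcp->zc_word[1] += zcp->zc_word[0];
		zcp->zc_word[2] += zcp->zc_word[1];
		zcp->zc_word[3] += zcp->zc_word[2];
	}
}

static void
demo_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
{
	demo_accumulate(buf, size, zcp, 0);
}

static void
demo_byteswap(const void *buf, uint64_t size, zio_cksum_t *zcp)
{
	demo_accumulate(buf, size, zcp, 1);
}

static const fletcher_4_ops_t demo_ops = {
	.init_native = demo_init,
	.fini_native = demo_fini,
	.compute_native = demo_native,
	.init_byteswap = demo_init,
	.fini_byteswap = demo_fini,
	.compute_byteswap = demo_byteswap,
	.valid = demo_valid,
	.name = "demo_scalar",
};

/* Run the init/compute/fini triple for the requested byte order. */
static void
demo_checksum(const fletcher_4_ops_t *ops, const void *buf, uint64_t size,
    zio_cksum_t *zcp, int swap)
{
	(swap ? ops->init_byteswap : ops->init_native)(zcp);
	(swap ? ops->compute_byteswap : ops->compute_native)(buf, size, zcp);
	(swap ? ops->fini_byteswap : ops->fini_native)(zcp);
}

/* Varsize behaviour as documented: drop the last (size % 4) bytes. */
static void
demo_native_varsize(const fletcher_4_ops_t *ops, const void *buf,
    uint64_t size, zio_cksum_t *zcp)
{
	demo_checksum(ops, buf, size - (size % 4), zcp, 0);
}
```

Keeping separate `init`/`fini` pointers per byte order lets an implementation pair, say, a vectorized native path with a different byteswap path, which is what the independent benchmark selection needs. In this sketch both orders share the same trivial `demo_init()` and `demo_fini()`.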