Rework of fletcher_4 module

- Benchmark memory block is increased to 128 KiB to reflect real block sizes
  more accurately. Measurements include all three stages needed for checksum
  generation, i.e. `init()/compute()/fini()`. The inner loop is repeated
  multiple times to offset the overhead of the time function.

- Fastest implementation selects native and byteswap methods independently in
  the benchmark. To support this, new function pointers
  `init_byteswap()/fini_byteswap()` are introduced.

- Implementation mutex lock is replaced by an atomic variable.

- To save time, the benchmark is not executed in userspace. Instead, the
  highest supported implementation is used for fastest. Default userspace
  selector is still 'cycle'.

- `fletcher_4_native/byteswap()` methods use incremental methods to finish the
  calculation if the data size is not a multiple of the vector stride
  (currently 64B).

- Added `fletcher_4_native_varsize()` special purpose method for use when the
  buffer size is not known in advance. The method does not enforce 4B
  alignment on buffer size, and will ignore the last (size % 4) bytes of the
  data buffer.

- Benchmark `kstat` is changed to match the one of vdev_raidz. It now shows
  throughput for all supported implementations (in B/s), native and byteswap,
  as well as the code [fastest] is running.

Example of `fletcher_4_bench` running on
`Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`:

implementation   native         byteswap
scalar           4768120823     3426105750
sse2             7947841777     4318964249
ssse3            7951922722     6112191941
avx2             13269714358    11043200912
fastest          avx2           avx2

Example of `fletcher_4_bench` running on
`Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`:

implementation   native         byteswap
scalar           1291115967     1031555336
sse2             2539571138     1280970926
ssse3            2537778746     1080016762
avx2             4950749767     1078493449
avx512f          9581379998     4010029046
fastest          avx512f        avx512f

Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4952
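
The varsize behaviour described in the message can be illustrated with a minimal scalar sketch. This is not the code from the patch: `sketch_cksum_t` and `sketch_fletcher_4_varsize()` are hypothetical names, and the sketch only assumes the usual Fletcher-4 recurrence over 32-bit words with four 64-bit accumulators, plus the stated rule that the last (size % 4) bytes are ignored.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for zio_cksum_t: four 64-bit accumulators. */
typedef struct sketch_cksum {
	uint64_t word[4];
} sketch_cksum_t;

/*
 * Scalar Fletcher-4 over native-endian 32-bit words.
 * Assumption: as with the varsize method described above, a size that is
 * not a multiple of 4 is accepted and the trailing (size % 4) bytes are
 * simply not read.
 */
static void
sketch_fletcher_4_varsize(const void *buf, uint64_t size, sketch_cksum_t *zcp)
{
	const unsigned char *p = buf;
	uint64_t nwords = size / sizeof (uint32_t);	/* drops the remainder */
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (uint64_t i = 0; i < nwords; i++) {
		uint32_t w;

		/* memcpy keeps the load safe for unaligned buffers. */
		memcpy(&w, p + i * sizeof (w), sizeof (w));
		a += w;
		b += a;
		c += b;
		d += c;
	}

	zcp->word[0] = a;
	zcp->word[1] = b;
	zcp->word[2] = c;
	zcp->word[3] = d;
}
```

With this sketch, checksumming a 10-byte buffer gives the same result as checksumming its first 8 bytes, which is the property the message describes for buffers whose size is not 4B aligned.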
author: Gvozden Neskovic <[email protected]> 2016-07-12 17:50:54 +0200
committer: Brian Behlendorf <[email protected]> 2016-08-16 14:11:55 -0700
commit: fc897b24b2efafccb5c9e915b81dc5f797673e72 (patch)
tree: c09331359acb44530271bc21d49deb6a8020a96e /include
parent: 70b258fc962fd40673b9a47574cb83d8438e7d94 (diff)
1 files changed, 21 insertions, 5 deletions
diff --git a/include/zfs_fletcher.h b/include/zfs_fletcher.h
index ecba4ada7..f0cfbd573 100644
--- a/include/zfs_fletcher.h
+++ b/include/zfs_fletcher.h
@@ -35,11 +35,20 @@ extern "C" {
 
 /*
  * fletcher checksum functions
+ *
+ * Note: Fletcher checksum methods expect buffer size to be 4B aligned. This
+ * limitation stems from the algorithm design. Performing incremental checksum
+ * without said alignment would yield different results. Therefore, the code
+ * includes assertions for the size alignment.
+ * For compatibility, it is required that some code paths calculate checksum of
+ * non-aligned buffer sizes. For this purpose, `fletcher_4_native_varsize()`
+ * checksum method is added. This method will ignore last (size % 4) bytes of
+ * the data buffer.
  */
-
 void fletcher_2_native(const void *, uint64_t, zio_cksum_t *);
 void fletcher_2_byteswap(const void *, uint64_t, zio_cksum_t *);
 void fletcher_4_native(const void *, uint64_t, zio_cksum_t *);
+void fletcher_4_native_varsize(const void *, uint64_t, zio_cksum_t *);
 void fletcher_4_byteswap(const void *, uint64_t, zio_cksum_t *);
 void fletcher_4_incremental_native(const void *, uint64_t,
     zio_cksum_t *);
@@ -49,14 +58,21 @@ int fletcher_4_impl_set(const char *selector);
 void fletcher_4_init(void);
 void fletcher_4_fini(void);
 
+
 /*
  * fletcher checksum struct
  */
+typedef void (*fletcher_4_init_f)(zio_cksum_t *);
+typedef void (*fletcher_4_fini_f)(zio_cksum_t *);
+typedef void (*fletcher_4_compute_f)(const void *, uint64_t, zio_cksum_t *);
+
 typedef struct fletcher_4_func {
-	void (*init)(zio_cksum_t *);
-	void (*fini)(zio_cksum_t *);
-	void (*compute)(const void *, uint64_t, zio_cksum_t *);
-	void (*compute_byteswap)(const void *, uint64_t, zio_cksum_t *);
+	fletcher_4_init_f init_native;
+	fletcher_4_fini_f fini_native;
+	fletcher_4_compute_f compute_native;
+	fletcher_4_init_f init_byteswap;
+	fletcher_4_fini_f fini_byteswap;
+	fletcher_4_compute_f compute_byteswap;
 	boolean_t (*valid)(void);
 	const char *name;
 } fletcher_4_ops_t;
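
The split into `*_native` and `*_byteswap` members is what lets a "fastest" entry combine two different implementations. The following is a rough sketch of that idea, not the code from this patch: every `sketch_` name is hypothetical, the scalar routines are placeholders standing in for real implementations, and the composition helper only assumes that each direction is copied from whichever implementation benchmarked best for it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for zio_cksum_t. */
typedef struct sketch_cksum {
	uint64_t word[4];
} sketch_cksum_t;

/* Same shapes as the typedefs in the diff, under sketch_ names. */
typedef void (*sketch_init_f)(sketch_cksum_t *);
typedef void (*sketch_fini_f)(sketch_cksum_t *);
typedef void (*sketch_compute_f)(const void *, uint64_t, sketch_cksum_t *);

typedef struct sketch_ops {
	sketch_init_f init_native;
	sketch_fini_f fini_native;
	sketch_compute_f compute_native;
	sketch_init_f init_byteswap;
	sketch_fini_f fini_byteswap;
	sketch_compute_f compute_byteswap;
	const char *name;
} sketch_ops_t;

/* Placeholder scalar stages; a SIMD variant would keep wider state. */
static void
sketch_scalar_init(sketch_cksum_t *zcp)
{
	memset(zcp, 0, sizeof (*zcp));
}

static void
sketch_scalar_fini(sketch_cksum_t *zcp)
{
	(void) zcp;	/* scalar state is already the final checksum */
}

static void
sketch_scalar_accumulate(const void *buf, uint64_t size, sketch_cksum_t *zcp,
    int swap)
{
	const unsigned char *p = buf;

	for (uint64_t off = 0; off + 4 <= size; off += 4) {
		uint32_t w;

		memcpy(&w, p + off, sizeof (w));
		if (swap) {
			w = (w >> 24) | ((w >> 8) & 0xff00) |
			    ((w << 8) & 0xff0000) | (w << 24);
		}
		zcp->word[0] += w;
		zcp->word[1] += zcp->word[0];
		zcp->word[2] += zcp->word[1];
		zcp->word[3] += zcp->word[2];
	}
}

static void
sketch_scalar_native(const void *buf, uint64_t size, sketch_cksum_t *zcp)
{
	sketch_scalar_accumulate(buf, size, zcp, 0);
}

static void
sketch_scalar_byteswap(const void *buf, uint64_t size, sketch_cksum_t *zcp)
{
	sketch_scalar_accumulate(buf, size, zcp, 1);
}

/* Build "fastest" from two possibly different implementations. */
static void
sketch_compose_fastest(sketch_ops_t *fastest, const sketch_ops_t *best_native,
    const sketch_ops_t *best_byteswap)
{
	fastest->init_native = best_native->init_native;
	fastest->fini_native = best_native->fini_native;
	fastest->compute_native = best_native->compute_native;
	fastest->init_byteswap = best_byteswap->init_byteswap;
	fastest->fini_byteswap = best_byteswap->fini_byteswap;
	fastest->compute_byteswap = best_byteswap->compute_byteswap;
	fastest->name = "fastest";
}

/* The three stages the benchmark is said to measure, native direction. */
static void
sketch_run_native(const sketch_ops_t *ops, const void *buf, uint64_t size,
    sketch_cksum_t *zcp)
{
	ops->init_native(zcp);
	ops->compute_native(buf, size, zcp);
	ops->fini_native(zcp);
}
```

Keeping separate init and fini pointers per direction matters once an implementation needs different setup or reduction steps for byteswapped input; with a single shared `init`/`fini` pair, the two directions could not be taken from different implementations.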