author     Matthew Ahrens <[email protected]>   2020-03-10 10:51:04 -0700
committer  GitHub <[email protected]>           2020-03-10 10:51:04 -0700
commit     1dc32a67e93bbc8d650943f1a460abb9ff6c5083 (patch)
tree       ba8ab8eb3c5a7beeaabb62c85456c2e0f38d2ee0 /include
parent     9be70c37844c5d832ffef6f0ed2651a332b35dbd (diff)
Improve zfs send performance by bypassing the ARC
When doing a zfs send on a dataset with small recordsize (e.g. 8K),
performance is dominated by the per-block overheads. This is especially
true with `zfs send --compressed`, which further reduces the amount of
data sent, for the same number of blocks. Several threads are involved,
but the limiting factor is the `send_prefetch` thread, which is 100% on
CPU.

The main job of the `send_prefetch` thread is to issue zio's for the data
that will be needed by the main thread. It does this by calling
`arc_read(ARC_FLAG_PREFETCH)`. This has an immediate cost of creating an
arc_hdr, which takes around 14% of one CPU. It also induces later costs
by other threads:

 * Since the data was only prefetched, dmu_send()->dmu_dump_write() will
   need to call arc_read() again to get the data. This will have to look
   up the arc_hdr in the hash table and copy the data from the scatter
   ABD in the arc_hdr to a linear ABD in arc_buf. This takes 27% of one
   CPU.

 * dmu_dump_write() needs to arc_buf_destroy() the arc_buf. This takes
   11% of one CPU.

 * arc_adjust() will need to evict this arc_hdr, taking about 50% of one
   CPU.

All of these costs can be avoided by bypassing the ARC if the data is not
already cached. This commit changes `zfs send` to check for the data in
the ARC, and if it is not found then we directly call `zio_read()`,
reading the data into a linear ABD which is used by dmu_dump_write()
directly.

The performance improvement is best expressed in terms of how many blocks
can be processed by `zfs send` in one second. This change increases the
metric by 50%, from ~100,000 to ~150,000. When the amount of data per
block is small (e.g. 2KB), there is a corresponding reduction in the
elapsed time of `zfs send >/dev/null` (from 86 minutes to 58 minutes in
this test case).

In addition to improving the performance of `zfs send`, this change makes
`zfs send` not pollute the ARC cache. In most cases the data will not be
reused, so this allows us to keep caching useful data in the MRU
(hit-once) part of the ARC.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10067
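The decision the commit describes ("use the cached copy if one exists, otherwise read directly and skip the cache") can be illustrated with the short standalone C sketch below. It is only an illustration of the pattern, not the code from this commit: cache_read_cached_only(), device_read(), and send_read_block() are hypothetical stand-ins for arc_read() with the new ARC_FLAG_CACHED_ONLY flag, zio_read(), and the send-side read path, respectively.

/*
 * Minimal standalone sketch (not the actual OpenZFS code) of the
 * "check the cache, bypass it on a miss" flow described above.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pretend cache lookup: fail with ENOENT unless the block is already cached. */
static int
cache_read_cached_only(uint64_t blkid, void *buf, size_t size)
{
	(void) blkid;
	(void) buf;
	(void) size;
	return (ENOENT);	/* simulate a cache miss */
}

/* Pretend direct read into a caller-owned linear buffer, skipping the cache. */
static int
device_read(uint64_t blkid, void *buf, size_t size)
{
	(void) blkid;
	memset(buf, 0xab, size);	/* stand-in for the real I/O */
	return (0);
}

/*
 * Read one block for the send stream: use the cached copy if one exists,
 * otherwise read directly so the cache is neither consulted twice nor
 * populated with data that will not be reused.
 */
static int
send_read_block(uint64_t blkid, void *buf, size_t size)
{
	int err = cache_read_cached_only(blkid, buf, size);

	if (err == ENOENT)
		err = device_read(blkid, buf, size);
	return (err);
}

int
main(void)
{
	char buf[8192];

	if (send_read_block(0, buf, sizeof (buf)) != 0) {
		fprintf(stderr, "read failed\n");
		return (1);
	}
	printf("block read without inserting it into the cache\n");
	return (0);
}

The key property of the fallback path is that it never inserts the block into the cache, which is what removes the per-block arc_hdr bookkeeping and keeps `zfs send` from evicting more useful data.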
Diffstat (limited to 'include')
-rw-r--r--  include/sys/arc.h       6
-rw-r--r--  include/sys/arc_impl.h  1
2 files changed, 7 insertions, 0 deletions
diff --git a/include/sys/arc.h b/include/sys/arc.h
index f6dea3fbd..6f56f732a 100644
--- a/include/sys/arc.h
+++ b/include/sys/arc.h
@@ -147,6 +147,12 @@ typedef enum arc_flags
 	ARC_FLAG_SHARED_DATA		= 1 << 21,
 
 	/*
+	 * Fail this arc_read() (with ENOENT) if the data is not already present
+	 * in cache.
+	 */
+	ARC_FLAG_CACHED_ONLY		= 1 << 22,
+
+	/*
 	 * The arc buffer's compression mode is stored in the top 7 bits of the
 	 * flags field, so these dummy flags are included so that MDB can
 	 * interpret the enum properly.
diff --git a/include/sys/arc_impl.h b/include/sys/arc_impl.h
index 8eeee54c9..c55640f8b 100644
--- a/include/sys/arc_impl.h
+++ b/include/sys/arc_impl.h
@@ -554,6 +554,7 @@ typedef struct arc_stats {
 	kstat_named_t arcstat_need_free;
 	kstat_named_t arcstat_sys_free;
 	kstat_named_t arcstat_raw_size;
+	kstat_named_t arcstat_cached_only_in_progress;
 } arc_stats_t;
 
 typedef enum free_memory_reason_t {