author    Matthew Ahrens <[email protected]>    2020-03-10 10:51:04 -0700
committer GitHub <[email protected]>    2020-03-10 10:51:04 -0700
commit    1dc32a67e93bbc8d650943f1a460abb9ff6c5083
tree      ba8ab8eb3c5a7beeaabb62c85456c2e0f38d2ee0 /module/zfs/arc.c
parent    9be70c37844c5d832ffef6f0ed2651a332b35dbd
Improve zfs send performance by bypassing the ARC
When doing a zfs send on a dataset with small recordsize (e.g. 8K),
performance is dominated by the per-block overheads. This is especially
true with `zfs send --compressed`, which further reduces the amount of
data sent, for the same number of blocks. Several threads are involved,
but the limiting factor is the `send_prefetch` thread, which is 100% on
CPU.
The main job of the `send_prefetch` thread is to issue zio's for the
data that will be needed by the main thread. It does this by calling
`arc_read(ARC_FLAG_PREFETCH)`. This has an immediate cost of creating
an arc_hdr, which takes around 14% of one CPU. It also induces later
costs in other threads (modeled in the sketch after this list):
* Since the data was only prefetched, dmu_send()->dmu_dump_write() will
need to call arc_read() again to get the data. This will have to
look up the arc_hdr in the hash table and copy the data from the
scatter ABD in the arc_hdr to a linear ABD in arc_buf. This takes
27% of one CPU.
* dmu_dump_write() needs to call arc_buf_destroy(). This takes 11% of one
CPU.
* arc_adjust() will need to evict this arc_hdr, taking about 50% of one
CPU.
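
To make the per-block churn concrete, the sketch below models the old flow
as a small, self-contained userland program. This is not ZFS code: the
`model_*` names are hypothetical stand-ins for arc_hdr_t, arc_buf_t,
`arc_read()`, and `arc_buf_destroy()`; only the sequence of allocations,
copies, and teardowns mirrors the costs listed above.

```c
/*
 * Hypothetical model of the pre-patch per-block flow. None of these
 * names exist in ZFS; model_hdr/model_buf stand in for arc_hdr_t and
 * arc_buf_t, and the functions stand in for arc_read() and friends.
 */
#include <stdlib.h>
#include <string.h>

struct model_hdr { char *scatter; size_t size; };	/* ~arc_hdr_t */
struct model_buf { char *linear; size_t size; };	/* ~arc_buf_t */

/* send_prefetch thread: create the header (~14% of one CPU). */
static struct model_hdr *
model_prefetch(const char *disk, size_t size)
{
	struct model_hdr *hdr = malloc(sizeof (*hdr));

	hdr->scatter = malloc(size);		/* scatter ABD */
	hdr->size = size;
	memcpy(hdr->scatter, disk, size);	/* the actual zio */
	return (hdr);
}

/* main thread: hash re-lookup + scatter-to-linear copy (~27%). */
static struct model_buf *
model_arc_read(struct model_hdr *hdr)
{
	struct model_buf *buf = malloc(sizeof (*buf));

	buf->linear = malloc(hdr->size);
	buf->size = hdr->size;
	memcpy(buf->linear, hdr->scatter, hdr->size);
	return (buf);
}

int
main(void)
{
	char disk[8192] = { 0 };
	struct model_hdr *hdr = model_prefetch(disk, sizeof (disk));
	struct model_buf *buf = model_arc_read(hdr);

	/* dmu_dump_write() consumes buf->linear, then destroys it (~11%). */
	free(buf->linear);
	free(buf);
	/* later, arc_adjust() evicts the header (~50%). */
	free(hdr->scatter);
	free(hdr);
	return (0);
}
```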
All of these costs can be avoided by bypassing the ARC if the data is
not already cached. This commit changes `zfs send` to check for the
data in the ARC, and if it is not found, to call `zio_read()` directly,
reading the data into a linear ABD that dmu_dump_write() consumes as-is.
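
A minimal sketch of the new control flow, with the same caveat as above:
the `model_*` stubs are hypothetical and do not match the real
`arc_read()`/`zio_read()` signatures; the point is the
try-the-cache-first, bypass-on-miss shape.

```c
/*
 * Hypothetical model of the post-patch flow. The model_* stubs are
 * stand-ins, not the real arc_read()/zio_read() signatures; only the
 * try-the-cache-then-bypass control flow mirrors the change.
 */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define	MODEL_FLAG_CACHED_ONLY	0x1	/* ~ARC_FLAG_CACHED_ONLY */

/* Returns 0 and fills `out` on a cache hit; ENOENT otherwise. */
static int
model_arc_read(int flags, char *out, size_t size)
{
	(void) flags; (void) out; (void) size;
	return (ENOENT);	/* pretend the block is not cached */
}

/* ~zio_read(): fill a linear buffer straight from disk, no arc_hdr. */
static int
model_zio_read(const char *disk, char *linear, size_t size)
{
	memcpy(linear, disk, size);
	return (0);
}

int
main(void)
{
	char disk[8192] = { 0 };
	char *linear = malloc(sizeof (disk));
	int err;

	/* Ask the ARC for the block without creating any ARC state. */
	err = model_arc_read(MODEL_FLAG_CACHED_ONLY, linear, sizeof (disk));
	if (err == ENOENT) {
		/* Miss: bypass the ARC entirely. */
		err = model_zio_read(disk, linear, sizeof (disk));
	}
	/* dmu_dump_write() uses `linear` as-is; nothing to evict later. */
	free(linear);
	return (err);
}
```

On a miss there is no arc_hdr to create, copy out of, destroy, or later
evict, which is where the savings itemized above come from.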
The performance improvement is best expressed in terms of how many
blocks can be processed by `zfs send` in one second. This change
increases the metric by 50%, from ~100,000 to ~150,000. When the amount
of data per block is small (e.g. 2KB), there is a corresponding
reduction in the elapsed time of `zfs send >/dev/null` (from 86 minutes
to 58 minutes in this test case).
In addition to improving the performance of `zfs send`, this change
keeps `zfs send` from polluting the ARC. In most cases the data will
not be reused, so bypassing the cache lets us keep caching useful data
in the MRU (hit-once) part of the ARC.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10067
Diffstat (limited to 'module/zfs/arc.c')
module/zfs/arc.c | 19
1 file changed, 18 insertions(+), 1 deletion(-)
```diff
diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 3df53d2db..d49d85db0 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -548,7 +548,8 @@ arc_stats_t arc_stats = {
 	{ "demand_hit_prescient_prefetch", KSTAT_DATA_UINT64 },
 	{ "arc_need_free", KSTAT_DATA_UINT64 },
 	{ "arc_sys_free", KSTAT_DATA_UINT64 },
-	{ "arc_raw_size", KSTAT_DATA_UINT64 }
+	{ "arc_raw_size", KSTAT_DATA_UINT64 },
+	{ "cached_only_in_progress", KSTAT_DATA_UINT64 },
 };
 
 #define	ARCSTAT_MAX(stat, val) { \
@@ -5563,6 +5564,13 @@ top:
 	if (HDR_IO_IN_PROGRESS(hdr)) {
 		zio_t *head_zio = hdr->b_l1hdr.b_acb->acb_zio_head;
 
+		if (*arc_flags & ARC_FLAG_CACHED_ONLY) {
+			mutex_exit(hash_lock);
+			ARCSTAT_BUMP(arcstat_cached_only_in_progress);
+			rc = SET_ERROR(ENOENT);
+			goto out;
+		}
+
 		ASSERT3P(head_zio, !=, NULL);
 		if ((hdr->b_flags & ARC_FLAG_PRIO_ASYNC_READ) &&
 		    priority == ZIO_PRIORITY_SYNC_READ) {
@@ -5698,12 +5706,21 @@ top:
 		uint64_t size;
 		abd_t *hdr_abd;
 
+		if (*arc_flags & ARC_FLAG_CACHED_ONLY) {
+			rc = SET_ERROR(ENOENT);
+			if (hash_lock != NULL)
+				mutex_exit(hash_lock);
+			goto out;
+		}
+
 		/*
 		 * Gracefully handle a damaged logical block size as a
 		 * checksum error.
 		 */
 		if (lsize > spa_maxblocksize(spa)) {
 			rc = SET_ERROR(ECKSUM);
+			if (hash_lock != NULL)
+				mutex_exit(hash_lock);
 			goto out;
 		}
```
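
Note the in-progress case in the first code hunk: a cached-only read that
finds an I/O already in flight for the block cannot return data without
waiting, so it bumps the new `cached_only_in_progress` counter and fails
with ENOENT, letting the caller fall back to its own bypass read. The
counter is visible in the arcstats kstat; below is a minimal userland
sketch of reading it on Linux, assuming the standard
`/proc/spl/kstat/zfs/arcstats` location and deliberately simplified
parsing.

```c
/* Print the cached_only_in_progress line from the arcstats kstat. */
#include <stdio.h>
#include <string.h>

int
main(void)
{
	FILE *f = fopen("/proc/spl/kstat/zfs/arcstats", "r");
	char line[256];

	if (f == NULL) {
		perror("arcstats");
		return (1);
	}
	while (fgets(line, sizeof (line), f) != NULL) {
		/* kstat lines are "name  type  value"; match by prefix. */
		if (strncmp(line, "cached_only_in_progress", 23) == 0)
			fputs(line, stdout);
	}
	fclose(f);
	return (0);
}
```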