summaryrefslogtreecommitdiffstats
path: root/module/zfs/dnode_sync.c
diff options
context:
space:
mode:
authorMatthew Ahrens <[email protected]>2020-12-11 10:26:02 -0800
committerGitHub <[email protected]>2020-12-11 10:26:02 -0800
commitba67d82142bc7034734a49a62998cfc96b1d0038 (patch)
tree571de79e9293abc687e93aea8912431997d00669 /module/zfs/dnode_sync.c
parent7d4b365ce32ecaf97020178f2a847a01f7e35476 (diff)
Improve zfs receive performance with lightweight write
The performance of `zfs receive` can be bottlenecked on the CPU consumed by the `receive_writer` thread, especially when receiving streams with small compressed block sizes. Much of the CPU is spent creating and destroying dbuf's and arc buf's, one for each `WRITE` record in the send stream. This commit introduces the concept of "lightweight writes", which allows `zfs receive` to write to the DMU by providing an ABD, and instantiating only a new type of `dbuf_dirty_record_t`. The dbuf and arc buf for this "dirty leaf block" are not instantiated. Because there is no dbuf with the dirty data, this mechanism doesn't support reading from "lightweight-dirty" blocks (they would see the on-disk state rather than the dirty data). Since the dedup-receive code has been removed, `zfs receive` is write-only, so this works fine. Because there are no arc bufs for the received data, the received data is no longer cached in the ARC. Testing a receive of a stream with average compressed block size of 4KB, this commit improves performance by 50%, while also reducing CPU usage by 50% of a CPU. On a per-block basis, CPU consumed by receive_writer() and dbuf_evict() is now 1/7th (14%) of what it was. Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35% New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0% The code is also restructured in a few ways: Added a `dr_dnode` field to the dbuf_dirty_record_t. This simplifies some existing code that no longer needs `DB_DNODE_ENTER()` and related routines. The new field is needed by the lightweight-type dirty record. To ensure that the `dr_dnode` field remains valid until the dirty record is freed, we have to ensure that the `dnode_move()` doesn't relocate the dnode_t. To do this we keep a hold on the dnode until it's zio's have completed. This is already done by the user-accounting code (`userquota_updates_task()`), this commit extends that so that it always keeps the dnode hold until zio completion (see `dnode_rele_task()`). `dn_dirty_txg` was previously zeroed when the dnode was synced. This was not necessary, since its meaning can be "when was this dnode last dirtied". This change simplifies the new `dnode_rele_task()` code. Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11105
Diffstat (limited to 'module/zfs/dnode_sync.c')
-rw-r--r--module/zfs/dnode_sync.c6
1 files changed, 4 insertions, 2 deletions
diff --git a/module/zfs/dnode_sync.c b/module/zfs/dnode_sync.c
index ae44cb697..66e48a1e1 100644
--- a/module/zfs/dnode_sync.c
+++ b/module/zfs/dnode_sync.c
@@ -21,7 +21,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2012, 2018 by Delphix. All rights reserved.
+ * Copyright (c) 2012, 2020 by Delphix. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright 2020 Oxide Computer Company
*/
@@ -851,6 +851,8 @@ dnode_sync(dnode_t *dn, dmu_tx_t *tx)
/*
* Although we have dropped our reference to the dnode, it
* can't be evicted until its written, and we haven't yet
- * initiated the IO for the dnode's dbuf.
+ * initiated the IO for the dnode's dbuf. Additionally, the caller
+ * has already added a reference to the dnode because it's on the
+ * os_synced_dnodes list.
*/
}