diff options
author | Matthew Ahrens <[email protected]> | 2020-12-11 10:26:02 -0800 |
---|---|---|
committer | GitHub <[email protected]> | 2020-12-11 10:26:02 -0800 |
commit | ba67d82142bc7034734a49a62998cfc96b1d0038 (patch) | |
tree | 571de79e9293abc687e93aea8912431997d00669 /module/zfs/dnode_sync.c | |
parent | 7d4b365ce32ecaf97020178f2a847a01f7e35476 (diff) |
Improve zfs receive performance with lightweight write
The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes. Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.
This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`. The dbuf and arc buf for this
"dirty leaf block" are not instantiated.
Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data). Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.
Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.
Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU. On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.
Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%
The code is also restructured in a few ways:
Added a `dr_dnode` field to the dbuf_dirty_record_t. This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines. The new field is needed by the lightweight-type dirty record.
To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t. To do this we keep a hold on the dnode until it's zio's have
completed. This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).
`dn_dirty_txg` was previously zeroed when the dnode was synced. This
was not necessary, since its meaning can be "when was this dnode last
dirtied". This change simplifies the new `dnode_rele_task()` code.
Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11105
Diffstat (limited to 'module/zfs/dnode_sync.c')
-rw-r--r-- | module/zfs/dnode_sync.c | 6 |
1 files changed, 4 insertions, 2 deletions
diff --git a/module/zfs/dnode_sync.c b/module/zfs/dnode_sync.c index ae44cb697..66e48a1e1 100644 --- a/module/zfs/dnode_sync.c +++ b/module/zfs/dnode_sync.c @@ -21,7 +21,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2018 by Delphix. All rights reserved. + * Copyright (c) 2012, 2020 by Delphix. All rights reserved. * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved. * Copyright 2020 Oxide Computer Company */ @@ -851,6 +851,8 @@ dnode_sync(dnode_t *dn, dmu_tx_t *tx) /* * Although we have dropped our reference to the dnode, it * can't be evicted until its written, and we haven't yet - * initiated the IO for the dnode's dbuf. + * initiated the IO for the dnode's dbuf. Additionally, the caller + * has already added a reference to the dnode because it's on the + * os_synced_dnodes list. */ } |