author | Rob Norris <[email protected]> | 2023-06-22 17:46:22 +1000 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2024-08-16 12:03:35 -0700 |
commit | cd69ba3d49cdb939cba87e7fd6814608532df92f (patch) | |
tree | 32d51c27ae62b5145e5b9fd99b00930ff3c22f95 /man | |
parent | cbb9ef0a4c8e04358f7d5ddae0eb99d0f703ee21 (diff) |
ddt: dedup log
Adds a log/journal to dedup. At the end of txg, instead of writing the
entry directly to the ZAP, it is added to an in-memory tree and
appended to an on-disk object. The on-disk object is only read at
import, to reload the in-memory tree.
Lookups first go to the log tree before going to the ZAP, so
recently-used entries will remain close by in memory. This vastly
reduces overhead from dedup IO, as it will not have to do so many
read/update/write cycles on ZAP leaf nodes.
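
To make the lookup order concrete, here is a minimal stand-alone C sketch
(the names ddt_entry_stub_t, ddt_log_find, ddt_zap_find and ddt_lookup_stub
are hypothetical stand-ins, not the actual OpenZFS functions): the in-memory
log tree is consulted first, and the ZAP only on a miss.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stand-in for a dedup table entry keyed by block checksum. */
    typedef struct {
        uint64_t key;      /* simplified checksum */
        uint64_t refcnt;   /* reference count for the dedup'd block */
    } ddt_entry_stub_t;

    /* Tiny array standing in for the in-memory log (AVL) tree. */
    static ddt_entry_stub_t log_tree[64];
    static int log_tree_len;

    static ddt_entry_stub_t *ddt_log_find(uint64_t key)
    {
        for (int i = 0; i < log_tree_len; i++)
            if (log_tree[i].key == key)
                return (&log_tree[i]);
        return (NULL);
    }

    /* Stand-in for the much slower read/update/write cycle on a ZAP leaf. */
    static ddt_entry_stub_t zap_entry;
    static ddt_entry_stub_t *ddt_zap_find(uint64_t key)
    {
        zap_entry = (ddt_entry_stub_t){ .key = key, .refcnt = 1 };
        return (&zap_entry);
    }

    /* Lookup order from the paragraph above: log tree first, then the ZAP. */
    static ddt_entry_stub_t *ddt_lookup_stub(uint64_t key)
    {
        ddt_entry_stub_t *e = ddt_log_find(key);
        return (e != NULL ? e : ddt_zap_find(key));
    }

    int main(void)
    {
        log_tree[log_tree_len++] = (ddt_entry_stub_t){ .key = 42, .refcnt = 3 };
        printf("hot  entry refcnt: %llu\n",
            (unsigned long long)ddt_lookup_stub(42)->refcnt); /* from log tree */
        printf("cold entry refcnt: %llu\n",
            (unsigned long long)ddt_lookup_stub(7)->refcnt);  /* from the "ZAP" */
        return 0;
    }
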
A flushing facility is added at end of txg, to push logged entries out
to the ZAP. There are actually two separate "logs" (in-memory tree and
on-disk object), one active (receiving updated entries) and one flushing
(writing out to disk). These are swapped (i.e. flushing begins) based on
memory used by the in-memory log trees and time since we last flushed
something.
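
A rough sketch of the swap condition, using assumed constants that mirror the
tuneables documented in the diff below (the "around half the memory budget"
and txg-count triggers come from the man page text; the function name is
hypothetical):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Assumed analogues of zfs_dedup_log_mem_max and zfs_dedup_log_txg_max. */
    static const uint64_t dedup_log_mem_max = 64ULL << 20;  /* 64 MiB, example only */
    static const uint64_t dedup_log_txg_max = 8;

    /*
     * Swap the active log into the flushing role once the in-memory logs use
     * around half the memory budget, or once the active log has accumulated
     * entries for too many txgs without anything being flushed.
     */
    static bool ddt_log_should_swap(uint64_t log_mem_used, uint64_t txgs_since_flush)
    {
        return (log_mem_used >= dedup_log_mem_max / 2 ||
            txgs_since_flush >= dedup_log_txg_max);
    }

    int main(void)
    {
        printf("small log, 3 txgs: %d\n", ddt_log_should_swap(8ULL << 20, 3));  /* 0 */
        printf("small log, 9 txgs: %d\n", ddt_log_should_swap(8ULL << 20, 9));  /* 1 */
        printf("large log, 1 txg:  %d\n", ddt_log_should_swap(40ULL << 20, 1)); /* 1 */
        return 0;
    }
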
The flushing facility monitors the number of entries coming in and being
flushed out, and calibrates itself to try to flush enough each txg to
keep up with the ingest rate without competing too much with other IO.
Multiple tuneables are provided to control the flushing facility.
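
The calibration can be pictured with the following simplified C sketch: smooth
the ingest rate with a moving average over recent txgs, flush at least that
many entries, and split the work across a bounded number of passes,
recomputing the remainder at each pass. The constants mirror the tuneables
documented below, but the smoothing and pass-splitting here are illustrative
only, not the actual algorithm.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed analogues of the flushing tuneables documented below. */
    #define FLUSH_PASSES_MAX    8       /* zfs_dedup_log_flush_passes_max */
    #define FLUSH_ENTRIES_MIN   1000    /* zfs_dedup_log_flush_entries_min */
    #define FLOW_RATE_TXGS      10      /* zfs_dedup_log_flush_flow_rate_txgs */

    /* Exponentially weighted moving average over roughly FLOW_RATE_TXGS txgs. */
    static int64_t ewma(int64_t avg, int64_t sample)
    {
        return (avg + (sample - avg) / FLOW_RATE_TXGS);
    }

    int main(void)
    {
        int64_t ingests[] = { 500, 8000, 12000, 3000, 0, 0 };  /* entries per txg */
        int64_t ingest_rate = 0;                               /* smoothed rate */

        for (size_t txg = 0; txg < sizeof (ingests) / sizeof (ingests[0]); txg++) {
            ingest_rate = ewma(ingest_rate, ingests[txg]);

            /* Flush at least the smoothed ingest rate, with a floor. */
            int64_t target = ingest_rate > FLUSH_ENTRIES_MIN ?
                ingest_rate : FLUSH_ENTRIES_MIN;

            /*
             * Spread the target over up to FLUSH_PASSES_MAX passes,
             * recomputing the remaining work at the start of each pass.
             */
            int64_t flushed = 0;
            for (int pass = 0; pass < FLUSH_PASSES_MAX && flushed < target; pass++) {
                int64_t chunk = (target - flushed) / (FLUSH_PASSES_MAX - pass);
                flushed += (chunk > 0 ? chunk : 1);
            }

            printf("txg %zu: ingest %5lld  smoothed %5lld  flushed %5lld\n",
                txg, (long long)ingests[txg], (long long)ingest_rate,
                (long long)flushed);
        }
        return 0;
    }
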
All the histograms and stats are updated to accommodate the log as a
separate entry store. zdb gains knowledge of how to count them and dump
them. Documentation included!
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Allan Jude <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
Diffstat (limited to 'man')
-rw-r--r-- | man/man4/zfs.4 | 82 |
1 file changed, 82 insertions, 0 deletions
diff --git a/man/man4/zfs.4 b/man/man4/zfs.4
index 45b6c338a..aae3d7dfb 100644
--- a/man/man4/zfs.4
+++ b/man/man4/zfs.4
@@ -974,6 +974,88 @@ milliseconds until the operation completes.
 .It Sy zfs_dedup_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
 Enable prefetching dedup-ed blocks which are going to be freed.
 .
+.It Sy zfs_dedup_log_flush_passes_max Ns = Ns Sy 8 Ns Pq uint
+Maximum number of dedup log flush passes (iterations) each transaction.
+.Pp
+At the start of each transaction, OpenZFS will estimate how many entries it
+needs to flush out to keep up with the change rate, taking the amount and time
+taken to flush on previous txgs into account (see
+.Sy zfs_dedup_log_flush_flow_rate_txgs ) .
+It will spread this amount into a number of passes.
+At each pass, it will use the amount already flushed and the total time taken
+by flushing and by other IO to recompute how much it should do for the remainder
+of the txg.
+.Pp
+Reducing the max number of passes will make flushing more aggressive, flushing
+out more entries on each pass.
+This can be faster, but also more likely to compete with other IO.
+Increasing the max number of passes will put fewer entries onto each pass,
+keeping the overhead of dedup changes to a minimum but possibly causing a large
+number of changes to be dumped on the last pass, which can blow out the txg
+sync time beyond
+.Sy zfs_txg_timeout .
+.
+.It Sy zfs_dedup_log_flush_min_time_ms Ns = Ns Sy 1000 Ns Pq uint
+Minimum time to spend on dedup log flush each transaction.
+.Pp
+At least this long will be spent flushing dedup log entries each transaction,
+up to
+.Sy zfs_txg_timeout .
+This occurs even if doing so would delay the transaction, that is, even if
+other IO completes in under this time.
+.
+.It Sy zfs_dedup_log_flush_entries_min Ns = Ns Sy 1000 Ns Pq uint
+Flush at least this many entries each transaction.
+.Pp
+OpenZFS will estimate how many entries it needs to flush each transaction to
+keep up with the ingest rate (see
+.Sy zfs_dedup_log_flush_flow_rate_txgs ) .
+This sets the minimum for that estimate.
+Raising it can force OpenZFS to flush more aggressively, keeping the log small
+and so reducing pool import times, but can make it less able to back off if
+log flushing would compete with other IO too much.
+.
+.It Sy zfs_dedup_log_flush_flow_rate_txgs Ns = Ns Sy 10 Ns Pq uint
+Number of transactions to use to compute the flow rate.
+.Pp
+OpenZFS will estimate how many entries it needs to flush each transaction by
+monitoring the number of entries changed (ingest rate), number of entries
+flushed (flush rate) and time spent flushing (flush time rate) and combining
+these into an overall "flow rate".
+It will use an exponential weighted moving average over some number of recent
+transactions to compute these rates.
+This sets the number of transactions to compute these averages over.
+Setting it higher can help to smooth out the flow rate in the face of spiky
+workloads, but will take longer for the flow rate to adjust to a sustained
+change in the ingest rate.
+.
+.It Sy zfs_dedup_log_txg_max Ns = Ns Sy 8 Ns Pq uint
+Max transactions before starting to flush dedup logs.
+.Pp
+OpenZFS maintains two dedup logs, one receiving new changes, one flushing.
+If there is nothing to flush, it will accumulate changes for no more than this
+many transactions before switching the logs and starting to flush entries out.
+.
+.It Sy zfs_dedup_log_mem_max Ns = Ns Sy 0 Ns Pq u64
+Max memory to use for dedup logs.
+.Pp
+OpenZFS will spend no more than this much memory on maintaining the in-memory
+dedup log.
+Flushing will begin when around half this amount is being spent on logs.
+The default value of
+.Sy 0
+will cause it to be set by
+.Sy zfs_dedup_log_mem_max_percent
+instead.
+.
+.It Sy zfs_dedup_log_mem_max_percent Ns = Ns Sy 1 Ns % Pq uint
+Max memory to use for dedup logs, as a percentage of total memory.
+.Pp
+If
+.Sy zfs_dedup_log_mem_max
+is not set, it will be initialised as a percentage of the total memory in the
+system.
+.
 .It Sy zfs_delay_min_dirty_percent Ns = Ns Sy 60 Ns % Pq uint
 Start to delay each transaction once there is this amount of dirty data,
 expressed as a percentage of
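
As a worked example of the two memory tuneables above (the 32 GiB figure is an
assumption for illustration, not a default of any particular system): with
zfs_dedup_log_mem_max left at 0 and zfs_dedup_log_mem_max_percent at 1, the
budget is derived as 1% of RAM, roughly 327 MiB here, and flushing begins once
the logs reach about half of that.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Assumed example system: 32 GiB of RAM, default 1% budget. */
        uint64_t total_mem = 32ULL << 30;
        uint64_t mem_max_percent = 1;   /* zfs_dedup_log_mem_max_percent */
        uint64_t mem_max = 0;           /* zfs_dedup_log_mem_max (unset) */

        /* When zfs_dedup_log_mem_max is 0, derive it from the percentage. */
        if (mem_max == 0)
            mem_max = total_mem * mem_max_percent / 100;

        printf("dedup log budget:        %llu MiB\n",
            (unsigned long long)(mem_max >> 20));
        printf("flushing starts around:  %llu MiB\n",
            (unsigned long long)((mem_max / 2) >> 20));
        return 0;
    }
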