diff options
author | Alexander Motin <[email protected]> | 2022-04-26 13:44:21 -0400 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2022-07-26 10:10:37 -0700 |
commit | dd9c110ab5d3eb2bf38e2849f66f1534401f605b (patch) | |
tree | b47d3a9989acfc0752d496e5b486aa6cc355a1b4 /man | |
parent | fdb80a230168045fcd9977cb0425109aa63963d3 (diff) |
Improve log spacemap load time
Previous flushing algorithm limited only total number of log blocks to
the minimum of 256K and 4x number of metaslabs in the pool. As result,
system with 1500 disks with 1000 metaslabs each, touching several new
metaslabs each TXG could grow spacemap log to huge size without much
benefits. We've observed one of such systems importing pool for about
45 minutes.
This patch improves the situation from five sides:
- By limiting maximum period for each metaslab to be flushed to 1000
TXGs, that effectively limits maximum number of per-TXG spacemap logs
to load to the same number.
- By making flushing more smooth via accounting number of metaslabs
that were touched after the last flush and actually need another flush,
not just ms_unflushed_txg bump.
- By applying zfs_unflushed_log_block_pct to the number of metaslabs
that were touched after the last flush, not all metaslabs in the pool.
- By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
advance, making log spacemap load process for wide HDD pool CPU-bound,
accelerating it by many times.
- By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
single-threaded by nature log processing time from ~10 to ~5 minutes.
As further optimization we could skip bumping ms_unflushed_txg for
metaslabs not touched since the last flush, but that would be an
incompatible change, requiring new pool feature.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes #12789
Diffstat (limited to 'man')
-rw-r--r-- | man/man4/zfs.4 | 13 |
1 files changed, 9 insertions, 4 deletions
diff --git a/man/man4/zfs.4 b/man/man4/zfs.4 index 3eeed8f43..5e4587285 100644 --- a/man/man4/zfs.4 +++ b/man/man4/zfs.4 @@ -966,13 +966,13 @@ log spacemap in memory, in bytes. Part of overall system memory that ZFS allows to be used for unflushed metadata changes by the log spacemap, in millionths. . -.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 262144 Po 256k Pc Pq ulong +.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 131072 Po 128k Pc Pq ulong Describes the maximum number of log spacemap blocks allowed for each pool. The default value means that the space in all the log spacemaps can add up to no more than -.Sy 262144 +.Sy 131072 blocks (which means -.Em 32GB +.Em 16GB of logical space before compression and ditto blocks, assuming that blocksize is .Em 128kB ) . @@ -1002,7 +1002,12 @@ Thus we always allow at least this many log blocks. .It Sy zfs_unflushed_log_block_pct Ns = Ns Sy 400 Ns % Pq ulong Tunable used to determine the number of blocks that can be used for the spacemap log, expressed as a percentage of the total number of -metaslabs in the pool. +unflushed metaslabs in the pool. +. +.It Sy zfs_unflushed_log_txg_max Ns = Ns Sy 1000 Pq ulong +Tunable limiting maximum time in TXGs any metaslab may remain unflushed. +It effectively limits maximum number of unflushed per-TXG spacemap logs +that need to be read after unclean pool export. . .It Sy zfs_unlink_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq uint When enabled, files will not be asynchronously removed from the list of pending |