| Field | Value | Date |
|---|---|---|
| author | Paul Dagnelie <[email protected]> | 2019-08-16 08:08:21 -0700 |
| committer | Brian Behlendorf <[email protected]> | 2019-08-16 09:08:21 -0600 |
| commit | f09fda5071813751ba3fa77c28e588689795e17e | |
| tree | 164564e5c5a88412d05477214c06cda69f929858 /module/zfs/vdev_trim.c | |
| parent | 9323aad14d2f99d6fff1e50cce25fa6361495ec4 | |
Cap metaslab memory usage
On systems with large amounts of storage and high fragmentation, a huge
amount of memory can be consumed by metaslab range trees. Since
metaslabs are only unloaded during a txg sync, and only if they have
been inactive for 8 txgs, it is possible to get into a state where all
of the system's memory is consumed by range trees and metaslabs, and
txgs cannot sync. While ZFS knows how to evict ARC data when needed,
it has no such mechanism for range tree data. This can result in boot
hangs for some system configurations.
First, we add the ability to unload metaslabs outside of syncing
context. Second, we store a multilist of all loaded metaslabs, sorted
by their selection txg, so we can quickly identify the oldest
metaslabs. We use a multilist to reduce lock contention during heavy
write workloads. Finally, we add logic that unloads a metaslab when a
new metaslab is being loaded, if range trees are using more than a
certain fraction of the available memory.
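The three steps above can be sketched, very roughly, as the single-threaded model below. It is a minimal illustration under stated assumptions, not the ZFS implementation: every name (`ms_t`, `loaded_list_t`, `ms_load`, `ms_select`, `limit_bytes`) is hypothetical, one doubly linked list stands in for the multilist, and the memory check is assumed to run just before each load. Because selection txgs only increase, moving a metaslab to the tail whenever it is selected keeps the list sorted, so the oldest metaslab is always at the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metaslab stand-in: only what the eviction policy needs. */
typedef struct ms {
	uint64_t selected_txg;  /* txg of the most recent selection */
	uint64_t tree_bytes;    /* memory held by its range trees while loaded */
	int loaded;             /* nonzero while the range trees are in memory */
	struct ms *prev, *next; /* links in the loaded list */
} ms_t;

/* Loaded metaslabs: oldest selection txg at head, newest at tail. */
typedef struct {
	ms_t *head;
	ms_t *tail;
	uint64_t total_bytes;   /* sum of tree_bytes over loaded metaslabs */
	uint64_t limit_bytes;   /* assumed cap: a fraction of available memory */
} loaded_list_t;

static void
ll_remove(loaded_list_t *ll, ms_t *ms)
{
	if (ms->prev != NULL)
		ms->prev->next = ms->next;
	else
		ll->head = ms->next;
	if (ms->next != NULL)
		ms->next->prev = ms->prev;
	else
		ll->tail = ms->prev;
	ms->prev = ms->next = NULL;
}

/* Appending keeps the list sorted because txgs only increase. */
static void
ll_append(loaded_list_t *ll, ms_t *ms)
{
	ms->prev = ll->tail;
	ms->next = NULL;
	if (ll->tail != NULL)
		ll->tail->next = ms;
	else
		ll->head = ms;
	ll->tail = ms;
}

static void
ms_unload(loaded_list_t *ll, ms_t *ms)
{
	ll_remove(ll, ms);
	ll->total_bytes -= ms->tree_bytes;
	ms->tree_bytes = 0;
	ms->loaded = 0;
}

/* A loaded metaslab was selected in `txg`: move it to the newest end. */
void
ms_select(loaded_list_t *ll, ms_t *ms, uint64_t txg)
{
	assert(ms->loaded);
	ll_remove(ll, ms);
	ms->selected_txg = txg;
	ll_append(ll, ms);
}

/*
 * Load a metaslab whose range trees need `bytes` of memory, first
 * unloading the oldest loaded metaslabs while the total would exceed
 * the limit. This unloading is not tied to any txg sync step.
 */
void
ms_load(loaded_list_t *ll, ms_t *ms, uint64_t txg, uint64_t bytes)
{
	assert(!ms->loaded);
	while (ll->head != NULL && ll->total_bytes + bytes > ll->limit_bytes)
		ms_unload(ll, ll->head);
	ms->tree_bytes = bytes;
	ms->loaded = 1;
	ms->selected_txg = txg;
	ll->total_bytes += bytes;
	ll_append(ll, ms);
}
```

For example, with `limit_bytes = 100`, loading metaslabs `a` and `b` at 40 bytes each, re-selecting `a`, and then loading a third 40-byte metaslab evicts `b`, the least recently selected one. Keeping the list sorted makes finding that victim O(1). A real implementation would also need locking (the commit uses a multilist to spread that contention) and would have to avoid unloading metaslabs that are in active use; this sketch ignores both.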
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #9128
Diffstat (limited to 'module/zfs/vdev_trim.c')
-rw-r--r-- | module/zfs/vdev_trim.c | 10 |
1 file changed, 5 insertions, 5 deletions
```diff
diff --git a/module/zfs/vdev_trim.c b/module/zfs/vdev_trim.c
index 5ad47cccd..70b122a0a 100644
--- a/module/zfs/vdev_trim.c
+++ b/module/zfs/vdev_trim.c
@@ -837,7 +837,7 @@ vdev_trim_thread(void *arg)
 		 */
 		if (msp->ms_sm == NULL && vd->vdev_trim_partial) {
 			mutex_exit(&msp->ms_lock);
-			metaslab_enable(msp, B_FALSE);
+			metaslab_enable(msp, B_FALSE, B_FALSE);
 			spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
 			vdev_trim_calculate_progress(vd);
 			continue;
@@ -849,7 +849,7 @@ vdev_trim_thread(void *arg)
 		mutex_exit(&msp->ms_lock);
 
 		error = vdev_trim_ranges(&ta);
-		metaslab_enable(msp, B_TRUE);
+		metaslab_enable(msp, B_TRUE, B_FALSE);
 		spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
 
 		range_tree_vacate(ta.trim_tree, NULL, NULL);
@@ -1154,7 +1154,7 @@ vdev_autotrim_thread(void *arg)
 			if (msp->ms_sm == NULL ||
 			    range_tree_is_empty(msp->ms_trim)) {
 				mutex_exit(&msp->ms_lock);
-				metaslab_enable(msp, B_FALSE);
+				metaslab_enable(msp, B_FALSE, B_FALSE);
 				continue;
 			}
 
@@ -1170,7 +1170,7 @@ vdev_autotrim_thread(void *arg)
 			 */
 			if (msp->ms_disabled > 1) {
 				mutex_exit(&msp->ms_lock);
-				metaslab_enable(msp, B_FALSE);
+				metaslab_enable(msp, B_FALSE, B_FALSE);
 				continue;
 			}
 
@@ -1288,7 +1288,7 @@ vdev_autotrim_thread(void *arg)
 			range_tree_vacate(trim_tree, NULL, NULL);
 			range_tree_destroy(trim_tree);
 
-			metaslab_enable(msp, issued_trim);
+			metaslab_enable(msp, issued_trim, B_FALSE);
 			spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
 
 			for (uint64_t c = 0; c < children; c++) {
```
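In this file the diff only changes call sites: every `metaslab_enable()` call gains a third `B_FALSE` argument. Assuming, in line with the commit message's new ability to unload metaslabs outside of syncing context, that this argument asks `metaslab_enable()` to unload the metaslab once its last outstanding disable is released, its effect can be modelled with the hypothetical sketch below. `sketch_ms_t`, `sketch_disable`, and `sketch_enable` are illustrative names, not ZFS functions, `bool` stands in for `boolean_t`, and the second (`sync`) argument is accepted but not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical metaslab stand-in: disable count and load state only. */
typedef struct {
	int disabled;   /* outstanding disable calls */
	bool loaded;    /* range trees currently in memory */
} sketch_ms_t;

static void
sketch_disable(sketch_ms_t *ms)
{
	ms->disabled++;
}

/*
 * Same argument shape as the new call: (msp, sync, unload).
 * Only the final enable (count drops to zero) may unload.
 */
static void
sketch_enable(sketch_ms_t *ms, bool sync, bool unload)
{
	(void)sync;     /* not modelled in this sketch */
	assert(ms->disabled > 0);
	if (--ms->disabled == 0 && unload)
		ms->loaded = false;
}
```

Under this reading, passing `B_FALSE` as the TRIM threads do re-enables the metaslab but leaves its range trees loaded, so TRIM behaviour is unchanged by this commit; unloading is left to the new memory-pressure logic.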