summaryrefslogtreecommitdiffstats
path: root/include/sys/spa_impl.h
diff options
context:
space:
mode:
authorDon Brady <[email protected]>2016-10-13 18:59:18 -0600
committerBrian Behlendorf <[email protected]>2016-10-13 17:59:18 -0700
commit3dfb57a35e8cbaa7c424611235d669f3c575ada1 (patch)
treed0958fdc57be43a540bba035580f0d8b39f1a99c /include/sys/spa_impl.h
parenta85a90557dfc70e09475c156a376f6923a6c89f0 (diff)
OpenZFS 7090 - zfs should throttle allocations
OpenZFS 7090 - zfs should throttle allocations Authored by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Approved by: Matthew Ahrens <[email protected]> Ported-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> When write I/Os are issued, they are issued in block order but the ZIO pipeline will drive them asynchronously through the allocation stage which can result in blocks being allocated out-of-order. It would be nice to preserve as much of the logical order as possible. In addition, the allocations are equally scattered across all top-level VDEVs but not all top-level VDEVs are created equally. The pipeline should be able to detect devices that are more capable of handling allocations and should allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation of work to each top-level vdev. The allocation logic utilizes a reservation system to reserve allocations that will be performed by the allocator. Once an allocation is successfully completed it's scheduled on a given top-level vdev. Each top-level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab groups and round robin across all eligible metaslab groups to distribute the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the allocation queue is emptied. OpenZFS-issue: https://www.illumos.org/issues/7090 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7 Closes #5258 Porting Notes: - Maintained minimal stack in zio_done - Preserve linux-specific io sizes in zio_write_compress - Added module params and documentation - Updated to use optimize AVL cmp macros
Diffstat (limited to 'include/sys/spa_impl.h')
-rw-r--r--include/sys/spa_impl.h2
1 files changed, 2 insertions, 0 deletions
diff --git a/include/sys/spa_impl.h b/include/sys/spa_impl.h
index cb1d16ad5..88bde98dc 100644
--- a/include/sys/spa_impl.h
+++ b/include/sys/spa_impl.h
@@ -165,6 +165,8 @@ struct spa {
uint64_t spa_last_synced_guid; /* last synced guid */
list_t spa_config_dirty_list; /* vdevs with dirty config */
list_t spa_state_dirty_list; /* vdevs with dirty state */
+ kmutex_t spa_alloc_lock;
+ avl_tree_t spa_alloc_tree;
spa_aux_vdev_t spa_spares; /* hot spares */
spa_aux_vdev_t spa_l2cache; /* L2ARC cache devices */
nvlist_t *spa_label_features; /* Features for reading MOS */