aboutsummaryrefslogtreecommitdiffstats
path: root/module/zfs/spa_log_spacemap.c
Commit message (Collapse)AuthorAgeFilesLines
* Extend import_progress kstat with a notes fieldDon Brady2023-12-051-3/+9
| | | | | | | | | | | | | Detail the import progress of log spacemaps as they can take a very long time. Also grab the spa_note() messages to, as they provide insight into what is happening Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Co-authored-by: Allan Jude <[email protected]> Closes #15539
* Refactor dmu_prefetch().Alexander Motin2023-08-071-2/+2
| | | | | | | | | | | | | | | | | | - Split dmu_prefetch_dnode() from dmu_prefetch() into a separate function. It is quite inconvenient to read the code where len = 0 means dnode prefetch instead indirect/data prefetch. One function doing both has no benefits, since the code paths are independent. - Improve dmu_prefetch() handling of long block ranges. Instead of limiting L0 data length to prefetch for to dmu_prefetch_max, make dmu_prefetch_max limit the actual amount of prefetch at the specified level, and, if there is more, prefetch all the rest at higher indirection level. It should improve random access times within the prefetched range of any length, reducing importance of specific dmu_prefetch_max value. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #15076
* Address warnings about possible division by zero from clangsaRichard Yao2022-11-031-1/+2
| | | | | | | | | | | | | | | | * The complaint in ztest_replay_write() is only possible if something went horribly wrong. An assertion will silence this and if it goes off, we will know that something is wrong. * The complaint in spa_estimate_metaslabs_to_flush() is not impossible, but seems very unlikely. We resolve this by passing the value from the `MIN()` that does not go to infinity when the variable is zero. There was a third report from Clang's scan-build, but that was a definite false positive and disappeared when checked again through Clang's static analyzer with Z3 refution via CodeChecker. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14124
* Cleanup: 64-bit kernel module parameters should use fixed width typesRichard Yao2022-10-131-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various module parameters such as `zfs_arc_max` were originally `uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long` for Linux compatibility because Linux's kernel default module parameter implementation did not support 64-bit types on 32-bit platforms. This caused problems when porting OpenZFS to Windows because its LLP64 memory model made `unsigned long` a 32-bit type on 64-bit, which created the undesireable situation that parameters that should accept 64-bit values could not on 64-bit Windows. Upon inspection, it turns out that the Linux kernel module parameter interface is extensible, such that we are allowed to define our own types. Rather than maintaining the original type change via hacks to to continue shrinking module parameters on 32-bit Linux, we implement support for 64-bit module parameters on Linux. After doing a review of all 64-bit kernel parameters (found via the man page and also proposed changes by Andrew Innes), the kernel module parameters fell into a few groups: Parameters that were originally 64-bit on Illumos: * dbuf_cache_max_bytes * dbuf_metadata_cache_max_bytes * l2arc_feed_min_ms * l2arc_feed_secs * l2arc_headroom * l2arc_headroom_boost * l2arc_write_boost * l2arc_write_max * metaslab_aliquot * metaslab_force_ganging * zfetch_array_rd_sz * zfs_arc_max * zfs_arc_meta_limit * zfs_arc_meta_min * zfs_arc_min * zfs_async_block_max_blocks * zfs_condense_max_obsolete_bytes * zfs_condense_min_mapping_bytes * zfs_deadman_checktime_ms * zfs_deadman_synctime_ms * zfs_initialize_chunk_size * zfs_initialize_value * zfs_lua_max_instrlimit * zfs_lua_max_memlimit * zil_slog_bulk Parameters that were originally 32-bit on Illumos: * zfs_per_txg_dirty_frees_percent Parameters that were originally `ssize_t` on Illumos: * zfs_immediate_write_sz Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It has been upgraded to 64-bit. Parameters that were `long`/`unsigned long` because of Linux/FreeBSD influence: * l2arc_rebuild_blocks_min_l2size * zfs_key_max_salt_uses * zfs_max_log_walking * zfs_max_logsm_summary_length * zfs_metaslab_max_size_cache_sec * zfs_min_metaslabs_to_flush * zfs_multihost_interval * zfs_unflushed_log_block_max * zfs_unflushed_log_block_min * zfs_unflushed_log_block_pct * zfs_unflushed_max_mem_amt * zfs_unflushed_max_mem_ppm New parameters that do not exist in Illumos: * l2arc_trim_ahead * vdev_file_logical_ashift * vdev_file_physical_ashift * zfs_arc_dnode_limit * zfs_arc_dnode_limit_percent * zfs_arc_dnode_reduce_percent * zfs_arc_meta_limit_percent * zfs_arc_sys_free * zfs_deadman_ziotime_ms * zfs_delete_blocks * zfs_history_output_max * zfs_livelist_max_entries * zfs_max_async_dedup_frees * zfs_max_nvlist_src_size * zfs_rebuild_max_segment * zfs_rebuild_vdev_limit * zfs_unflushed_log_txg_max * zfs_vdev_max_auto_ashift * zfs_vdev_min_auto_ashift * zfs_vnops_read_chunk_size * zvol_max_discard_blocks Rather than clutter the lists with commentary, the module parameters that need comments are repeated below. A few parameters were defined in Linux/FreeBSD specific code, where the use of ulong/long is not an issue for portability, so we leave them alone: * zfs_delete_blocks * zfs_key_max_salt_uses * zvol_max_discard_blocks The documentation for a few parameters was found to be incorrect: * zfs_deadman_checktime_ms - incorrectly documented as int * zfs_delete_blocks - not documented as Linux only * zfs_history_output_max - incorrectly documented as int * zfs_vnops_read_chunk_size - incorrectly documented as long * zvol_max_discard_blocks - incorrectly documented as ulong The documentation for these has been fixed, alongside the changes to document the switch to fixed width types. In addition, several kernel module parameters were percentages or held ashift values, so being 64-bit never made sense for them. They have been downgraded to 32-bit: * vdev_file_logical_ashift * vdev_file_physical_ashift * zfs_arc_dnode_limit_percent * zfs_arc_dnode_reduce_percent * zfs_arc_meta_limit_percent * zfs_per_txg_dirty_frees_percent * zfs_unflushed_log_block_pct * zfs_vdev_max_auto_ashift * zfs_vdev_min_auto_ashift Of special note are `zfs_vdev_max_auto_ashift` and `zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`, and passed to the kernel as `ulong`. This is inherently buggy on big endian 32-bit Linux, since the values would not be written to the correct locations. 32-bit FreeBSD was unaffected because its sysctl code correctly treated this as a `uint64_t`. Lastly, a code comment suggests that `zfs_arc_sys_free` is Linux-specific, but there is nothing to indicate to me that it is Linux-specific. Nothing was done about that. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Original-patch-by: Andrew Innes <[email protected]> Original-patch-by: Jorgen Lundman <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #13984 Closes #14004
* Add defensive assertionsRichard Yao2022-10-121-0/+1
| | | | | | | | | | | | | | | | | Coverity complains about possible bugs involving referencing NULL return values and division by zero. The division by zero bugs require that a block pointer be corrupt, either from in-memory corruption, or on-disk corruption. The NULL return value complaints are only bugs if assumptions that we make about the state of data structures are wrong. Some seem impossible to be wrong and thus are false positives, while others are hard to analyze. Rather than dismiss these as false positives by assuming we know better, we add defensive assertions to let us know when our assumptions are wrong. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #13972
* Cleanup: Use OpenSolaris functions to call schedulerRichard Yao2022-09-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In our codebase, `cond_resched() and `schedule()` are Linux kernel functions that have replaced the OpenSolaris `kpreempt()` functions in the codebase to such an extent that `kpreempt()` in zfs_context.h was broken. Nobody noticed because we did not actually use it. The header had defined `kpreempt()` as `yield()`, which works on OpenSolaris and Illumos where `sched_yield()` is a wrapper for `yield()`, but that does not work on any other platform. The FreeBSD platform specific code implemented shims for these, but the shim for `schedule()` forced us to wait, which is different than merely rescheduling to another thread as the original Linux code does, while the shim for `cond_resched()` had the same definition as its kernel kpreempt() shim. After studying this, I have concluded that we should reintroduce the kpreempt() function in platform independent code with the following definitions: - In the Linux kernel: kpreempt(unused) -> cond_resched() - In the FreeBSD kernel: kpreempt(unused) -> kern_yield(PRI_USER) - In userspace: kpreempt(unused) -> sched_yield() In userspace, nothing changes from this cleanup. In the kernels, the function `fm_fini()` will now call `kern_yield(PRI_USER)` on FreeBSD and `cond_resched()` on Linux. This is instead of `pause("schedule", 1)` on FreeBSD and `schedule()` on Linux. This makes our behavior consistent across platforms. Note that Linux's SPL continues to use `cond_resched()` and `schedule()`. However, those functions have been removed from both the FreeBSD code and userspace code. This should have the benefit of making it slightly easier to port the code to new platforms by making how things should be mapped less confusing. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Neal Gompa <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #13845
* Replace dead opensolaris.org license linkTino Reichardt2022-07-111-1/+1
| | | | | | | | | The commit replaces all findings of the link: http://www.opensolaris.org/os/licensing with this one: https://opensource.org/licenses/CDDL-1.0 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13619
* Remove wrong assertion in log spacemapAlexander Motin2022-06-011-6/+0
| | | | | | | | | | | | | | | | | | | It is typical, but not generally true that if log summary has more blocks it must also have unflushed metaslabs. Normally with metaslabs flushed in order it works, but there are known exceptions, such as device removal or metaslab being loaded during its flush attempt. Before 600a02b8844 if spa_flush_metaslabs() hit loading metaslab it usually stopped (unless memlimit is also exceeded), but now it may flush more metaslabs, just skipping that particular one. This increased chances of assertion to fire when the skipped metaslab is flushed on next iteration if all other metaslabs in that summary entry are already flushed out of order. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #13486 Closes #13513
* Improve log spacemap load timeAlexander Motin2022-04-261-74/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous flushing algorithm limited only total number of log blocks to the minimum of 256K and 4x number of metaslabs in the pool. As result, system with 1500 disks with 1000 metaslabs each, touching several new metaslabs each TXG could grow spacemap log to huge size without much benefits. We've observed one of such systems importing pool for about 45 minutes. This patch improves the situation from five sides: - By limiting maximum period for each metaslab to be flushed to 1000 TXGs, that effectively limits maximum number of per-TXG spacemap logs to load to the same number. - By making flushing more smooth via accounting number of metaslabs that were touched after the last flush and actually need another flush, not just ms_unflushed_txg bump. - By applying zfs_unflushed_log_block_pct to the number of metaslabs that were touched after the last flush, not all metaslabs in the pool. - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in advance, making log spacemap load process for wide HDD pool CPU-bound, accelerating it by many times. - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing single-threaded by nature log processing time from ~10 to ~5 minutes. As further optimization we could skip bumping ms_unflushed_txg for metaslabs not touched since the last flush, but that would be an incompatible change, requiring new pool feature. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12789
* Clean up CSTYLEDsнаб2022-01-261-22/+22
| | | | | | | | | | | | | | | | | | | | 69 CSTYLED BEGINs remain, appx. 30 of which can be removed if cstyle(1) had a useful policy regarding CALL(ARG1, ARG2, ARG3); above 2 lines. As it stands, it spits out *both* sysctl_os.c: 385: continuation line should be indented by 4 spaces sysctl_os.c: 385: indent by spaces instead of tabs which is very cool Another >10 could be fixed by removing "ulong" &al. handling. I don't foresee anyone actually using it intentionally (does it even exist in modern headers? why did it in the first place?). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #12993
* module/*.ko: prune .data, global .rodataнаб2022-01-141-9/+9
| | | | | | | | | | | | Evaluated every variable that lives in .data (and globals in .rodata) in the kernel modules, and constified/eliminated/localised them appropriately. This means that all read-only data is now actually read-only data, and, if possible, at file scope. A lot of previously- global-symbols became inlinable (and inlined!) constants. Probably not in a big Wowee Performance Moment, but hey. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #12899
* Replace /*PRINTFLIKEn*/ with attribute(printf)наб2021-07-261-2/+2
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Issue #12201
* Balance parentheses in parameter descriptionspstef2021-04-111-1/+1
| | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Piotr Paweł Stefaniak <[email protected]> Closes #11882
* Make module tunables cross platformMatthew Macy2019-09-051-41/+28
| | | | | | | | | | | Adds ZFS_MODULE_PARAM to abstract module parameter setting to operating systems other than Linux. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9230
* Fix typos in module/zfs/Andrea Gelmini2019-09-021-3/+3
| | | | | | | | Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9240
* Cap metaslab memory usagePaul Dagnelie2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | On systems with large amounts of storage and high fragmentation, a huge amount of space can be used by storing metaslab range trees. Since metaslabs are only unloaded during a txg sync, and only if they have been inactive for 8 txgs, it is possible to get into a state where all of the system's memory is consumed by range trees and metaslabs, and txgs cannot sync. While ZFS knows how to evict ARC data when needed, it has no such mechanism for range tree data. This can result in boot hangs for some system configurations. First, we add the ability to unload metaslabs outside of syncing context. Second, we store a multilist of all loaded metaslabs, sorted by their selection txg, so we can quickly identify the oldest metaslabs. We use a multilist to reduce lock contention during heavy write workloads. Finally, we add logic that will unload a metaslab when we're loading a new metaslab, if we're using more than a certain fraction of the available memory on range trees. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9128
* Sort log spacemap tunables in alphabetical orderSerapheim Dimitropoulos2019-08-121-32/+32
| | | | | | | | | | | Beside the whole commit being a nit in reality it should bring the diffs of the spa_log_spacemap.c source file between ZoL and delphix/zfs to 0. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #9143
* 9072 handle error of zap_cursor_retrieve() for log spacemap zapSerapheim Dimitropoulos2019-07-301-2/+28
| | | | | | | | | | | | In spa_ld_log_sm_metadata(), it is possible for zap_cursor_retrieve() to return errors other than the expected ENOENT (e.g. when we are at the end of the zap). Ensure that these error cases are handled correctly by the import path. Reviewed by: Brian Behlendorf <[email protected]> Reviewed by: Sara Hartse <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #9074
* Log Spacemap ProjectSerapheim Dimitropoulos2019-07-161-0/+1308
= Motivation At Delphix we've seen a lot of customer systems where fragmentation is over 75% and random writes take a performance hit because a lot of time is spend on I/Os that update on-disk space accounting metadata. Specifically, we seen cases where 20% to 40% of sync time is spend after sync pass 1 and ~30% of the I/Os on the system is spent updating spacemaps. The problem is that these pools have existed long enough that we've touched almost every metaslab at least once, and random writes scatter frees across all metaslabs every TXG, thus appending to their spacemaps and resulting in many I/Os. To give an example, assuming that every VDEV has 200 metaslabs and our writes fit within a single spacemap block (generally 4K) we have 200 I/Os. Then if we assume 2 levels of indirection, we need 400 additional I/Os and since we are talking about metadata for which we keep 2 extra copies for redundancy we need to triple that number, leading to a total of 1800 I/Os per VDEV every TXG. We could try and decrease the number of metaslabs so we have less I/Os per TXG but then each metaslab would cover a wider range on disk and thus would take more time to be loaded in memory from disk. In addition, after it's loaded, it's range tree would consume more memory. Another idea would be to just increase the spacemap block size which would allow us to fit more entries within an I/O block resulting in fewer I/Os per metaslab and a speedup in loading time. The problem is still that we don't deal with the number of I/Os going up as the number of metaslabs is increasing and the fact is that we generally write a lot to a few metaslabs and a little to the rest of them. Thus, just increasing the block size would actually waste bandwidth because we won't be utilizing our bigger block size. = About this patch This patch introduces the Log Spacemap project which provides the solution to the above problem while taking into account all the aforementioned tradeoffs. The details on how it achieves that can be found in the references sections below and in the code (see Big Theory Statement in spa_log_spacemap.c). Even though the change is fairly constraint within the metaslab and lower-level SPA codepaths, there is a side-change that is user-facing. The change is that VDEV IDs from VDEV holes will no longer be reused. To give some background and reasoning for this, when a log device is removed and its VDEV structure was replaced with a hole (or was compacted; if at the end of the vdev array), its vdev_id could be reused by devices added after that. Now with the pool-wide space maps recording the vdev ID, this behavior can cause problems (e.g. is this entry referring to a segment in the new vdev or the removed log?). Thus, to simplify things the ID reuse behavior is gone and now vdev IDs for top-level vdevs are truly unique within a pool. = Testing The illumos implementation of this feature has been used internally for a year and has been in production for ~6 months. For this patch specifically there don't seem to be any regressions introduced to ZTS and I have been running zloop for a week without any related problems. = Performance Analysis (Linux Specific) All performance results and analysis for illumos can be found in the links of the references. Redoing the same experiments in Linux gave similar results. Below are the specifics of the Linux run. After the pool reached stable state the percentage of the time spent in pass 1 per TXG was 64% on average for the stock bits while the log spacemap bits stayed at 95% during the experiment (graph: sdimitro.github.io/img/linux-lsm/PercOfSyncInPassOne.png). Sync times per TXG were 37.6 seconds on average for the stock bits and 22.7 seconds for the log spacemap bits (related graph: sdimitro.github.io/img/linux-lsm/SyncTimePerTXG.png). As a result the log spacemap bits were able to push more TXGs, which is also the reason why all graphs quantified per TXG have more entries for the log spacemap bits. Another interesting aspect in terms of txg syncs is that the stock bits had 22% of their TXGs reach sync pass 7, 55% reach sync pass 8, and 20% reach 9. The log space map bits reached sync pass 4 in 79% of their TXGs, sync pass 7 in 19%, and sync pass 8 at 1%. This emphasizes the fact that not only we spend less time on metadata but we also iterate less times to convergence in spa_sync() dirtying objects. [related graphs: stock- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGStock.png lsm- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGLSM.png] Finally, the improvement in IOPs that the userland gains from the change is approximately 40%. There is a consistent win in IOPS as you can see from the graphs below but the absolute amount of improvement that the log spacemap gives varies within each minute interval. sdimitro.github.io/img/linux-lsm/StockVsLog3Days.png sdimitro.github.io/img/linux-lsm/StockVsLog10Hours.png = Porting to Other Platforms For people that want to port this commit to other platforms below is a list of ZoL commits that this patch depends on: Make zdb results for checkpoint tests consistent db587941c5ff6dea01932bb78f70db63cf7f38ba Update vdev_is_spacemap_addressable() for new spacemap encoding 419ba5914552c6185afbe1dd17b3ed4b0d526547 Simplify spa_sync by breaking it up to smaller functions 8dc2197b7b1e4d7ebc1420ea30e51c6541f1d834 Factor metaslab_load_wait() in metaslab_load() b194fab0fb6caad18711abccaff3c69ad8b3f6d3 Rename range_tree_verify to range_tree_verify_not_present df72b8bebe0ebac0b20e0750984bad182cb6564a Change target size of metaslabs from 256GB to 16GB c853f382db731e15a87512f4ef1101d14d778a55 zdb -L should skip leak detection altogether 21e7cf5da89f55ce98ec1115726b150e19eefe89 vs_alloc can underflow in L2ARC vdevs 7558997d2f808368867ca7e5234e5793446e8f3f Simplify log vdev removal code 6c926f426a26ffb6d7d8e563e33fc176164175cb Get rid of space_map_update() for ms_synced_length 425d3237ee88abc53d8522a7139c926d278b4b7f Introduce auxiliary metaslab histograms 928e8ad47d3478a3d5d01f0dd6ae74a9371af65e Error path in metaslab_load_impl() forgets to drop ms_sync_lock 8eef997679ba54547f7d361553d21b3291f41ae7 = References Background, Motivation, and Internals of the Feature - OpenZFS 2017 Presentation: youtu.be/jj2IxRkl5bQ - Slides: slideshare.net/SerapheimNikolaosDim/zfs-log-spacemaps-project Flushing Algorithm Internals & Performance Results (Illumos Specific) - Blogpost: sdimitro.github.io/post/zfs-lsm-flushing/ - OpenZFS 2018 Presentation: youtu.be/x6D2dHRjkxw - Slides: slideshare.net/SerapheimNikolaosDim/zfs-log-spacemap-flushing-algorithm Upstream Delphix Issues: DLPX-51539, DLPX-59659, DLPX-57783, DLPX-61438, DLPX-41227, DLPX-59320 DLPX-63385 Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8442