summaryrefslogtreecommitdiffstats
path: root/man
Commit message (Collapse)AuthorAgeFilesLines
* zpool iostat should print headers when terminal fillsDamian Wojsław2019-01-231-3/+9
| | | | | | | | | | | | | | | | When `zpool iostat` fills the terminal the headers should be printed again. `zpool iostat -n` can be used to suppress this. If the command is not attached to a tty, headers will not be printed so as to not break existing scripts. Reviewed-by: Joshua M. Clulow <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Damian Wojsław <[email protected]> Closes #8235 Closes #8262
* ztest: split block reconstructionBrian Behlendorf2019-01-162-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase the default allowed number of reconstruction attempts. There's not an exact right number for this setting. It needs to be set large enough to cover any realistic failure scenarios and small enough to avoid stalling the IO pipeline and invoking the dead man detection. The current value of 256 was empirically determined to be too low based on multi-day runs of ztest. The fault injection code would inject more damage than could be reconstructed given the relatively small number of attempts. However, in all observed cases the block could be reconstructed using a slightly higher limit. Based on local testing increasing the default value to 4096 was determined to strike the best balance. Checking all combinations takes less than 10s in the worst case, and has so far eliminated the vast majority of false positives detected by ztest. This delay is roughly on par with how long retries may be performed to a misbehaving HDD and was deemed to be reasonable. Better to err on the side of a brief delay rather than fail to reconstruct the data. Lastly, the -Y flag has been added to zdb to make it easy to try all possible combinations when performing split block reconstruction. For badly damaged blocks with 18 splits, they can be fully enumerated within a few minutes. This has been done to ensure permanent errors are never incorrectly reported when ztest verifies the pool with zdb. Reviewed by: Tom Caputi <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8271
* Disable 'zfs remap' commandBrian Behlendorf2019-01-152-15/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The implementation of 'zfs remap' has proven to be problematic since it modifies the objset (but not its logical contents) by dirtying metadata without owning it. The consequence of which is that dmu_objset_remap_indirects() is vulnerable to certain races. For example, if we are in the middle of receiving into the filesystem while it is being remapped. Then it is possible we could evict the objset when the receive completes (see dsl_dataset_clone_swap_sync_impl, or dmu_recv_end_sync), but dmu_objset_remap_indirects() may be still using the objset. The result of which would be a panic. Extended runs of ztest(8) have exposed other possible races which can occur when using 'zfs remap'. Several of these have been fixed but there may be others which have not yet been encountered and diagnosed. Furthermore, the ability to manually remap a filesystem is no longer particularly useful now that the removal code can map large chunks. Coupled with the fact that explaining what this command does and why it may be useful requires a detailed understanding of the internals of device removal. These are details users should not be bothered with. Therefore, the 'zfs remap' command is being disabled but not entirely removed. It may be removed in the future or potentially reworked to address the issues described above. Since 'zfs remap' has never been part of a tagged release its removal is expected to have minimal impact. The ZTS tests have been updated to continue to exercise the command to prevent atrophy, but it has been removed entirely from ztest(8). Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8238
* zfs.8 uses wrong snapshot names in Example 15Benjamin Gentil2019-01-071-4/+4
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: Benjamin Gentil <[email protected]> Closes #8241
* Add 'zpool status -i' optionBrian Behlendorf2019-01-071-4/+6
| | | | | | | | | | | | | | | | | | | | | Only display the full details of the vdev initialization state in 'zpool status' output when requested with the -i option. By default display '(initializing)' after vdevs when they are being actively initialized. This is consistent with the established precident of appending '(resilvering), etc' and fits within the default 80 column terminal width making it easy to read. Additionally, updated the 'zpool initialize' documentation to make it clear the options are mutually exclusive, but allow duplicate options like all other zfs/zpool commands. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8230
* OpenZFS 9102 - zfs should be able to initialize storage devicesGeorge Wilson2019-01-072-0/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <[email protected]> Reviewed by: John Wren Kennedy <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: loli10K <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Signed-off-by: Tim Chase <[email protected]> Ported-by: Tim Chase <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230
* Minor tweaks to zfs.8 man page for POSIX ACLsRichard Laager2018-12-261-6/+6
| | | | | | | | | | | | | | | | * Capitalize POSIX in POSIX ACLs. This change makes the POSIX in POSIX ACLs all caps, which is both correct and consistent with the rest of the man page. * Slightly reword part of zfs.8. I tweaked a sentence to add a missing comma, and as long as I was editing, removed a couple unnecessary words. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8220
* Add enclosure_symlinks option to vdev_idTony Hutter2018-12-141-0/+34
| | | | | | | | | | | | | | | | Add an 'enclosure_symlinks' option to vdev_id.conf. This creates consistently named symlinks to the enclosure devices (/dev/sg*) based off the configuration in vdev_id.conf. The enclosure symlinks show up in /dev/by-enclosure/<prefix>-<channel><num>. The links make it make it easy to run sg_ses on a particular enclosure device. The enclosure links are created in addition to the normal /dev/disk/by-vdev links. 'enclosure_symlinks' is only valid in sas_direct configurations. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Simon Guest <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8194
* OpenZFS 9963 - Separate tunable for disabling ZIL vdev flushPrakash Surya2018-12-071-4/+16
| | | | | | | | | | | | | | | | | | | | Porting Notes: * Add options to zfs-module-parameters(5) man page. * zfs_nocacheflush move to vdev.c instead of vdev_disk.c, since the latter doesn't get built for user space. Authored by: Prakash Surya <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: Patrick Mooney <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: George Melikov <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Signed-off-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9963 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f8fdf68125 Closes #8186
* Detect IO errors during device removalBrian Behlendorf2018-12-042-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Detect IO errors during device removal While device removal cannot verify the checksums of individual blocks during device removal, it can reasonably detect hard IO errors from the leaf vdevs. Failure to perform this error checking can result in device removal completing successfully, but moving no data which will permanently corrupt the pool. Situation 1: faulted/degraded vdevs In the configuration shown below, the removal of mirror-0 will permanently corrupt the pool. Device removal will preferentially copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'. Which in this case will result in nothing being copied since one vdev in each of those groups in unavailable. However, device removal will complete successfully since all IO errors are ignored. tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 /var/tmp/vdev1 FAULTED 0 0 0 external fault /var/tmp/vdev2 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 /var/tmp/vdev3 ONLINE 0 0 0 /var/tmp/vdev4 FAULTED 0 0 0 external fault This issue is resolved by updating the source child selection logic to exclude unreadable leaf vdevs. Additionally, unwritable destination child vdevs which can never succeed are skipped to prevent generating a large number of write IO errors. Situation 2: individual hard IO errors During removal if an unexpected hard IO error is encountered when either reading or writing the child vdev the entire removal operation is cancelled. While it may be possible to reconstruct the data after removal that cannot be guaranteed. The only strictly safe thing to do is to cancel the removal. As a future improvement we may want to instead suspend the removal process and allow the damaged region to be retried. But that work is left for another time, hard IO errors during the removal process are expected to be exceptionally rare. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6900 Closes #8161
* Fix typo in update to zfs-module-parameters(5)Rich Ercolani2018-11-261-1/+1
| | | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #8153
* man/zfs.8: document 'received' property sourceChristian Schwarz2018-11-201-2/+3
| | | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #8134
* Add zpool status -s (slow I/Os) and -p (parseable)Tony Hutter2018-11-083-18/+27
| | | | | | | | | | | | | | | | | | This patch adds a new slow I/Os (-s) column to zpool status to show the number of VDEV slow I/Os. This is the number of I/Os that didn't complete in zio_slow_io_ms milliseconds. It also adds a new parsable (-p) flag to display exact values. NAME STATE READ WRITE CKSUM SLOW testpool ONLINE 0 0 0 - mirror-0 ONLINE 0 0 0 - loop0 ONLINE 0 0 0 20 loop1 ONLINE 0 0 0 0 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7756 Closes #6885
* Update zfs-events.5 with info from PSARC 2009/497Richard Elling2018-11-011-53/+89
| | | | | | | | | | Update zfs-events.5 with info from PSARC 2009/497 regarding ereport fields. Also updates ZIO_STAGE_* and ZIO_FLAG_* descriptions to match current source. Reviewed by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Elling <[email protected]> Closes #8057
* Defer new resilvers until the current one endsTom Caputi2018-10-182-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, if a resilver is triggered for any reason while an existing one is running, zfs will immediately restart the existing resilver from the beginning to include the new drive. This causes problems for system administrators when a drive fails while another is already resilvering. In this case, the optimal thing to do to reduce risk of data loss is to wait for the current resilver to end before immediately replacing the second failed drive, which allows the system to operate with two incomplete drives for the minimum amount of time. This patch introduces the resilver_defer feature that essentially does this for the admin without forcing them to wait and monitor the resilver manually. The change requires an on-disk feature since we must mark drives that are part of a deferred resilver in the vdev config to ensure that we do not assume they are done resilvering when an existing resilver completes. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: @mmaybee Signed-off-by: Tom Caputi <[email protected]> Closes #7732
* OpenZFS 9617 - too-frequent TXG sync causes excessive write inflationMatthew Ahrens2018-10-041-3/+5
| | | | | | | | | | | | | | | | | | | | Porting notes: * Renamed zfs_dirty_data_sync_pct to zfs_dirty_data_sync_percent and changed the type to be consistent with the other dirty module params. * Updated zfs-module-parameters.5 accordingly. Authored by: Matthew Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Andrew Stormont <[email protected]> Reviewed-by: George Melikov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9617 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7928f4ba Closes #7976
* OpenZFS 9763 - zfs(1M): broken formatting in allow/unallow descriptionYuri Pankov2018-10-021-2/+3
| | | | | | | | | | | | | | | | Porting notes: * Two of the three changes from the upstream patch were already applied for Linux. Only the last one is required. Authored by: Yuri Pankov <[email protected]> Reviewed by: Robert Mustacchi <[email protected]> Reviewed-by: George Melikov <[email protected]> Approved by: Gordon Ross <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9763 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8a702e55 Closes #7972
* Refine split block reconstructionBrian Behlendorf2018-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | Due to a flaw in 4589f3ae the number of unique combinations could be calculated incorrectly. This could result in the random combinations reconstruction being used when it would have been possible to check all combinations. This change fixes the unique combinations calculation and simplifies the reconstruction logic by maintaining a per- segment list of unique copies. The vdev_indirect_splits_damage() function was introduced to validate both the enumeration and random reconstruction logic with ztest. It is implemented such it will never make a known recoverable block unrecoverable. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6900 Closes #7934
* Fix reference to zpool-features(5)DeHackEd2018-09-211-1/+1
| | | | | | | | Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7938
* Fix allocation_classes GUID in zpool-features(5)DeHackEd2018-09-181-1/+1
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7920
* Man page fixes - zpool/zfs optional parametersGregor Kopka2018-09-182-4/+4
| | | | | | | | | | The man pages for zpool and zfs (get command) listed the pool/dataset parameter as required, but these are optional. Fixed that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gregor Kopka <[email protected]> Closes #7916
* Clarify 'zpool remove' restrictionsBrian Behlendorf2018-09-171-4/+8
| | | | | | | | | | | Update zpool(8) to clarify what type of vdevs may be safely removed and that the existence of any top-level raidz device which is part of the primary pool will prevent device removal. Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7880 Closes #7893
* Fix 'zfs allow' for create time permissionsLOLi2018-09-061-2/+4
| | | | | | | | | | | | | | When no permission set is defined for a dataset the create time permissions are incorrectly shown as if they were a permission set. This change simply correct how allow permissions are displayed. This commit also fixes a small manpage formatting issue and adds the "zfs_allow_003_pos" test case to the ZFS Test Suite. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7519 Closes #7860
* Pool allocation classesDon Brady2018-09-054-0/+93
| | | | | | | | | | | | | | | | | | | | Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Håkan Johansson <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Gregor Kopka <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5182
* OpenZFS 9403 - assertion failed in arc_buf_destroy()Tom Caputi2018-08-292-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assertion failed in arc_buf_destroy() when concurrently reading block with checksum error. Porting notes: * The ability to zinject decompression errors has been added, but this only works at the zio_decompress() level, where we have all of the info we need to match against the user's zinject options. * The decompress_fault test has been added to test the new zinject functionality * We attempted to set zio_decompress_fail_fraction to (1 << 18) in ztest for further test coverage. Although this did uncover a few low priority issues, this unfortuantely also causes ztest to ASSERT in many locations where the code is working correctly since it is designed to fail on IO errors. Developers can manually set this variable with the '-o' option to find and debug issues. Authored by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Matt Ahrens <[email protected]> Ported-by: Tom Caputi <[email protected]> OpenZFS-issue: https://illumos.org/issues/9403 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9 Closes #7822
* Introduce read/write kstats per datasetSerapheim Dimitropoulos2018-08-202-1/+20
| | | | | | | | | | | | | The following patch introduces a few statistics on reads and writes grouped by dataset. These statistics are implemented as kstats (backed by aggregate sums for performance) and can be retrieved by using the dataset objset ID number. The motivation for this change is to provide some preliminary analytics on dataset usage/performance. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #7705
* 'zfs holds' scripted mode is not documentedLOLi2018-08-181-2/+4
| | | | | | | | | | This change simply documents the existing "scripted mode" option in both command help and man page. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7798
* Added encryption support for zfs recv -o / -xTom Caputi2018-08-151-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | One small integration that was absent from b52563 was support for zfs recv -o / -x with regards to encryption parameters. The main use cases of this are as follows: * Receiving an unencrypted stream as encrypted without needing to create a "dummy" encrypted parent so that encryption can be inheritted. * Allowing users to change their keylocation on receive, so long as the receiving dataset is an encryption root. * Allowing users to explicitly exclude or override the encryption property from an unencrypted properties stream, allowing it to be received as encrypted. * Receiving a recursive heirarchy of unencrypted datasets, encrypting the top-level one and forcing all children to inherit the encryption. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7650
* OpenZFS 8906 - uts: illumos rootfs should support salted cksumToomas Soome2018-07-272-16/+11
| | | | | | | | | | | | | | | | | | | | Porting notes: * As of grub-2.02 these checksums are not supported. However, as pointed out in #6501 there are alternatives such as EFISTUB which work and have no such restriction. A warning was added to the checksum property section of the zfs.8 man page. Authored by: Toomas Soome <[email protected]> Reviewed by: C Fraire <[email protected]> Reviewed by: Robert Mustacchi <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/8906 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f Closes #6501 Closes #7714
* OpenZFS 9337 - zfs get all is slow due to uncached metadataMatthew Ahrens2018-07-121-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This project's goal is to make read-heavy channel programs and zfs(1m) administrative commands faster by caching all the metadata that they will need in the dbuf layer. This will prevent the data from being evicted, so that any future call to i.e. zfs get all won't have to go to disk (very much). There are two parts: The dbuf_metadata_cache. We identify what to put into the cache based on the object type of each dbuf. Caching objset properties os {version,normalization,utf8only,casesensitivity} in the objset_t. The reason these needed to be cached is that although they are queried frequently, they aren't stored in a dbuf type which we can easily recognize and cache in the dbuf layer; instead, we have to explicitly store them. There's already existing infrastructure for maintaining cached properties in the objset setup code, so I simply used that. Performance Testing: - Disabled kmem_flags - Tuned dbuf_cache_max_bytes very low (128K) - Tuned zfs_arc_max very low (64M) Created test pool with 400 filesystems, and 100 snapshots per filesystem. Later on in testing, added 600 more filesystems (with no snapshots) to make sure scaling didn't look different between snapshots and filesystems. Results: | Test | Time (trunk / diff) | I/Os (trunk / diff) | +------------------------+---------------------+---------------------+ | zpool import | 0:05 / 0:06 | 12.9k / 12.9k | | zfs get all (uncached) | 1:36 / 0:53 | 16.7k / 5.7k | | zfs get all (cached) | 1:36 / 0:51 | 16.0k / 6.0k | Authored by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Alek Pinchuk <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> OpenZFS-issue: https://illumos.org/issues/9337 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f Closes #7668
* OpenZFS 9426 - metaslab size can exceed offset addressable by spacemapDon Brady2018-07-111-1/+12
| | | | | | | | | | | | Authored by: Don Brady <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9426 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f1c88afb1 Closes #7700
* OpenZFS 9330 - stack overflow when creating a deeply nested datasetSerapheim Dimitropoulos2018-07-092-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Datasets that are deeply nested (~100 levels) are impractical. We just put a limit of 50 levels to newly created datasets. Existing datasets should work without a problem. The problem can be seen by attempting to create a dataset using the -p option with many levels: panic[cpu0]/thread=ffffff01cd282c20: BAD TRAP: type=8 (#df Double fault) rp=ffffffff fffffffffbc3aa60 unix:die+100 () fffffffffbc3ab70 unix:trap+157d () ffffff00083d7020 unix:_patch_xrstorq_rbx+196 () ffffff00083d7050 zfs:dbuf_rele+2e () ... ffffff00083d7080 zfs:dsl_dir_close+32 () ffffff00083d70b0 zfs:dsl_dir_evict+30 () ffffff00083d70d0 zfs:dbuf_evict_user+4a () ffffff00083d7100 zfs:dbuf_rele_and_unlock+87 () ffffff00083d7130 zfs:dbuf_rele+2e () ... The block above repeats once per directory in the ... ... create -p command, working towards the root ... ffffff00083db9f0 zfs:dsl_dataset_drop_ref+19 () ffffff00083dba20 zfs:dsl_dataset_rele+42 () ffffff00083dba70 zfs:dmu_objset_prefetch+e4 () ffffff00083dbaa0 zfs:findfunc+23 () ffffff00083dbb80 zfs:dmu_objset_find_spa+38c () ffffff00083dbbc0 zfs:dmu_objset_find+40 () ffffff00083dbc20 zfs:zfs_ioc_snapshot_list_next+4b () ffffff00083dbcc0 zfs:zfsdev_ioctl+347 () ffffff00083dbd00 genunix:cdev_ioctl+45 () ffffff00083dbd40 specfs:spec_ioctl+5a () ffffff00083dbdc0 genunix:fop_ioctl+7b () ffffff00083dbec0 genunix:ioctl+18e () ffffff00083dbf10 unix:brand_sys_sysenter+1c9 () Porting notes: * Added zfs_max_dataset_nesting module option with documentation. * Updated zfs_rename_014_neg.ksh for Linux. * Increase the zfs.sh stack warning to 15K. Enough time has passed that 16K can be reasonably assumed to be the default value. It was increased in the 3.15 kernel released in June of 2014. Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Approved by: Garrett D'Amore <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9330 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/757a75a Closes #7681
* OpenZFS 9238 - ZFS Spacemap Encoding V2Serapheim Dimitropoulos2018-07-051-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Motivation ========== The current space map encoding has the following disadvantages: [1] Assuming 512 sector size each entry can represent at most 16MB for a segment. This makes the encoding very inefficient for large regions of space. [2] As vdev-wide space maps have started to be used by new features (i.e. device removal, zpool checkpoint) we've started imposing limits in the vdevs that can be used with them based on the maximum addressable offset (currently 64PB for a top-level vdev). New encoding ============ The layout can be found at space_map.h and it remains backwards compatible with the old one. The introduced two-word entry format, besides extending the limits imposed by the single-entry layout, also includes a vdev field and some extra padding after its prefix. The extra padding after the prefix should is reserved for future usage (e.g. new prefixes for future encodings or new fields for flags). The new vdev field not only makes the space maps more self-descriptive, but also opens the doors for pool-wide space maps (expected to be used in the log spacemap project). One final important note is that the number of bits used for vdevs is reduced to 24 bits for blkptrs. That was decided as we don't know of any setups that use more than 16M vdevs for the time being and we wanted to fit the vdev field in the space map. In addition that gives us some extra bits in dva_t. Other references: ================= The new encoding is also discussed towards the end of the Log Space Map presentation from 2017's OpenZFS summit. Link: https://www.youtube.com/watch?v=jj2IxRkl5bQ Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Gordon Ross <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-commit: https://github.com/openzfs/openzfs/commit/90a56e6d OpenZFS-issue: https://www.illumos.org/issues/9238 Closes #7665
* Fix formatting in zpool-features(5)Tim Chase2018-06-271-0/+5
| | | | | | | | | The formatting of the features beginning with large_blocks was broken when the zpool_checkpoint feature was added. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7658
* OpenZFS 9521 - Add checkpoint fieldEitan Adler2018-06-271-2/+2
| | | | | | | | | | | | | | | | | | Add checkpoint field in the default list of the zpool-list man page Authored by: Eitan Adler <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: kpande <[email protected]> Reviewed by: Tim Chase <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9521 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c5a860f7b Closes #7658
* OpenZFS 9166 - zfs storage pool checkpointSerapheim Dimitropoulos2018-06-264-3/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
* Fix duplicate "fB" typoajs1242018-06-251-1/+1
| | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: ajs124 <[email protected]> Closes #7649
* Add tunables for channel programsJohn Gallagher2018-06-151-0/+25
| | | | | | | | | | | | This patch adds tunables for modifying the maximum memory limit and maximum instruction limit that can be specified when running a channel program. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected] Reviewed-by: Sara Hartse <[email protected]> Signed-off-by: John Gallagher <[email protected]> External-issue: LX-1085 Closes #7618
* Tunable directory for zfs runtime scriptsAntonio Russo2018-06-073-3/+3
| | | | | | | | | | | | | zpool and zed place scripts in subdirectories of libexecdir. Some distributions locate architecture independent scripts in other locations (e.g. Debian). To avoid these paths getting out of sync, centralize the definitions. Build zfs-test's default.cfg by Makefile. Use the new directory logic building tests/zfs-tests/include/default.cfg.in. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7597
* Minor documentation, logging, and testing typosAntonio Russo2018-06-071-6/+5
| | | | | | | | | This patch collects some minor inconsistencies and typos in the documentation, logging and testing infrastructure. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7608
* Fix typoes in zpool man pagebunder20152018-06-041-4/+9
| | | | | | | | | Fixed some highlighting in the zpool man page Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #7596
* Update build system and packagingBrian Behlendorf2018-05-292-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_*. * Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> TEST_ZIMPORT_SKIP="yes" Closes #7556
* Merge branch 'zfsonlinux/merge-spl'Brian Behlendorf2018-05-291-0/+357
|\ | | | | | | | | | | | | | | | | | | Merge a minimal version of the zfsonlinux/spl repository in to the zfsonlinux/zfs repository. Care was taken to prevent file conflicts when merging and to preserve the spl repository history. The spl kernel module remains under the GPLv2 license as documented by the additional THIRDPARTYLICENSE.gplv2 file. Signed-off-by: Brian Behlendorf <[email protected]>
| * Prepare SPL repo to merge with ZFS repoBrian Behlendorf2018-05-294-103/+0
| | | | | | | | | | | | | | | | | | This commit removes everything from the repository except the core SPL implementation for Linux. Those files which remain have been moved to non-conflicting locations to facilitate the merge. The README.md and associated files have been updated accordingly. Signed-off-by: Brian Behlendorf <[email protected]>
| * Update spl module parameters man5 with missing parameter detailsabraunegg2017-10-271-1/+15
| | | | | | | | | | | | | | | | Update spl module parameters man5 with the following missing parameter details for spl_panic_halt. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alex Braunegg <[email protected]> Closes #664
| * spl-module-parameters.5 manpage: fix macroFabian-Gruenbichler2017-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | There is no '.sh' macro in troff/groff/man, only '.SH' for section headers. I assume .sp for a line break was intended here like in the rest of the man page. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Closes #643
| * splat.1 manpage: fix spelling of 'hexadecimal'Fabian-Gruenbichler2017-08-101-4/+4
| | | | | | | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Closes #642
| * Limit number of tasks shown in taskq procChunwei Chen2016-12-011-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | To prevent holding tq_lock for too long. Before zfsonlinux/zfs@8e71ab9, hogging delay tasks and cat /proc/spl/taskq would easily cause a lockup. While that bug has been fixed. It's probably still a good idea to do this just in case task lists grow too large. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #586
| * Allow kicking a taskq to spawn more threadsChunwei Chen2016-02-051-0/+14
| | | | | | | | | | | | | | | | | | | | | | This patch add a module parameter spl_taskq_kick. When writing non-zero value to it, it will scan all the taskq, if a taskq contains a task pending for more than 5 seconds, it will be forced to spawn a new thread. This is use as an emergency recovery from deadlock, not a general solution. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #529
| * Add spl_kmem_cache_kmem_threads man page entryBrian Behlendorf2016-01-121-0/+14
| | | | | | | | | | | | | | | | The spl_kmem_cache_kmem_threads module option was accidentally omitted from the documentation. Add it. Signed-off-by: Brian Behlendorf <[email protected]> Closes #512