summaryrefslogtreecommitdiffstats
path: root/cmd
Commit message (Collapse)AuthorAgeFilesLines
* grammar: it is / plural agreementJosh Soref2019-05-281-2/+2
| | | | | | | Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: Josh Soref <[email protected]> Closes #8818
* Update comments to match codeRyan Moeller2019-05-281-6/+6
| | | | | | | | | | | | s/get_vdev_spec/make_root_vdev The former doesn't exist anymore. Sponsored by: iXsystems, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #8759
* Endless loop in zpool_do_remove() on platforms with unsigned charloli10K2019-05-283-4/+4
| | | | | | | | | | | | On systems where "char" is an unsigned type the value returned by getopt() will never be negative (-1), leading to an endless loop: this issue prevents both 'zpool remove' and 'zstreamdump' for working on some systems. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8789
* Disable parallel processing for 'zfs mount -l'Tom Caputi2019-05-251-2/+5
| | | | | | | | | | | | | | Currently, 'zfs mount -a' will always attempt to parallelize work related to mounting as best it can. Unfortunately, when the user passes the '-l' option to load keys, this causes all threads to prompt the user for their keys at once, resulting in a confusing and racy user experience. This patch simply disables parallel mounting when using the '-l' flag. Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8762 Closes #8811
* zpool: status -t is not documented in help messageloli10K2019-05-241-1/+1
| | | | | | | | | This commit adds the undocumented "-t" option to zpool(8) help message. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8782
* zfs: missing newline character in zfs_do_channel_program() error messageloli10K2019-05-241-1/+2
| | | | | | | | | | | | This commit simply adds a missing newline ("\n") character to the error message printed by the zfs command when the provided pool parameter can't be found. Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8783
* zpool: trim -p is not a valid optionloli10K2019-05-241-1/+2
| | | | | | | | | | This commit removes the documented but not handled "-p" option from zpool(8) help message. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8781
* Fix dataset name comparison in zfs_compare()Alexander Motin2019-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | The code never returned match comparing two datasets (not snapshots). As result, uu_avl_find(), called from zfs_callback(), never succeeded, allowing to add same dataset into the list multiple times, for example: # zfs get name pers pers pers@z pers@z NAME PROPERTY VALUE SOURCE pers name pers - pers name pers - pers@z name pers@z - With the patch: # zfs get name pers pers pers@z pers@z NAME PROPERTY VALUE SOURCE pers name pers - pers@z name pers@z - Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #8723
* Fix typesetting of Errata #4JMoVS2019-05-081-23/+22
| | | | | | | | Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Justin Scholz <[email protected]> Closes #8712 Closes #8721
* Clearer wording on Errata #4JMoVS2019-05-021-21/+26
| | | | | | | | | | | | | | Users of existing pools, especially pools with top-level encrypted datasets, could run into trouble trying to work around Errata #4. Clarify that removing encrypted snapshots and bookmarks is enough to clear the errata. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Justin Scholz <[email protected]> Closes #8682 Closes #8683
* Fix estimated scrub completion timeTom Caputi2019-05-011-2/+3
| | | | | | | | | | | | | | | | | | | Currently, it is possible for the 'zpool scrub' command to progress slightly beyond 100% due to concurrent changes happening on the live pool. This behavior is expected, but the userspace code for 'zpool status' would subtract the expected amount of data from the amount of data already scrubbed, resulting in a negative integer being casted to a large positive one. This number was then used to calculate the estimated completion time, resulting in wildly wrong results. This code changes the behavior so that 'zpool status' does not attempt to report an estimate during this period. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8611 Closes #8687
* Use sigaction(2) instead of sigignore(3) for portabilityTomohiro Kusumi2019-04-301-2/+12
| | | | | | | | | | | | | | | | sigignore(3) isn't portable. This code fails to compile on platforms without sigignore(3). Use sigaction(2). -- zfs_main.c: In function 'zfs_do_diff': zfs_main.c:7178:9: error: implicit declaration of function 'sigignore' [-Werror=implicit-function-declaration] (void) sigignore(SIGPIPE); ^~~~~~~~~ Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8593
* Add option [-V|--version] to emit version stringTerraTech2019-04-162-1/+50
| | | | | | | | | | | | | | | | | | | | | Add the 'zfs version' and 'zpool version' subcommands to display the version of the user space utilities and loaded zfs kernel module. For example: $ zfs version zfs-0.8.0-rc3_169_g67e0366b88 zfs-kmod-0.8.0-rc3_169_g67e0366b88 The '-V' and '--version' aliases were added to support the common convention of using 'zfs --version` to obtain the version information. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: TerraTech <[email protected]> Closes #2501 Closes #8567
* Cleanup nits from ab7615d92Tom Caputi2019-04-141-1/+1
| | | | | | | | | This patch simply up cleans up a nit and corrects an error message issue that were introduced in the Multiple DVA scrub patch. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8619
* Fix 'zfs list -t snapshot' depthBrian Behlendorf2019-04-081-2/+2
| | | | | | | | | | | | | | Commit df583073 introduced the ability to list the snapshots for a specified dataset. This change inadvertently resulted in only the top- level snapshots being listed when no dataset was specified. Fix this issue by adding an additional check to determine if a dataset was provided to avoid incorrectly restricting the depth. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8591 Closes #8594
* Restrict kstats and print real pointersSara Hartse2019-04-041-1/+1
| | | | | | | | | | | | | | | There are several places where we use zfs_dbgmsg and %p to print pointers. In the Linux kernel, these values obfuscated to prevent information leaks which means the pointers aren't very useful for debugging crash dumps. We decided to restrict the permissions of dbgmsg (and some other kstats while we were at it) and print pointers with %px in zfs_dbgmsg as well as spl_dumpstack Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: sara hartse <[email protected]> Closes #8467 Closes #8476
* Do not iterate through filesystems unnecessarilyTom Caputi2019-04-012-5/+40
| | | | | | | | | | | | | | | | | | | | | | | | Currently, when attempting to list snapshots ZFS may do a lot of extra work checking child datasets. This is because the code does not realize that it will not be able to reach any snapshots contained within snapshots that are at the depth limit since the snapshots of those datasets are counted as an additional layer deeper. This patch corrects this issue. In addition, this patch adds the ability to do perform the commands: $ zfs list -t snapshot <dataset> $ zfs get -t snapshot <prop> <dataset> as a convenient way to list out properties of all snapshots of a given dataset without having to use the depth limit. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8539
* Add TRIM supportBrian Behlendorf2019-03-292-72/+363
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Contributions-by: Saso Kiselkov <[email protected]> Contributions-by: Tim Chase <[email protected]> Contributions-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8419 Closes #598
* MMP interval and fail_intervals in uberblockOlaf Faaland2019-03-211-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Multihost is enabled, and a pool is imported, uberblock writes include ub_mmp_delay to allow an importing node to calculate the duration of an activity test. This value, is not enough information. If zfs_multihost_fail_intervals > 0 on the node with the pool imported, the safe minimum duration of the activity test is well defined, but does not depend on ub_mmp_delay: zfs_multihost_fail_intervals * zfs_multihost_interval and if zfs_multihost_fail_intervals == 0 on that node, there is no such well defined safe duration, but the importing host cannot tell whether mmp_delay is high due to I/O delays, or due to a very large zfs_multihost_interval setting on the host which last imported the pool. As a result, it may use a far longer period for the activity test than is necessary. This patch renames ub_mmp_sequence to ub_mmp_config and uses it to record the zfs_multihost_interval and zfs_multihost_fail_intervals values, as well as the mmp sequence. This allows a shorter activity test duration to be calculated by the importing host in most situations. These values are also added to the multihost_history kstat records. It calculates the activity test duration differently depending on whether the new fields are present or not; for importing pools with only ub_mmp_delay, it uses (zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals Which results in an activity test duration less sensitive to the leaf count. In addition, it makes a few other improvements: * It updates the "sequence" part of ub_mmp_config when MMP writes in between syncs occur. This allows an importing host to detect MMP on the remote host sooner, when the pool is idle, as it is not limited to the granularity of ub_timestamp (1 second). * It issues writes immediately when zfs_multihost_interval is changed so remote hosts see the updated value as soon as possible. * It fixes a bug where setting zfs_multihost_fail_intervals = 1 results in immediate pool suspension. * Update tests to verify activity check duration is based on recorded tunable values, not tunable values on importing host. * Update tests to verify the expected number of uberblocks have valid MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number), that sequence number is incrementing, and that uberblock values match tunable settings. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7842
* Improve `zpool labelclear`Brian Behlendorf2019-03-211-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) As implemented the `zpool labelclear` command overwrites the calculated offsets of all four vdev labels even when only a single valid label is found. If the device as been re-purposed but still contains a valid label this can result in space no longer owned by ZFS being zeroed. Prevent this by verifying every label removed is intact before it's overwritten. 2) Address a small bug in zpool_do_labelclear() which prevented labelclear from working on file vdevs. Only block devices support BLKFLSBUF, try the ioctl() but when it's reported as unsupported this should not be fatal. 3) Fix `zpool labelclear` so it can be run on vdevs which were removed from the pool with `zpool remove`. Additionally, allow intact but partial labels to be cleared as in the case of a failed `zpool attach` or `zpool replace`. 4) Remove LABELCLEAR and LABELREAD variables for test cases. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8500 Closes #8373 Closes #6261
* Multiple DVA Scrubbing FixTom Caputi2019-03-151-14/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is an issue in the sequential scrub code which prevents self healing from working in some cases. The scrub code will split up all DVA copies of a bp and issue each of them separately. The problem is that, since each of the DVAs is no longer associated with the others, the self healing code doesn't have the opportunity to repair problems that show up in one of the DVAs with the data from the others. This patch fixes this issue by ensuring that all IOs issued by the sequential scrub code include all DVAs. Initially, only the first DVA of each is attempted. If an issue arises, the IO is retried with all available copies, giving the self healing code a chance to correct the issue. To test this change, this patch also adds the ability for zinject to specify individual DVAs to inject read errors into. We then add a new test case that utilizes this functionality to ensure scrubs and self-healing reads can handle and transparently fix issues with individual copies of blocks. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8453
* Better user experience for errata 4Tom Caputi2019-03-141-11/+15
| | | | | | | | | | | | | | | | | | | | This patch attempts to address some user concerns that have arisen since errata 4 was introduced. * The errata warning has been made less scary for users without any encrypted datasets. * The errata warning now clears itself without a pool reimport if the bookmark_v2 feature is enabled and no encrypted datasets exist. * It is no longer possible to create new encrypted datasets without enabling the bookmark_v2 feature, thus helping to ensure that the errata is resolved. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Issue ##8308 Closes #8504
* Update commented zed.rc values to defaultskpande2019-03-141-3/+4
| | | | | | | | | Update zed.rc values reflect their default value. This helps avoid confusion if a user expects functionality to be enabled. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #8498
* Make zstreamdump -v more greppableTom Caputi2019-03-131-8/+8
| | | | | | | | | | | | | | | | Currently, the verbose output of zstreamdump includes new line characters within some individual records. Presumably, this was originally done to keep the output from getting too wide to fit on a terminal. However, since new flags and struct members have been added, these rules have not been maintained consistently. In addition, these newlines can make it hard to grep the output in some scenarios. This patch simply removes these newlines, making the output easier to grep and removing the inconsistency. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed by: Allan Jude <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8493
* Detect and prevent mixed raw and non-raw sendsTom Caputi2019-03-131-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is an issue in the raw receive code where raw receives are allowed to happen on top of previously non-raw received datasets. This is a problem because the source-side dataset doesn't know about how the blocks on the destination were encrypted. As a result, any MAC in the objset's checksum-of-MACs tree that is a parent of both blocks encrypted on the source and blocks encrypted by the destination will be incorrect. This will result in authentication errors when we decrypt the dataset. This patch fixes this issue by adding a new check to the raw receive code. The code now maintains an "IVset guid", which acts as an identifier for the set of IVs used to encrypt a given snapshot. When a snapshot is raw received, the destination snapshot will take this value from the DRR_BEGIN payload. Non-raw receives and normal "zfs snap" operations will cause ZFS to generate a new IVset guid. When a raw incremental stream is received, ZFS will check that the "from" IVset guid in the stream matches that of the "from" destination snapshot. If they do not match, the code will error out the receive, preventing the problem. This patch requires an on-disk format change to add the IVset guids to snapshots and bookmarks. As a result, this patch has errata handling and a tunable to help affected users resolve the issue with as little interruption as possible. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8308
* Fix typo in arc_summary3jwittlincohen2019-03-131-1/+1
| | | | | | | | | | This is a simple fix for a typo ("perfetch" rather than "prefetch") in arc_summary3. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Jason Cohen <[email protected]> Closes #8499
* Avoid retrieving unused snapshot propsAlek P2019-03-122-8/+35
| | | | | | | | | | | | | | This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it to take input parameters that alter the way looping through the list of snapshots is performed. The idea here is to restrict functions that throw away some of the snapshots returned by the ioctl to a range of snapshots that these functions actually use. This improves efficiency and execution speed for some rollback and send operations. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #8077
* Do not resume a pool if multihost is enabledOlaf Faaland2019-02-281-0/+7
| | | | | | | | | | | When multihost is enabled, and a pool is suspended, return EINVAL in response to "zpool clear <pool>". The pool may have been imported on another host while I/O was suspended. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6933 Closes #8460
* zstreamdump: include embedded writes when dumping raw data (-d)Allan Jude2019-02-271-0/+4
| | | | | | | | | When feeding a replication stream to `zstreamdump -d` (raw dump mode), it does not print the raw data for DRR_WRITE_EMBEDDED records. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #8430
* zfs.8 has wrong description of "zfs program -t"Matthew Ahrens2019-02-261-3/+3
| | | | | | | | | | | | | | The "-t" argument to "zfs program" specifies a limit on the number of LUA instructions that can be executed. The zfs.8 manpage has the wrong description. It should be updated to match what's in zfs-program.8 Also fix the formatting of the zfs help message. Reviewed by: Allan Jude <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8410
* zfs should optionally send holdsPaul Zuchowski2019-02-151-6/+13
| | | | | | | | | | | | | Add -h switch to zfs send command to send dataset holds. If holds are present in the stream, zfs receive will create them on the target dataset, unless the zfs receive -h option is used to skip receive of holds. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #7513
* zdb: replace label_t to zdb_label_t for reduce collisionsIgor K2019-02-131-8/+8
| | | | | | | | | | | | | with builds on illumos based platform we can see build issue because label_t has been redefined. for reduce build issues on others platforms we should rename label_t to zdb_label_t. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #8397
* Get rid of space_map_update() for ms_synced_lengthSerapheim Dimitropoulos2019-02-121-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially, metaslabs and space maps used to be the same thing in ZFS. Later, we started differentiating them by referring to the space map as the on-disk state of the metaslab, making the metaslab a higher-level concept that is metadata that deals with space accounting. Today we've managed to split that code furthermore, with the space map being its own on-disk data structure used in areas of ZFS besides metaslabs (e.g. the vdev-wide space maps used for zpool checkpoint or vdev removal features). This patch refactors the space map code to further split the space map code from the metaslab code. It does so by getting rid of the idea that the space map can have a different in-core and on-disk length (sm_length vs smp_length) which is something that is only used for the metaslab code, and other consumers of space maps just have to deal with. Instead, this patch introduces changes that move the old in-core length of the metaslab's space map to the metaslab structure itself (see ms_synced_length field) while making the space map code only care about the actual space map's length on-disk. The result of this is that space map consumers no longer have to deal with syncing two different lengths for the same structure (e.g. space_map_update() goes away) while metaslab specific behavior stays within the metaslab code. Specifically, the ms_synced_length field keeps track of the amount of data metaslab_load() can read from the metaslab's space map while working concurrently with metaslab_sync() that may be appending to that same space map. As a side note, the patch also adds a few comments around the metaslab code documenting some assumptions and expected behavior. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8328
* shellcheck passbunder20152019-02-043-5/+5
| | | | | | | | | | note: which is non-standard. Use builtin 'command -v' instead. [SC2230] note: Use -n instead of ! -z. [SC2236] Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #8367
* Fix zpool iostat -w header namesTony Hutter2019-01-311-2/+2
| | | | | | | | | | | The zpool iostat latency histograms (-w) has column names 'sync_queue' and 'async_queue', which do not match the man page, nor the equivalent columns in average latency. Change the column names to be 'syncq_wait' and 'asyncq_wait' to be consistent. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8338
* zdb -L should skip leak detection altogetherSerapheim Dimitropoulos2019-01-301-119/+132
| | | | | | | | | | | | | | | | Currently the point of -L option in zdb is to disable leak tracing and the loading of space maps because they are expensive, yet still do leak detection in terms of space. Unfortunately, there is a scenario where this is a lie. If we are using zdb -L on a pool where a vdev is being removed, zdb_claim_removing() will open the metaslab space maps of that device. This patch makes it so zdb -L skips leak detection altogether and ensures that no space maps are loaded. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8335
* GCC 9.0: Fix ztest "directive argument is not a nul-terminated string"Tony Hutter2019-01-281-2/+2
| | | | | | | | | | | | | GCC 9.0 is complaining because we're trying to print strings that are defined like this: .zo_pool = { 'z', 't', 'e', 's', 't', '\0' }, Fix them by making them actual strings. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8330
* Rename range_tree_verify to range_tree_verify_not_presentSerapheim Dimitropoulos2019-01-251-2/+3
| | | | | | | | | | | | The range_tree_verify function looks for a segment in a range tree and panics if the segment is present on the tree. This patch gives the function a more descriptive name. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8327
* zfs userspace dumps core when used on ZVOLsloli10K2019-01-251-2/+12
| | | | | | | | | | | If you try to get the userspace, groupspace or projectspace on a ZVOL, the generated error results in passing EINVAL to zfs_standard_error_fmt() when we should return a specific error to inform the user that those properties aren't available on volumes. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Tom Caputi <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8279
* zpool iostat should print headers when terminal fillsDamian Wojsław2019-01-231-4/+32
| | | | | | | | | | | | | | | | When `zpool iostat` fills the terminal the headers should be printed again. `zpool iostat -n` can be used to suppress this. If the command is not attached to a tty, headers will not be printed so as to not break existing scripts. Reviewed-by: Joshua M. Clulow <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Damian Wojsław <[email protected]> Closes #8235 Closes #8262
* Factor metaslab_load_wait() in metaslab_load()Serapheim Dimitropoulos2019-01-181-5/+2
| | | | | | | | | | | | | | Most callers that need to operate on a loaded metaslab, always call metaslab_load_wait() before loading the metaslab just in case someone else is already doing the work. Factoring metaslab_load_wait() within metaslab_load() makes the later more robust, as callers won't have to do the load-wait check explicitly every time they need to load a metaslab. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8290
* ztest: scrub verificationBrian Behlendorf2019-01-181-6/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | By design ztest will never inject non-repairable damage in to the pool. Update the ztest_scrub() test case such that it waits for the scrub to complete and verifies the pool is always repairable. After enabling scrub verification two scenarios were encountered which are the result of how ztest manages failure injection. The first case is straight forward and pertains to detaching a mirror vdev. In this case, the pool must always be scrubbed prior the detach. Failure to do so can potentially lock in previously repairable data corruption by removing all good copies of a block leaving only damaged ones. The second is a little more subtle. The child/offset selection logic in ztest_fault_inject() depends on the calculated number of leaves always remaining constant between injection passes. This is true within a single execution of ztest, but when using zloop.sh random values are selected for each restart. Therefore, when ztest imports an existing pool it must be scrubbed before failure injection can be safely enabled. Otherwise it is possible that it will inject non-repairable damage. Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8269
* Fix error handling incallers of dbuf_hold_level()Tom Caputi2019-01-171-6/+6
| | | | | | | | | | | | Currently, the functions dbuf_prefetch_indirect_done() and dmu_assign_arcbuf_by_dnode() assume that dbuf_hold_level() cannot fail. In the event of an error the former will cause a NULL pointer dereference and the later will trigger a VERIFY. This patch adds error handling to these functions and their callers where necessary. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8291
* ztest: scrub ddt repairBrian Behlendorf2019-01-171-16/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ztest_ddt_repair() test is designed inflict damage to the ddt which can be repairable by a scrub. Unfortunately, this repair logic was broken at some point and it went undetected. This issue is not specific to ztest, but thankfully this extra redundancy is rarely enabled and even more rarely needed. The root cause was identified to be the ddt_bp_create() function called by dsl_scan_ddt_entry() which did not set the dedup bit of the generated block pointer. The consequence of this was that the ZIO_DDT_READ_PIPELINE was never enabled for the block pointer during the scrub, and the dedup ditto repair logic was never run. Note that for demand reads which don't rely on ddt_bp_create() the required pipeline stages would be enabled and the repair performed. This was resolved by unconditionally setting the dedup bit in ddt_bp_create(). This way all codes paths which may need to perform a repair from a block pointer generated from the dtt entry will be able too. The only exception is that the dedup bit is cleared in ddt_phys_free() which is required to avoid leaking space. Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8270
* ztest: split block reconstructionBrian Behlendorf2019-01-162-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase the default allowed number of reconstruction attempts. There's not an exact right number for this setting. It needs to be set large enough to cover any realistic failure scenarios and small enough to avoid stalling the IO pipeline and invoking the dead man detection. The current value of 256 was empirically determined to be too low based on multi-day runs of ztest. The fault injection code would inject more damage than could be reconstructed given the relatively small number of attempts. However, in all observed cases the block could be reconstructed using a slightly higher limit. Based on local testing increasing the default value to 4096 was determined to strike the best balance. Checking all combinations takes less than 10s in the worst case, and has so far eliminated the vast majority of false positives detected by ztest. This delay is roughly on par with how long retries may be performed to a misbehaving HDD and was deemed to be reasonable. Better to err on the side of a brief delay rather than fail to reconstruct the data. Lastly, the -Y flag has been added to zdb to make it easy to try all possible combinations when performing split block reconstruction. For badly damaged blocks with 18 splits, they can be fully enumerated within a few minutes. This has been done to ensure permanent errors are never incorrectly reported when ztest verifies the pool with zdb. Reviewed by: Tom Caputi <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8271
* Disable 'zfs remap' commandBrian Behlendorf2019-01-152-24/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The implementation of 'zfs remap' has proven to be problematic since it modifies the objset (but not its logical contents) by dirtying metadata without owning it. The consequence of which is that dmu_objset_remap_indirects() is vulnerable to certain races. For example, if we are in the middle of receiving into the filesystem while it is being remapped. Then it is possible we could evict the objset when the receive completes (see dsl_dataset_clone_swap_sync_impl, or dmu_recv_end_sync), but dmu_objset_remap_indirects() may be still using the objset. The result of which would be a panic. Extended runs of ztest(8) have exposed other possible races which can occur when using 'zfs remap'. Several of these have been fixed but there may be others which have not yet been encountered and diagnosed. Furthermore, the ability to manually remap a filesystem is no longer particularly useful now that the removal code can map large chunks. Coupled with the fact that explaining what this command does and why it may be useful requires a detailed understanding of the internals of device removal. These are details users should not be bothered with. Therefore, the 'zfs remap' command is being disabled but not entirely removed. It may be removed in the future or potentially reworked to address the issues described above. Since 'zfs remap' has never been part of a tagged release its removal is expected to have minimal impact. The ZTS tests have been updated to continue to exercise the command to prevent atrophy, but it has been removed entirely from ztest(8). Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8238
* Minor spelling correctionsBrian Behlendorf2019-01-131-5/+5
| | | | | | | | | | Some minor spelling mistakes and typos. No functional changes. Reviewed-by: Neal Gompa <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8272
* Add 'zpool status -i' optionBrian Behlendorf2019-01-071-38/+57
| | | | | | | | | | | | | | | | | | | | | Only display the full details of the vdev initialization state in 'zpool status' output when requested with the -i option. By default display '(initializing)' after vdevs when they are being actively initialized. This is consistent with the established precident of appending '(resilvering), etc' and fits within the default 80 column terminal width making it easy to read. Additionally, updated the 'zpool initialize' documentation to make it clear the options are mutually exclusive, but allow duplicate options like all other zfs/zpool commands. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8230
* zfs initialize performance enhancementsGeorge Wilson2019-01-071-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM ======== When invoking "zpool initialize" on a pool the command will create a thread to initialize each disk. Unfortunately, it does this serially across many transaction groups which can result in commands taking a long time to return to the user and may appear hung. The same thing is true when trying to suspend/cancel the operation. SOLUTION ========= This change refactors the way we invoke the initialize interface to ensure we can start or stop the intialization in just a few transaction groups. When stopping or cancelling a vdev initialization perform it in two phases. First signal each vdev initialization thread that it should exit, then after all threads have been signaled wait for them to exit. On a pool with 40 leaf vdevs this reduces the vdev initialize stop/cancel time from ~10 minutes to under a second. The reason for this is spa_vdev_initialize() no longer needs to wait on multiple full TXGs per leaf vdev being stopped. This commit additionally adds some missing checks for the passed "initialize_vdevs" input nvlist. The contents of the user provided input "initialize_vdevs" nvlist must be validated to ensure all values are uint64s. This is done in zfs_ioc_pool_initialize() in order to keep all of these checks in a single location. Updated the innvl and outnvl comments to match the formatting used for all other new sytle ioctls. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #8230
* OpenZFS 9102 - zfs should be able to initialize storage devicesGeorge Wilson2019-01-072-0/+248
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <[email protected]> Reviewed by: John Wren Kennedy <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: loli10K <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Signed-off-by: Tim Chase <[email protected]> Ported-by: Tim Chase <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230