openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Detect and prevent mixed raw and non-raw sends	Tom Caputi	2019-03-13	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there is an issue in the raw receive code where raw receives are allowed to happen on top of previously non-raw received datasets. This is a problem because the source-side dataset doesn't know about how the blocks on the destination were encrypted. As a result, any MAC in the objset's checksum-of-MACs tree that is a parent of both blocks encrypted on the source and blocks encrypted by the destination will be incorrect. This will result in authentication errors when we decrypt the dataset. This patch fixes this issue by adding a new check to the raw receive code. The code now maintains an "IVset guid", which acts as an identifier for the set of IVs used to encrypt a given snapshot. When a snapshot is raw received, the destination snapshot will take this value from the DRR_BEGIN payload. Non-raw receives and normal "zfs snap" operations will cause ZFS to generate a new IVset guid. When a raw incremental stream is received, ZFS will check that the "from" IVset guid in the stream matches that of the "from" destination snapshot. If they do not match, the code will error out the receive, preventing the problem. This patch requires an on-disk format change to add the IVset guids to snapshots and bookmarks. As a result, this patch has errata handling and a tunable to help affected users resolve the issue with as little interruption as possible. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8308
*	Do not resume a pool if multihost is enabled	Olaf Faaland	2019-02-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	When multihost is enabled, and a pool is suspended, return EINVAL in response to "zpool clear <pool>". The pool may have been imported on another host while I/O was suspended. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6933 Closes #8460
*	Warn user about accidentally sharing devices	Olaf Faaland	2019-02-28	1	-5/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve the man page text to warn the user about the risk of adding the same device to multiple pools via simultaneous "zpool create", "zpool add", "zpool replace", etc. State that MMP/multihost does not protect against these scenarios. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6473 Closes #8457
*	zfs.8 has wrong description of "zfs program -t"	Matthew Ahrens	2019-02-26	2	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "-t" argument to "zfs program" specifies a limit on the number of LUA instructions that can be executed. The zfs.8 manpage has the wrong description. It should be updated to match what's in zfs-program.8 Also fix the formatting of the zfs help message. Reviewed by: Allan Jude <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8410
*	zfs(8): improve document of compression behaviours	DeHackEd	2019-02-25	1	-0/+13
\| \| \| \| \| \| \| \| \|	Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed by: Allan Jude <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: DHE <[email protected]> Closes #4660 Closes #8423
*	Clarify zpool iostat statistics reporting	kpande	2019-02-21	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \|	Document expected behavior for zpool iostat statistics reporting. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Allan Jude <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #2888 Closes #8417
*	Fix '-T u\|d' descriptions in zpool(8)	Anatoly Borodin	2019-02-21	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In -T u\|d Display a time stamp. Specify -u for a printed representation of the internal representation of time. See time(2). Specify -d for standard date format. See date(1). 'Specify u' and 'Specify d' should be used instead. `zpool list -T -u` does not work. Bring the descriptions in `zpool list` and `zpool status` in sync with `zpool iostat`. Reviewed by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Anatoly Borodin <[email protected]> Closes #8438
*	zfs mount man page should document legacy behaviour	kpande	2019-02-19	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Document legacy mount behavior. Reviewed by: Allan Jude <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #2900 Closes #8414
*	zfs should optionally send holds	Paul Zuchowski	2019-02-15	1	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add -h switch to zfs send command to send dataset holds. If holds are present in the stream, zfs receive will create them on the target dataset, unless the zfs receive -h option is used to skip receive of holds. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #7513
*	Pool allocation classes misplacing small file blocks	loli10K	2019-02-08	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to an off-by-one condition in spa_preferred_class() we are picking the "normal" allocation class instead of the "special" one for file blocks with size equal to the special_small_blocks property value. This change fix the small code issue, update the ZFS Test Suite and the zfs(8) man page. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8351 Closes #8361
*	zstreamdump: -d option is not documented in manpage	loli10K	2019-02-04	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \|	This change simply documents the missing -d (dump contents) option in zstreamdump(8). Reviewed-by: bunder2015 <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8369
*	zdb -L should skip leak detection altogether	Serapheim Dimitropoulos	2019-01-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the point of -L option in zdb is to disable leak tracing and the loading of space maps because they are expensive, yet still do leak detection in terms of space. Unfortunately, there is a scenario where this is a lie. If we are using zdb -L on a pool where a vdev is being removed, zdb_claim_removing() will open the metaslab space maps of that device. This patch makes it so zdb -L skips leak detection altogether and ensures that no space maps are loaded. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8335
*	zpool iostat should print headers when terminal fills	Damian Wojsław	2019-01-23	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When `zpool iostat` fills the terminal the headers should be printed again. `zpool iostat -n` can be used to suppress this. If the command is not attached to a tty, headers will not be printed so as to not break existing scripts. Reviewed-by: Joshua M. Clulow <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Damian Wojsław <[email protected]> Closes #8235 Closes #8262
*	ztest: split block reconstruction	Brian Behlendorf	2019-01-16	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Increase the default allowed number of reconstruction attempts. There's not an exact right number for this setting. It needs to be set large enough to cover any realistic failure scenarios and small enough to avoid stalling the IO pipeline and invoking the dead man detection. The current value of 256 was empirically determined to be too low based on multi-day runs of ztest. The fault injection code would inject more damage than could be reconstructed given the relatively small number of attempts. However, in all observed cases the block could be reconstructed using a slightly higher limit. Based on local testing increasing the default value to 4096 was determined to strike the best balance. Checking all combinations takes less than 10s in the worst case, and has so far eliminated the vast majority of false positives detected by ztest. This delay is roughly on par with how long retries may be performed to a misbehaving HDD and was deemed to be reasonable. Better to err on the side of a brief delay rather than fail to reconstruct the data. Lastly, the -Y flag has been added to zdb to make it easy to try all possible combinations when performing split block reconstruction. For badly damaged blocks with 18 splits, they can be fully enumerated within a few minutes. This has been done to ensure permanent errors are never incorrectly reported when ztest verifies the pool with zdb. Reviewed by: Tom Caputi <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8271
*	Disable 'zfs remap' command	Brian Behlendorf	2019-01-15	1	-13/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The implementation of 'zfs remap' has proven to be problematic since it modifies the objset (but not its logical contents) by dirtying metadata without owning it. The consequence of which is that dmu_objset_remap_indirects() is vulnerable to certain races. For example, if we are in the middle of receiving into the filesystem while it is being remapped. Then it is possible we could evict the objset when the receive completes (see dsl_dataset_clone_swap_sync_impl, or dmu_recv_end_sync), but dmu_objset_remap_indirects() may be still using the objset. The result of which would be a panic. Extended runs of ztest(8) have exposed other possible races which can occur when using 'zfs remap'. Several of these have been fixed but there may be others which have not yet been encountered and diagnosed. Furthermore, the ability to manually remap a filesystem is no longer particularly useful now that the removal code can map large chunks. Coupled with the fact that explaining what this command does and why it may be useful requires a detailed understanding of the internals of device removal. These are details users should not be bothered with. Therefore, the 'zfs remap' command is being disabled but not entirely removed. It may be removed in the future or potentially reworked to address the issues described above. Since 'zfs remap' has never been part of a tagged release its removal is expected to have minimal impact. The ZTS tests have been updated to continue to exercise the command to prevent atrophy, but it has been removed entirely from ztest(8). Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8238
*	zfs.8 uses wrong snapshot names in Example 15	Benjamin Gentil	2019-01-07	1	-4/+4
\| \| \| \| \| \| \|	Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: Benjamin Gentil <[email protected]> Closes #8241
*	Add 'zpool status -i' option	Brian Behlendorf	2019-01-07	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Only display the full details of the vdev initialization state in 'zpool status' output when requested with the -i option. By default display '(initializing)' after vdevs when they are being actively initialized. This is consistent with the established precident of appending '(resilvering), etc' and fits within the default 80 column terminal width making it easy to read. Additionally, updated the 'zpool initialize' documentation to make it clear the options are mutually exclusive, but allow duplicate options like all other zfs/zpool commands. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8230
*	OpenZFS 9102 - zfs should be able to initialize storage devices	George Wilson	2019-01-07	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <[email protected]> Reviewed by: John Wren Kennedy <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: loli10K <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Signed-off-by: Tim Chase <[email protected]> Ported-by: Tim Chase <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230
*	Minor tweaks to zfs.8 man page for POSIX ACLs	Richard Laager	2018-12-26	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Capitalize POSIX in POSIX ACLs. This change makes the POSIX in POSIX ACLs all caps, which is both correct and consistent with the rest of the man page. * Slightly reword part of zfs.8. I tweaked a sentence to add a missing comma, and as long as I was editing, removed a couple unnecessary words. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8220
*	Detect IO errors during device removal	Brian Behlendorf	2018-12-04	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Detect IO errors during device removal While device removal cannot verify the checksums of individual blocks during device removal, it can reasonably detect hard IO errors from the leaf vdevs. Failure to perform this error checking can result in device removal completing successfully, but moving no data which will permanently corrupt the pool. Situation 1: faulted/degraded vdevs In the configuration shown below, the removal of mirror-0 will permanently corrupt the pool. Device removal will preferentially copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'. Which in this case will result in nothing being copied since one vdev in each of those groups in unavailable. However, device removal will complete successfully since all IO errors are ignored. tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 /var/tmp/vdev1 FAULTED 0 0 0 external fault /var/tmp/vdev2 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 /var/tmp/vdev3 ONLINE 0 0 0 /var/tmp/vdev4 FAULTED 0 0 0 external fault This issue is resolved by updating the source child selection logic to exclude unreadable leaf vdevs. Additionally, unwritable destination child vdevs which can never succeed are skipped to prevent generating a large number of write IO errors. Situation 2: individual hard IO errors During removal if an unexpected hard IO error is encountered when either reading or writing the child vdev the entire removal operation is cancelled. While it may be possible to reconstruct the data after removal that cannot be guaranteed. The only strictly safe thing to do is to cancel the removal. As a future improvement we may want to instead suspend the removal process and allow the damaged region to be retried. But that work is left for another time, hard IO errors during the removal process are expected to be exceptionally rare. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6900 Closes #8161
*	man/zfs.8: document 'received' property source	Christian Schwarz	2018-11-20	1	-2/+3
\| \| \| \| \| \| \| \|	Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #8134
*	Add zpool status -s (slow I/Os) and -p (parseable)	Tony Hutter	2018-11-08	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new slow I/Os (-s) column to zpool status to show the number of VDEV slow I/Os. This is the number of I/Os that didn't complete in zio_slow_io_ms milliseconds. It also adds a new parsable (-p) flag to display exact values. NAME STATE READ WRITE CKSUM SLOW testpool ONLINE 0 0 0 - mirror-0 ONLINE 0 0 0 - loop0 ONLINE 0 0 0 20 loop1 ONLINE 0 0 0 0 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7756 Closes #6885
*	Defer new resilvers until the current one ends	Tom Caputi	2018-10-18	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, if a resilver is triggered for any reason while an existing one is running, zfs will immediately restart the existing resilver from the beginning to include the new drive. This causes problems for system administrators when a drive fails while another is already resilvering. In this case, the optimal thing to do to reduce risk of data loss is to wait for the current resilver to end before immediately replacing the second failed drive, which allows the system to operate with two incomplete drives for the minimum amount of time. This patch introduces the resilver_defer feature that essentially does this for the admin without forcing them to wait and monitor the resilver manually. The change requires an on-disk feature since we must mark drives that are part of a deferred resilver in the vdev config to ensure that we do not assume they are done resilvering when an existing resilver completes. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: @mmaybee Signed-off-by: Tom Caputi <[email protected]> Closes #7732
*	OpenZFS 9763 - zfs(1M): broken formatting in allow/unallow description	Yuri Pankov	2018-10-02	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Porting notes: * Two of the three changes from the upstream patch were already applied for Linux. Only the last one is required. Authored by: Yuri Pankov <[email protected]> Reviewed by: Robert Mustacchi <[email protected]> Reviewed-by: George Melikov <[email protected]> Approved by: Gordon Ross <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9763 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8a702e55 Closes #7972
*	Fix reference to zpool-features(5)	DeHackEd	2018-09-21	1	-1/+1
\| \| \| \| \| \| \| \|	Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7938
*	Man page fixes - zpool/zfs optional parameters	Gregor Kopka	2018-09-18	2	-4/+4
\| \| \| \| \| \| \| \| \| \|	The man pages for zpool and zfs (get command) listed the pool/dataset parameter as required, but these are optional. Fixed that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gregor Kopka <[email protected]> Closes #7916
*	Clarify 'zpool remove' restrictions	Brian Behlendorf	2018-09-17	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \|	Update zpool(8) to clarify what type of vdevs may be safely removed and that the existence of any top-level raidz device which is part of the primary pool will prevent device removal. Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7880 Closes #7893
*	Fix 'zfs allow' for create time permissions	LOLi	2018-09-06	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When no permission set is defined for a dataset the create time permissions are incorrectly shown as if they were a permission set. This change simply correct how allow permissions are displayed. This commit also fixes a small manpage formatting issue and adds the "zfs_allow_003_pos" test case to the ZFS Test Suite. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7519 Closes #7860
*	Pool allocation classes	Don Brady	2018-09-05	2	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Håkan Johansson <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Gregor Kopka <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5182
*	OpenZFS 9403 - assertion failed in arc_buf_destroy()	Tom Caputi	2018-08-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Assertion failed in arc_buf_destroy() when concurrently reading block with checksum error. Porting notes: * The ability to zinject decompression errors has been added, but this only works at the zio_decompress() level, where we have all of the info we need to match against the user's zinject options. * The decompress_fault test has been added to test the new zinject functionality * We attempted to set zio_decompress_fail_fraction to (1 << 18) in ztest for further test coverage. Although this did uncover a few low priority issues, this unfortuantely also causes ztest to ASSERT in many locations where the code is working correctly since it is designed to fail on IO errors. Developers can manually set this variable with the '-o' option to find and debug issues. Authored by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Matt Ahrens <[email protected]> Ported-by: Tom Caputi <[email protected]> OpenZFS-issue: https://illumos.org/issues/9403 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9 Closes #7822
*	Introduce read/write kstats per dataset	Serapheim Dimitropoulos	2018-08-20	2	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	The following patch introduces a few statistics on reads and writes grouped by dataset. These statistics are implemented as kstats (backed by aggregate sums for performance) and can be retrieved by using the dataset objset ID number. The motivation for this change is to provide some preliminary analytics on dataset usage/performance. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #7705
*	'zfs holds' scripted mode is not documented	LOLi	2018-08-18	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	This change simply documents the existing "scripted mode" option in both command help and man page. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7798
*	Added encryption support for zfs recv -o / -x	Tom Caputi	2018-08-15	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	One small integration that was absent from b52563 was support for zfs recv -o / -x with regards to encryption parameters. The main use cases of this are as follows: * Receiving an unencrypted stream as encrypted without needing to create a "dummy" encrypted parent so that encryption can be inheritted. * Allowing users to change their keylocation on receive, so long as the receiving dataset is an encryption root. * Allowing users to explicitly exclude or override the encryption property from an unencrypted properties stream, allowing it to be received as encrypted. * Receiving a recursive heirarchy of unencrypted datasets, encrypting the top-level one and forcing all children to inherit the encryption. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7650
*	OpenZFS 8906 - uts: illumos rootfs should support salted cksum	Toomas Soome	2018-07-27	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Porting notes: * As of grub-2.02 these checksums are not supported. However, as pointed out in #6501 there are alternatives such as EFISTUB which work and have no such restriction. A warning was added to the checksum property section of the zfs.8 man page. Authored by: Toomas Soome <[email protected]> Reviewed by: C Fraire <[email protected]> Reviewed by: Robert Mustacchi <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/8906 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f Closes #6501 Closes #7714
*	OpenZFS 9330 - stack overflow when creating a deeply nested dataset	Serapheim Dimitropoulos	2018-07-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Datasets that are deeply nested (~100 levels) are impractical. We just put a limit of 50 levels to newly created datasets. Existing datasets should work without a problem. The problem can be seen by attempting to create a dataset using the -p option with many levels: panic[cpu0]/thread=ffffff01cd282c20: BAD TRAP: type=8 (#df Double fault) rp=ffffffff fffffffffbc3aa60 unix:die+100 () fffffffffbc3ab70 unix:trap+157d () ffffff00083d7020 unix:_patch_xrstorq_rbx+196 () ffffff00083d7050 zfs:dbuf_rele+2e () ... ffffff00083d7080 zfs:dsl_dir_close+32 () ffffff00083d70b0 zfs:dsl_dir_evict+30 () ffffff00083d70d0 zfs:dbuf_evict_user+4a () ffffff00083d7100 zfs:dbuf_rele_and_unlock+87 () ffffff00083d7130 zfs:dbuf_rele+2e () ... The block above repeats once per directory in the ... ... create -p command, working towards the root ... ffffff00083db9f0 zfs:dsl_dataset_drop_ref+19 () ffffff00083dba20 zfs:dsl_dataset_rele+42 () ffffff00083dba70 zfs:dmu_objset_prefetch+e4 () ffffff00083dbaa0 zfs:findfunc+23 () ffffff00083dbb80 zfs:dmu_objset_find_spa+38c () ffffff00083dbbc0 zfs:dmu_objset_find+40 () ffffff00083dbc20 zfs:zfs_ioc_snapshot_list_next+4b () ffffff00083dbcc0 zfs:zfsdev_ioctl+347 () ffffff00083dbd00 genunix:cdev_ioctl+45 () ffffff00083dbd40 specfs:spec_ioctl+5a () ffffff00083dbdc0 genunix:fop_ioctl+7b () ffffff00083dbec0 genunix:ioctl+18e () ffffff00083dbf10 unix:brand_sys_sysenter+1c9 () Porting notes: * Added zfs_max_dataset_nesting module option with documentation. * Updated zfs_rename_014_neg.ksh for Linux. * Increase the zfs.sh stack warning to 15K. Enough time has passed that 16K can be reasonably assumed to be the default value. It was increased in the 3.15 kernel released in June of 2014. Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Approved by: Garrett D'Amore <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9330 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/757a75a Closes #7681
*	OpenZFS 9521 - Add checkpoint field	Eitan Adler	2018-06-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add checkpoint field in the default list of the zpool-list man page Authored by: Eitan Adler <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: kpande <[email protected]> Reviewed by: Tim Chase <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9521 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c5a860f7b Closes #7658
*	OpenZFS 9166 - zfs storage pool checkpoint	Serapheim Dimitropoulos	2018-06-26	2	-1/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
*	Tunable directory for zfs runtime scripts	Antonio Russo	2018-06-07	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	zpool and zed place scripts in subdirectories of libexecdir. Some distributions locate architecture independent scripts in other locations (e.g. Debian). To avoid these paths getting out of sync, centralize the definitions. Build zfs-test's default.cfg by Makefile. Use the new directory logic building tests/zfs-tests/include/default.cfg.in. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7597
*	Minor documentation, logging, and testing typos	Antonio Russo	2018-06-07	1	-6/+5
\| \| \| \| \| \| \| \| \|	This patch collects some minor inconsistencies and typos in the documentation, logging and testing infrastructure. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7608
*	Fix typoes in zpool man page	bunder2015	2018-06-04	1	-4/+9
\| \| \| \| \| \| \| \| \|	Fixed some highlighting in the zpool man page Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #7596
*	Ignore *.o.ur-safe build artifacts	Brian Behlendorf	2018-05-13	1	-0/+1
\| \| \| \| \| \| \| \| \|	Generated when building on Ubuntu 18.04. Also ignore the new dynamically generated zfs-mount-generator.8 man page, and the module/.cache.mk file. Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7534
*	Add canonical mount options zfs-mount-generator	Antonio Russo	2018-05-11	2	-19/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lib/libzfs/libzfs_mount.c:zfs_add_options provides the canonical mount options used by a `zfs mount` command. Because we cannot call `zfs mount` directly from a systemd.mount unit, we mirror that logic in zfs-mount-generator. The zed script is updated to cache these properties as well. Include a mini-tutorial in the manual page, properly substitute configuration paths in zfs-mount-generator.8.in, and standardize the Makefile. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7453
*	Add support for decryption faults in zinject	Tom Caputi	2018-05-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the ability for zinject to trigger decryption and authentication faults in the ZIO and ARC layers. This functionality is exposed via the new "decrypt" error type, which may be provided for "data" object types. This patch also refactors some of the core encryption / decryption functions so that they have consistent prototypes, handle errors consistently, and do not have unused arguments. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7474
*	Add back iostat -y or -w descriptions	George Melikov	2018-04-30	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \|	The iostat -y and -w descriptions were left in cda0317e, get them back. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #7479 Closes #7483
*	OpenZFS 7614, 9064 - zfs device evacuation/removal	Matthew Ahrens	2018-04-14	2	-12/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <[email protected]> Reviewed-by: Alex Reece <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Reviewed by: Richard Laager <[email protected]> Reviewed by: Tim Chase <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
*	OpenZFS 9286 - want refreservation=auto	Mike Gerdts	2018-04-11	1	-5/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Mike Gerdts <[email protected]> Reviewed by: Allan Jude <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Andy Stormont <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Don Brady <[email protected]> Porting Notes: * Adopted destroy_dataset in ZTS test cleanup * Use ksh shebang instead of bash for new tests OpenZFS-issue: https://www.illumos.org/issues/9286 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/723d0c85 Closes #7387
*	zfs(8): fix dedup omission during mdoc overhaul	DeHackEd	2018-04-10	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \|	The property description has been updated with new algorithms as well. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7377
*	systemd mount generator and tracking ZEDLET	Antonio Russo	2018-04-06	2	-0/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	zfs-mount-generator implements the "systemd generator" protocol, producing systemd.mount units from the cached outputs of zfs list, during early boot, integrating with systemd. Each pool has an indpendent cache of the command zfs list -H -oname,mountpoint,canmount -tfilesystem -r $pool which is kept synchronized by the ZEDLET history_event-zfs-list-cacher.sh Datasets not in the cache will be loaded later in the boot process by zfs-mount.service, including pools without a cache. Among other things, this allows for complex mount hierarchies. Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7329
*	Clarify zpool actions for an intent log device	Peter Ashford	2018-03-22	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Updated the "Intent Log" section of the "zpool" manual page to properly reflect the actions that may be performed. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Peter Ashford <[email protected]> Closes #6938 Closes #7318
*	Add JSON output support to channel programs	Alek P	2018-03-19	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The changes piggyback JSON output support on top of channel programs (#6558). This way the JSON output support is targeted to scripting use cases and is easily maintainable since it really only touches one function (zfs_do_channel_program()). This patch ports Joyent's JSON nvlist library from illumos to enable easy JSON printing of channel program output nvlist. To keep the delta small I also took advantage of the fact that printing in zfs_do_channel_program() was almost always done before exiting the program. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #7281