openzfs/zfs.git

	Commit message	Author	Age	Files	Lines
*	Fix zfs "redact" misc issues	loli10K	2019-07-05	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \|	* zfs redact error messages do not end with newline character * 30af21b0 inadvertently removed some ZFS_PROP comments * man/zfs: zfs redact <redaction_snapshot> is not optional Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8988
*	OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz skip blocks	Mike Gerdts	2019-07-05	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a volume is created in a pool with raidz vdevs and volblocksize != 128k, the volume can reference more space than is reserved with the automatically calculated refreservation. There are two deficiencies in vol_volsize_to_reservation that contribute to this: 1) Skip blocks may be added to keep each allocation a multiple of parity + 1. This is the dominating factor when volblocksize is close to 2^ashift. 2) raidz deflation for 128 KB blocks is different for most other block sizes. See "The theory of raidz space accounting" comment in libzfs_dataset.c for a full explanation. Authored by: Mike Gerdts <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Sanjay Nadkarni <[email protected]> Reviewed by: Jerry Jelinek <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Kody Kantor <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Mike Gerdts <[email protected]> Porting Notes: * ZTS: wait for zvols to exist before writing * ZTS: use log_must_busy with {zpool\|zfs} destroy OpenZFS-issue: https://www.illumos.org/issues/9318 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0 Closes #8973
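The skip-block effect described in point 1 can be sketched as follows. This is a simplified illustration of raidz allocation-size arithmetic, not the code changed by this commit; the function name `raidz_asize_sketch` and its parameters are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes allocated on a raidz vdev for a block of
 * psize bytes (psize > 0). ndisks is the number of disks in the vdev,
 * nparity the parity level, and ashift the log2 of the sector size.
 */
static uint64_t
raidz_asize_sketch(uint64_t psize, uint64_t ndisks, uint64_t nparity,
    uint64_t ashift)
{
	uint64_t ndata = ndisks - nparity;
	uint64_t mult = nparity + 1;

	/* Data sectors, rounded up to whole sectors. */
	uint64_t sectors = ((psize - 1) >> ashift) + 1;

	/* One set of parity sectors per (possibly partial) stripe row. */
	sectors += nparity * ((sectors + ndata - 1) / ndata);

	/* Skip sectors pad the allocation to a multiple of parity + 1. */
	sectors = ((sectors + mult - 1) / mult) * mult;

	return (sectors << ashift);
}
```

Under these assumptions, an 8 KiB block on a 4-disk raidz1 vdev with 4 KiB sectors takes two data sectors, one parity sector, and one skip sector, so a quarter of the allocation is padding. This is why the effect dominates when volblocksize is close to 2^ashift.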
*	Add 'zfs umount -u' for encrypted datasets	Tom Caputi	2019-06-28	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \|	This patch adds the ability for the user to unload keys for datasets as they are being unmounted. This is analogous to 'zfs mount -l'. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes: #8917 Closes: #8952
*	zdb -vvvvv on ztest pool dies with "out of memory"	Paul Dagnelie	2019-06-25	1	-6/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ztest creates some extremely large files as part of its operation. When zdb tries to dump a large enough file, it can run out of memory or spend an extremely long time attempting to print millions or billions of uint64_ts. We cap the amount of data from a uint64 object that we are willing to read and print. Reviewed-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> External-issue: DLPX-53814 Closes #8947
*	Redacted Send/Receive causes zdb to dump core	loli10K	2019-06-24	1	-1/+1
	When used with verbosity >= 4 zdb fails an assertion in dump_bookmarks() because it expects snprintf() to return 0 on success. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8948
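The snprintf() return-value convention behind this bug can be illustrated with a minimal sketch. The helper `format_bookmark_name` is hypothetical and is not the code in dump_bookmarks(); it only shows that success has to be tested as "non-negative and shorter than the buffer" rather than "equal to zero".

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<fs>#<bookmark>" into buf and report
 * whether the whole string fit. snprintf() returns the length the full
 * string would have had (excluding the terminating NUL), so success is
 * "0 <= n < buflen", not "n == 0".
 */
static int
format_bookmark_name(char *buf, size_t buflen, const char *fs,
    const char *bookmark)
{
	int n = snprintf(buf, buflen, "%s#%s", fs, bookmark);

	return (n >= 0 && (size_t)n < buflen);
}
```

The helper returns 1 when the name fits and 0 when it was truncated or an encoding error occurred.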
*	Remove code for zfs remap	Matthew Ahrens	2019-06-24	1	-78/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "zfs remap" command was disabled by 6e91a72fe3ff8bb282490773bd687632f3e8c79d, because it has little utility and introduced some tricky bugs. This commit removes the code for it, the associated ZFS_IOC_REMAP ioctl, and tests. Note that the ioctl and property will remain, but have no functionality. This allows older software to fail gracefully if it attempts to use these, and avoids a backwards incompatibility that would be introduced if we renumbered the later ioctls/props. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8944
*	Fix out-of-tree build failures	Brian Behlendorf	2019-06-24	2	-55/+59
	Resolve the incorrect use of srcdir and builddir references for various files in the build system. These have crept in over time and went unnoticed because when building in the top level directory srcdir and builddir are identical. With this change it's again possible to build in a subdirectory.
	    $ mkdir obj
	    $ cd obj
	    $ ../configure
	    $ make
	Reviewed-by: loli10K <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8921 Closes #8943
*	Let zfs mount all tolerate in-progress mounts	Don Brady	2019-06-22	1	-1/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The zfs-mount service can unexpectedly fail to start when zfs encounters a mount that is in progress. This service uses zfs mount -a, which has a window between the time it checks if the dataset was mounted and when the actual mount (via mount.zfs binary) occurs. The reason for the racing mounts is that both zfs-mount.target and zfs-share.target are allowed to execute concurrently after the import. This is more of an issue with the relatively recent addition of parallel mounting, and we should consider serializing the mount and share targets. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #8881
*	zstreamdump: add per-record-type counters and an overhead counter	Allan Jude	2019-06-22	1	-22/+41
	Count the bytes of payload for each replication record type. Count the bytes of overhead (replication records themselves). Include these counters in the output summary at the end of the run. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Allan Jude <[email protected]> Sponsored-By: Klara Systems and Catalogic Closes #8432
*	Redacted Send/Receive broke zfs(8) help message	loli10K	2019-06-21	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	Since 30af21b0 was merged 'zfs send' help message format is broken and lists "-r" as a valid option: this commit corrects these small issues. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8942
*	Remove dedupditto functionality	Matthew Ahrens	2019-06-19	1	-162/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If dedup is in use, the `dedupditto` property can be set, causing ZFS to keep an extra copy of data that is referenced many times (>100x). The idea was that this data is more important than other data and thus we want to be really sure that it is not lost if the disk experiences a small amount of random corruption. ZFS (and system administrators) rely on the pool-level redundancy to protect their data (e.g. mirroring or RAIDZ). Since the user/sysadmin doesn't have control over what data will be offered extra redundancy by dedupditto, this extra redundancy is not very useful. The bulk of the data is still vulnerable to loss based on the pool-level redundancy. For example, if particle strikes corrupt 0.1% of blocks, you will either be saved by mirror/raidz, or you will be sad. This is true even if dedupditto saved another 0.01% of blocks from being corrupted. Therefore, the dedupditto functionality is rarely enabled (i.e. the property is rarely set), and it fulfills its promise of increased redundancy even more rarely. Additionally, this feature does not work as advertised (on existing releases), because scrub/resilver did not repair the extra (dedupditto) copy (see https://github.com/zfsonlinux/zfs/pull/8270). In summary, this seldom-used feature doesn't work, and even if it did it wouldn't provide useful data protection. It has a non-trivial maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270). We should remove the dedupditto functionality. For backwards compatibility with the existing CLI, "zpool set dedupditto" will still "succeed" (exit code zero), but won't have any effect. For backwards compatibility with existing pools that had dedupditto enabled at some point, the code will still be able to understand dedupditto blocks and free them when appropriate. However, ZFS won't write any new dedupditto blocks. 
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Issue #8270 Closes #8310
*	Fix memory leak in check_disk()	Michael Niewöhner	2019-06-19	1	-0/+1
\| \| \| \| \| \| \| \|	Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #8897 Closes #8911
*	Implement Redacted Send/Receive	Paul Dagnelie	2019-06-19	3	-58/+434
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Redacted send/receive allows users to send subsets of their data to a target system. One possible use case for this feature is to not transmit sensitive information to a data warehousing, test/dev, or analytics environment. Another is to save space by not replicating unimportant data within a given dataset, for example in backup tools like zrepl. Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots). Second, the new zfs redact command is used to create a redaction bookmark. The redaction bookmark stores the list of blocks in a snapshot that were modified by the redaction snapshot(s). Finally, the redaction bookmark is passed as a parameter to zfs send. When sending to the snapshot that was redacted, the redaction bookmark is used to filter out blocks that contain sensitive or unwanted information, and those blocks are not included in the send stream. When sending from the redaction bookmark, the blocks it contains are considered as candidate blocks in addition to those blocks in the destination snapshot that were modified since the creation_txg of the redaction bookmark. This step is necessary to allow the target to rehydrate data in the case where some blocks are accidentally or unnecessarily modified in the redaction snapshot. The changes to bookmarks to enable fast space estimation involve adding deadlists to bookmarks. There is also logic to manage the life cycles of these deadlists. The new size estimation process operates in cases where previously an accurate estimate could not be provided. 
In those cases, a send is performed where no data blocks are read, reducing the runtime significantly and providing a byte-accurate size estimate. Reviewed-by: Dan Kimmel <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Prashanth Sreenivasa <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Chris Williamson <[email protected]> Reviewed-by: Pavel Zhakarov <[email protected]> Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #7958
*	make zil max block size tunable	Matthew Ahrens	2019-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've observed that on some highly fragmented pools, most metaslab allocations are small (~2-8KB), but there are some large, 128K allocations. The large allocations are for ZIL blocks. If there is a lot of fragmentation, the large allocations can be hard to satisfy. The most common impact of this is that we need to check (and thus load) lots of metaslabs from the ZIL allocation code path, causing sync writes to wait for metaslabs to load, which can take a second or more. In the worst case, we may not be able to satisfy the allocation, in which case the ZIL will resort to txg_wait_synced() to ensure the change is on disk. To provide a workaround for this, this change adds a tunable that can reduce the size of ZIL blocks. External-issue: DLPX-61719 Reviewed-by: George Wilson <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8865
*	arc_summary: prefer python3 version and install when there is no python	Eli Schwartz	2019-06-10	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This matches the behavior of other python scripts, such as arcstat and dbufstat, which are always installed but whose install-exec-hook actions will simply touch up the shebang if a python interpreter was configured and that interpreter is a python2 interpreter. Fixes installation in a minimal build chroot without python available. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Eli Schwartz <[email protected]> Closes #8851
*	Make Python detection optional and more portable	Ryan Moeller	2019-06-04	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, --without-python would cause ./configure to fail. Now it is able to proceed, and the Python scripts will not be built. Use portable parameter expansion matching instead of nonstandard substring matching to detect the Python version. This test is duplicated in several places, so define a function for it. Don't assume the full path to binaries, since different platforms do install things in different places. Use AC_CHECK_PROGS instead. When building without Python, also build without pyzfs. Sponsored by: iXsystems, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Eli Schwartz <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #8809 Closes #8731
*	grammar: it is / plural agreement	Josh Soref	2019-05-28	1	-2/+2
\| \| \| \| \| \| \|	Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: Josh Soref <[email protected]> Closes #8818
*	Update comments to match code	Ryan Moeller	2019-05-28	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	s/get_vdev_spec/make_root_vdev The former doesn't exist anymore. Sponsored by: iXsystems, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #8759
*	Endless loop in zpool_do_remove() on platforms with unsigned char	loli10K	2019-05-28	3	-4/+4
	On systems where "char" is an unsigned type the value returned by getopt() will never be negative (-1), leading to an endless loop: this issue prevents both 'zpool remove' and 'zstreamdump' from working on some systems. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8789
*	Disable parallel processing for 'zfs mount -l'	Tom Caputi	2019-05-25	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, 'zfs mount -a' will always attempt to parallelize work related to mounting as best it can. Unfortunately, when the user passes the '-l' option to load keys, this causes all threads to prompt the user for their keys at once, resulting in a confusing and racy user experience. This patch simply disables parallel mounting when using the '-l' flag. Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8762 Closes #8811
*	zpool: status -t is not documented in help message	loli10K	2019-05-24	1	-1/+1
\| \| \| \| \| \| \| \| \|	This commit adds the undocumented "-t" option to zpool(8) help message. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8782
*	zfs: missing newline character in zfs_do_channel_program() error message	loli10K	2019-05-24	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This commit simply adds a missing newline ("\n") character to the error message printed by the zfs command when the provided pool parameter can't be found. Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8783
*	zpool: trim -p is not a valid option	loli10K	2019-05-24	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	This commit removes the documented but not handled "-p" option from zpool(8) help message. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8781
*	Fix dataset name comparison in zfs_compare()	Alexander Motin	2019-05-08	1	-1/+1
	The code never returned a match when comparing two datasets (not snapshots). As a result, uu_avl_find(), called from zfs_callback(), never succeeded, allowing the same dataset to be added to the list multiple times, for example:
	    # zfs get name pers pers pers@z pers@z
	    NAME    PROPERTY  VALUE   SOURCE
	    pers    name      pers    -
	    pers    name      pers    -
	    pers@z  name      pers@z  -
	With the patch:
	    # zfs get name pers pers pers@z pers@z
	    NAME    PROPERTY  VALUE   SOURCE
	    pers    name      pers    -
	    pers@z  name      pers@z  -
	Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #8723
*	Fix typesetting of Errata #4	JMoVS	2019-05-08	1	-23/+22
\| \| \| \| \| \| \| \|	Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Justin Scholz <[email protected]> Closes #8712 Closes #8721
*	Clearer wording on Errata #4	JMoVS	2019-05-02	1	-21/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Users of existing pools, especially pools with top-level encrypted datasets, could run into trouble trying to work around Errata #4. Clarify that removing encrypted snapshots and bookmarks is enough to clear the errata. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Justin Scholz <[email protected]> Closes #8682 Closes #8683
*	Fix estimated scrub completion time	Tom Caputi	2019-05-01	1	-2/+3
	Currently, it is possible for the 'zpool scrub' command to progress slightly beyond 100% due to concurrent changes happening on the live pool. This behavior is expected, but the userspace code for 'zpool status' would subtract the expected amount of data from the amount of data already scrubbed, resulting in a negative integer being cast to a large positive one. This number was then used to calculate the estimated completion time, resulting in wildly wrong results. This code changes the behavior so that 'zpool status' does not attempt to report an estimate during this period. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8611 Closes #8687
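The wraparound described above, and the guard against it, can be sketched like this. The function `scrub_secs_left` and its parameter names are assumptions for illustration and are not taken from the 'zpool status' source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimated seconds remaining for a scrub.
 * Returns 0 and sets *secs_left when an estimate is meaningful, or -1
 * when the amount examined already meets or exceeds the expected total
 * (or the rate is zero), in which case no estimate is produced.
 */
static int
scrub_secs_left(uint64_t total, uint64_t examined, uint64_t rate,
    uint64_t *secs_left)
{
	/* total - examined would wrap to a huge value if examined > total. */
	if (examined >= total || rate == 0)
		return (-1);

	*secs_left = (total - examined) / rate;
	return (0);
}
```

Without the check, an unsigned subtraction such as 1000 - 1001 yields UINT64_MAX, which is the kind of value that produced the wildly wrong estimates.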
*	Use sigaction(2) instead of sigignore(3) for portability	Tomohiro Kusumi	2019-04-30	1	-2/+12
	sigignore(3) isn't portable. This code fails to compile on platforms without sigignore(3). Use sigaction(2).
	--
	zfs_main.c: In function 'zfs_do_diff':
	zfs_main.c:7178:9: error: implicit declaration of function 'sigignore' [-Werror=implicit-function-declaration]
	  (void) sigignore(SIGPIPE);
	         ^~~~~~~~~
	Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8593
*	Add option [-V\|--version] to emit version string	TerraTech	2019-04-16	2	-1/+50
	Add the 'zfs version' and 'zpool version' subcommands to display the version of the user space utilities and loaded zfs kernel module. For example:
	    $ zfs version
	    zfs-0.8.0-rc3_169_g67e0366b88
	    zfs-kmod-0.8.0-rc3_169_g67e0366b88
	The '-V' and '--version' aliases were added to support the common convention of using 'zfs --version' to obtain the version information. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: TerraTech <[email protected]> Closes #2501 Closes #8567
*	Cleanup nits from ab7615d92	Tom Caputi	2019-04-14	1	-1/+1
	This patch simply cleans up a nit and corrects an error message issue that were introduced in the Multiple DVA scrub patch. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8619
*	Fix 'zfs list -t snapshot' depth	Brian Behlendorf	2019-04-08	1	-2/+2
	Commit df583073 introduced the ability to list the snapshots for a specified dataset. This change inadvertently resulted in only the top-level snapshots being listed when no dataset was specified. Fix this issue by adding an additional check to determine if a dataset was provided to avoid incorrectly restricting the depth. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8591 Closes #8594
*	Restrict kstats and print real pointers	Sara Hartse	2019-04-04	1	-1/+1
	There are several places where we use zfs_dbgmsg and %p to print pointers. In the Linux kernel, these values are obfuscated to prevent information leaks, which means the pointers aren't very useful for debugging crash dumps. We decided to restrict the permissions of dbgmsg (and some other kstats while we were at it) and print pointers with %px in zfs_dbgmsg as well as spl_dumpstack. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: sara hartse <[email protected]> Closes #8467 Closes #8476
*	Do not iterate through filesystems unnecessarily	Tom Caputi	2019-04-01	2	-5/+40
	Currently, when attempting to list snapshots ZFS may do a lot of extra work checking child datasets. This is because the code does not realize that it will not be able to reach any snapshots contained within datasets that are at the depth limit since the snapshots of those datasets are counted as an additional layer deeper. This patch corrects this issue. In addition, this patch adds the ability to perform the commands:
	    $ zfs list -t snapshot <dataset>
	    $ zfs get -t snapshot <prop> <dataset>
	as a convenient way to list out properties of all snapshots of a given dataset without having to use the depth limit. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8539
*	Add TRIM support	Brian Behlendorf	2019-03-29	2	-72/+363
	UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per-vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes it straightforward to add the platform-specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipeline; one exception is that TRIM zios may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_trim tree. The ms_trim tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs.
Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Contributions-by: Saso Kiselkov <[email protected]> Contributions-by: Tim Chase <[email protected]> Contributions-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8419 Closes #598
*	MMP interval and fail_intervals in uberblock	Olaf Faaland	2019-03-21	1	-1/+14
	When Multihost is enabled, and a pool is imported, uberblock writes include ub_mmp_delay to allow an importing node to calculate the duration of an activity test. This value is not enough information. If zfs_multihost_fail_intervals > 0 on the node with the pool imported, the safe minimum duration of the activity test is well defined, but does not depend on ub_mmp_delay:
	    zfs_multihost_fail_intervals * zfs_multihost_interval
	and if zfs_multihost_fail_intervals == 0 on that node, there is no such well defined safe duration, but the importing host cannot tell whether mmp_delay is high due to I/O delays, or due to a very large zfs_multihost_interval setting on the host which last imported the pool. As a result, it may use a far longer period for the activity test than is necessary. This patch renames ub_mmp_sequence to ub_mmp_config and uses it to record the zfs_multihost_interval and zfs_multihost_fail_intervals values, as well as the mmp sequence. This allows a shorter activity test duration to be calculated by the importing host in most situations. These values are also added to the multihost_history kstat records. It calculates the activity test duration differently depending on whether the new fields are present or not; for importing pools with only ub_mmp_delay, it uses
	    (zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals
	which results in an activity test duration less sensitive to the leaf count. In addition, it makes a few other improvements: * It updates the "sequence" part of ub_mmp_config when MMP writes in between syncs occur. This allows an importing host to detect MMP on the remote host sooner, when the pool is idle, as it is not limited to the granularity of ub_timestamp (1 second).
* It issues writes immediately when zfs_multihost_interval is changed so remote hosts see the updated value as soon as possible. * It fixes a bug where setting zfs_multihost_fail_intervals = 1 results in immediate pool suspension. * Update tests to verify activity check duration is based on recorded tunable values, not tunable values on importing host. * Update tests to verify the expected number of uberblocks have valid MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number), that sequence number is incrementing, and that uberblock values match tunable settings. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7842
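The two duration formulas quoted in this commit message can be written out as a small sketch. The function names are hypothetical, and all quantities are assumed to have already been converted to milliseconds; the real tunables and uberblock fields may use different units, and this is not the import code itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: safe minimum activity test duration when the
 * host that last imported the pool had fail_intervals > 0.
 */
static uint64_t
mmp_min_safe_duration_ms(uint64_t fail_intervals, uint64_t interval_ms)
{
	return (fail_intervals * interval_ms);
}

/*
 * Hypothetical helper: activity test duration for pools whose
 * uberblocks carry only ub_mmp_delay (no recorded tunables).
 */
static uint64_t
mmp_legacy_duration_ms(uint64_t interval_ms, uint64_t mmp_delay_ms,
    uint64_t import_intervals)
{
	return ((interval_ms + mmp_delay_ms) * import_intervals);
}
```

With illustrative values of a 1000 ms interval, a 500 ms observed delay, and 20 import intervals, the legacy formula gives 30 seconds, while 10 fail intervals at a 1000 ms interval give a 10 second safe minimum.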
*	Improve `zpool labelclear`	Brian Behlendorf	2019-03-21	1	-3/+8
	1) As implemented the `zpool labelclear` command overwrites the calculated offsets of all four vdev labels even when only a single valid label is found. If the device has been re-purposed but still contains a valid label this can result in space no longer owned by ZFS being zeroed. Prevent this by verifying every label removed is intact before it's overwritten.
	2) Address a small bug in zpool_do_labelclear() which prevented labelclear from working on file vdevs. Only block devices support BLKFLSBUF, try the ioctl() but when it's reported as unsupported this should not be fatal.
	3) Fix `zpool labelclear` so it can be run on vdevs which were removed from the pool with `zpool remove`. Additionally, allow intact but partial labels to be cleared as in the case of a failed `zpool attach` or `zpool replace`.
	4) Remove LABELCLEAR and LABELREAD variables for test cases.
	Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8500 Closes #8373 Closes #6261
*	Multiple DVA Scrubbing Fix	Tom Caputi	2019-03-15	1	-14/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there is an issue in the sequential scrub code which prevents self healing from working in some cases. The scrub code will split up all DVA copies of a bp and issue each of them separately. The problem is that, since each of the DVAs is no longer associated with the others, the self healing code doesn't have the opportunity to repair problems that show up in one of the DVAs with the data from the others. This patch fixes this issue by ensuring that all IOs issued by the sequential scrub code include all DVAs. Initially, only the first DVA of each is attempted. If an issue arises, the IO is retried with all available copies, giving the self healing code a chance to correct the issue. To test this change, this patch also adds the ability for zinject to specify individual DVAs to inject read errors into. We then add a new test case that utilizes this functionality to ensure scrubs and self-healing reads can handle and transparently fix issues with individual copies of blocks. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8453
*	Better user experience for errata 4	Tom Caputi	2019-03-14	1	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch attempts to address some user concerns that have arisen since errata 4 was introduced. * The errata warning has been made less scary for users without any encrypted datasets. * The errata warning now clears itself without a pool reimport if the bookmark_v2 feature is enabled and no encrypted datasets exist. * It is no longer possible to create new encrypted datasets without enabling the bookmark_v2 feature, thus helping to ensure that the errata is resolved. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Issue #8308 Closes #8504
*	Update commented zed.rc values to defaults	kpande	2019-03-14	1	-3/+4
\| \| \| \| \| \| \| \| \|	Update zed.rc values to reflect their default values. This helps avoid confusion if a user expects functionality to be enabled. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #8498
*	Make zstreamdump -v more greppable	Tom Caputi	2019-03-13	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the verbose output of zstreamdump includes new line characters within some individual records. Presumably, this was originally done to keep the output from getting too wide to fit on a terminal. However, since new flags and struct members have been added, these rules have not been maintained consistently. In addition, these newlines can make it hard to grep the output in some scenarios. This patch simply removes these newlines, making the output easier to grep and removing the inconsistency. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed by: Allan Jude <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8493
*	Detect and prevent mixed raw and non-raw sends	Tom Caputi	2019-03-13	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there is an issue in the raw receive code where raw receives are allowed to happen on top of previously non-raw received datasets. This is a problem because the source-side dataset doesn't know about how the blocks on the destination were encrypted. As a result, any MAC in the objset's checksum-of-MACs tree that is a parent of both blocks encrypted on the source and blocks encrypted by the destination will be incorrect. This will result in authentication errors when we decrypt the dataset. This patch fixes this issue by adding a new check to the raw receive code. The code now maintains an "IVset guid", which acts as an identifier for the set of IVs used to encrypt a given snapshot. When a snapshot is raw received, the destination snapshot will take this value from the DRR_BEGIN payload. Non-raw receives and normal "zfs snap" operations will cause ZFS to generate a new IVset guid. When a raw incremental stream is received, ZFS will check that the "from" IVset guid in the stream matches that of the "from" destination snapshot. If they do not match, the code will error out the receive, preventing the problem. This patch requires an on-disk format change to add the IVset guids to snapshots and bookmarks. As a result, this patch has errata handling and a tunable to help affected users resolve the issue with as little interruption as possible. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8308
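The check described in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the ZFS code: the `Snapshot` class, the dictionary standing in for the DRR_BEGIN payload, and the names `raw_receive` and `new_ivset_guid` are all hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    name: str
    ivset_guid: int  # Identifies the set of IVs used to encrypt this snapshot.


def new_ivset_guid() -> int:
    # Non-raw receives and ordinary snapshots get a freshly generated value.
    return secrets.randbits(64)


def raw_receive(dest_from: Optional[Snapshot], stream: dict) -> Snapshot:
    """Model a raw receive; `stream` stands in for the DRR_BEGIN payload."""
    if dest_from is not None:
        # Incremental stream: the "from" IVset guid in the stream must match
        # the "from" snapshot on the destination, otherwise refuse the receive.
        if dest_from.ivset_guid != stream["from_ivset_guid"]:
            raise ValueError("IVset guid mismatch: destination snapshot "
                             "was not raw received from this source")
    # A raw-received snapshot takes its IVset guid from the stream.
    return Snapshot(stream["to_name"], stream["to_ivset_guid"])
```

In this model, a destination snapshot created by a non-raw receive carries a guid from `new_ivset_guid()`, so a later raw incremental stream fails the comparison instead of producing a dataset that cannot be authenticated.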
*	Fix typo in arc_summary3	jwittlincohen	2019-03-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This is a simple fix for a typo ("perfetch" rather than "prefetch") in arc_summary3. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Jason Cohen <[email protected]> Closes #8499
*	Avoid retrieving unused snapshot props	Alek P	2019-03-12	2	-8/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it to take input parameters that alter the way looping through the list of snapshots is performed. The idea here is to restrict functions that throw away some of the snapshots returned by the ioctl to a range of snapshots that these functions actually use. This improves efficiency and execution speed for some rollback and send operations. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #8077
*	Do not resume a pool if multihost is enabled	Olaf Faaland	2019-02-28	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	When multihost is enabled, and a pool is suspended, return EINVAL in response to "zpool clear <pool>". The pool may have been imported on another host while I/O was suspended. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6933 Closes #8460
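The rule in this commit message is small enough to state as code. The sketch below is a hypothetical model only: the `Pool` class and `pool_clear` function are illustrative names, not ZFS interfaces.

```python
import errno
from dataclasses import dataclass


@dataclass
class Pool:
    suspended: bool
    multihost: bool


def pool_clear(pool: Pool) -> int:
    """Model of `zpool clear <pool>`: returns 0 on success or an errno value."""
    if pool.suspended and pool.multihost:
        # The pool may have been imported on another host while I/O was
        # suspended, so resuming it here is refused.
        return errno.EINVAL
    pool.suspended = False
    return 0
```

A suspended pool without multihost is resumed as before; only the combination of suspension and multihost is rejected.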
*	zstreamdump: include embedded writes when dumping raw data (-d)	Allan Jude	2019-02-27	1	-0/+4
\| \| \| \| \| \| \| \| \|	When feeding a replication stream to `zstreamdump -d` (raw dump mode), it does not print the raw data for DRR_WRITE_EMBEDDED records. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #8430
*	zfs.8 has wrong description of "zfs program -t"	Matthew Ahrens	2019-02-26	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "-t" argument to "zfs program" specifies a limit on the number of Lua instructions that can be executed. The zfs.8 manpage has the wrong description. It should be updated to match what's in zfs-program.8. Also fix the formatting of the zfs help message. Reviewed by: Allan Jude <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8410
*	zfs should optionally send holds	Paul Zuchowski	2019-02-15	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add -h switch to zfs send command to send dataset holds. If holds are present in the stream, zfs receive will create them on the target dataset, unless the zfs receive -h option is used to skip receive of holds. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #7513
*	zdb: replace label_t to zdb_label_t for reduce collisions	Igor K	2019-02-13	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Builds on illumos-based platforms can hit a build issue because label_t has been redefined. To reduce build issues on other platforms, rename label_t to zdb_label_t. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #8397
*	Get rid of space_map_update() for ms_synced_length	Serapheim Dimitropoulos	2019-02-12	1	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Initially, metaslabs and space maps used to be the same thing in ZFS. Later, we started differentiating them by referring to the space map as the on-disk state of the metaslab, making the metaslab a higher-level concept that is metadata that deals with space accounting. Today we've managed to split that code even further, with the space map being its own on-disk data structure used in areas of ZFS besides metaslabs (e.g. the vdev-wide space maps used for zpool checkpoint or vdev removal features). This patch refactors the space map code to further split the space map code from the metaslab code. It does so by getting rid of the idea that the space map can have a different in-core and on-disk length (sm_length vs smp_length), a distinction that only the metaslab code uses but that other consumers of space maps have had to deal with. Instead, this patch introduces changes that move the old in-core length of the metaslab's space map to the metaslab structure itself (see ms_synced_length field) while making the space map code only care about the actual space map's length on-disk. The result of this is that space map consumers no longer have to deal with syncing two different lengths for the same structure (e.g. space_map_update() goes away) while metaslab specific behavior stays within the metaslab code. Specifically, the ms_synced_length field keeps track of the amount of data metaslab_load() can read from the metaslab's space map while working concurrently with metaslab_sync() that may be appending to that same space map. As a side note, the patch also adds a few comments around the metaslab code documenting some assumptions and expected behavior.
Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8328
*	shellcheck pass	bunder2015	2019-02-04	3	-5/+5
\| \| \| \| \| \| \| \| \| \|	note: which is non-standard. Use builtin 'command -v' instead. [SC2230] note: Use -n instead of ! -z. [SC2236] Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #8367