openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	ZTS: Introduce targeted corruption in file blocks	John Wren Kennedy	2019-09-09	2	-19/+96
\| \| \| \| \| \| \| \| \| \| \| \|	filetest_001_pos verifies that various checksum algorithms detect corruption by overwriting the underlying vdev on which a file resides. It is possible for the overwrite to miss the blocks of a file, causing a spurious failure. This change introduces a function to corrupt the individual blocks of a file as determined by zdb. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: John Kennedy <[email protected]> Closes #9288
*	Clean up do_vol_test in zfs_copies tests	Ryan Moeller	2019-09-09	1	-41/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of the `get_used_prop` function. `get_prop used` works fine. Fix the comment describing the function parameters. The type does not have a default, and mntp is also used for ext2. Rename the variable for the number of copies from `copy` to `copies`. Use a `case` statement to match the type parameter, order the cases alphabetically, and add a little sanity checking for good measure. Use eval to make sure the output of commands is silenced rather than the log messages when redirecting output to /dev/null. Simplify cases where zfs requires special behavior. Don't allow the test to loop forever in the event space usage does not change. Bail out of the loop and fail after an arbitrary number of iterations. Add more information to the log message when the test fails, to help debugging. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9286
*	Fix noop receive of raw send stream	Tom Caputi	2019-09-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the noop receive code fails to work with raw send streams and resuming send streams. This happens because zfs_receive_impl() reads the DRR_BEGIN payload without reading the payload itself. Normally, the kernel expects to read this itself, but in this case the recv_skip() code runs instead and it is not prepared to handle the stream being left at any place other than the beginning of a record. This patch resolves this issue by manually reading the DRR_BEGIN payload in the dry-run case. This patch also includes a number of small fixups in this code path. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #9221 Closes #9173
*	Clean up zfs_clone_010_pos	Ryan Moeller	2019-09-05	1	-12/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove a lot of unnecessary setting and incrementing of `i`. Remove unused variable `j`. Instead of calling out to Python in a loop to generate the same string repeatedly, generate the string once using shell constructs before entering the loop. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9284
*	Refactor checksum operations in tests	Ryan Moeller	2019-09-05	14	-58/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	md5sum in particular but also sha256sum to a lesser extent is used in several areas of the test suite for computing checksums. The vast majority of invocations are followed by `\| awk '{ print $1 }'`. Introduce functions to wrap up `md5sum $file \| awk '{ print $1 }'` and likewise for sha256sum. These also serve as a convenient interface for alternative implementations on other platforms. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9280
*	Use the right booleans	Ryan Moeller	2019-09-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	TRUE and FALSE happen to be defined, but we should use B_TRUE and B_FALSE for the sake of consistency. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9264
*	ZTS: Fix removal_cancel.ksh	Igor K	2019-09-03	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	Create a larger file to extend the time required to perform the removal. Occasional failures were observed due to the removal completing before the cancel could be requested. Reviewed-by: George Melikov <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #9259
*	Fix typos	Andrea Gelmini	2019-09-02	4	-6/+6
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9251
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	14	-24/+24
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9250
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	15	-22/+22
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9249
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	15	-21/+21
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9247
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	15	-18/+18
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9246
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	10	-12/+12
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9244
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	10	-10/+10
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9243
*	Fix typos in tests/	Andrea Gelmini	2019-09-02	10	-11/+11
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9242
*	Fix typos in tests/	Andrea Gelmini	2019-08-30	15	-19/+19
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9248
*	Fix typos in tests/	Andrea Gelmini	2019-08-30	10	-10/+10
\| \| \| \| \| \| \|	Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9245
*	Fix refquota_007_neg.ksh	Igor K	2019-08-30	1	-3/+3
\| \| \| \| \| \| \| \|	Must use 'zfs' instead of '$ZFS' which is undefined. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #9257
*	Prevent metaslab_sync panic due to spa_final_dirty_txg	Paul Dagnelie	2019-08-30	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being exported, we can enter a situation that causes a kernel panic. Any metaslabs that are loaded during the final dirty txg and haven't already been condensed will cause metaslab_sync to proceed after the final dirty txg so that the condense can be performed, which there are assertions to prevent. Because of the nature of this issue, there are a number of ways we can enter this state. Rather than try to prevent each of them one by one, potentially missing some edge cases, we instead cut it off at the point of intersection; by preventing metaslab_sync from proceeding if it would only do so to perform a condense and we're past the final dirty txg, we preserve the utility of the existing asserts while preventing this particular issue. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9185 Closes #9186 Closes #9231 Closes #9253
*	Simplify deleting partitions in libtest	Ryan Moeller	2019-08-29	1	-45/+10
\| \| \| \| \| \| \| \| \| \|	Eliminate unnecessary code duplication. We can use a for-loop instead of a while-loop. There is no need to echo $DISKSARRAY in a subshell or return 0. Declare all variables with typeset. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9224
*	Use compatible arg order in tests	Ryan Moeller	2019-08-29	3	-8/+8
\| \| \| \| \| \| \| \| \|	BSD getopt() and getopt_long() want options before arguments. Reorder arguments to zfs/zpool in tests to put all the options first. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9228
*	ZTS: Temporarily disable several upgrade tests	Brian Behlendorf	2019-08-28	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until issues #9185 and #9186 have been resolved the following zpool upgrade tests are being disabled to prevent CI failures. zpool_upgrade_002_pos, zpool_upgrade_003_pos, zpool_upgrade_004_pos, zpool_upgrade_007_pos, zpool_upgrade_008_pos Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #9185 Issue #9186 Closes #9225
*	Fix zil replay panic when TX_REMOVE followed by TX_CREATE	Chunwei Chen	2019-08-28	4	-3/+141
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If TX_REMOVE is followed by TX_CREATE on the same object id, we need to make sure the object removal is completely finished before creation. The current implementation relies on dnode_hold_impl with DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work fine before, in current version it does not guarantee the object removal is completed. We fix this by checking if DNODE_MUST_BE_FREE returns successful instead. Also add test and remove dead code in dnode_hold_impl. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7151 Closes #8910 Closes #9123 Closes #9145
*	Prefer `for (;;)` to `while (TRUE)`	Ryan Moeller	2019-08-28	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \|	Defining a special constant to make an infinite loop is excessive, especially when the name clashes with symbols commonly defined on some platforms (ie FreeBSD). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: John Kennedy <[email protected] Signed-off-by: Ryan Moeller <[email protected]> Closes #9219
*	Restore :: in Makefile.am	Ryan Moeller	2019-08-26	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	The double-colon looked like a typo, but it's actually an obscure feature. Rules with :: may appear multiple times and are run independently of one another in the order they appear. The use of :: for distclean-local was conventional, not accidental. Add comments to indicate the intentional use of double-colon rules. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9210
*	Add regression test for "zpool list -p"	Paul Dagnelie	2019-08-25	4	-3/+115
\| \| \| \| \| \| \| \| \| \|	Other than this test, zpool list -p is not well tested by any of the automated tests. Add a test for zpool list -p. Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9134
*	ZTS: Fix in-tree dbufstats test case	Brian Behlendorf	2019-08-22	4	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a887d653 updated the dbufstats such that escalated privileges are required. Since all tests under cli_user are run with normal privileges move this test case to a location where it will be run required privileges. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Michael Niewöhner <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9118 Closes #9196
*	Make slog test setup more robust	Ryan Moeller	2019-08-22	19	-10/+27
\| \| \| \| \| \| \| \| \| \| \|	The slog tests fail when attempting to create pools using file vdevs that already exist from previous test runs. Remove these files in the setup for the test. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9194
*	ZTS: Use decimal values when setting tunables	Brian Behlendorf	2019-08-22	4	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \|	The mdb_set_uint32 function requires that the values passed in be decimal. This was overlooked initially because the matching Linux function accepts both decimal and hexadecimal values. Reviewed-by: John Kennedy <[email protected]> Reviewed by: Sara Hartse <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #9125 Closes #9195
*	Dedup IOC enum values in libzfs_input_check	Ryan Moeller	2019-08-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Reuse enum value ZFS_IOC_BASE for `('Z' << 8)`. This is helpful on FreeBSD where ZFS_IOC_BASE has a different value and `('Z' << 8)` is wrong. Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9188
*	Enhance ioctl number checks	Ryan Moeller	2019-08-22	1	-87/+99
\| \| \| \| \| \| \| \| \|	When checking ZFS_IOC_* numbers, print which numbers are wrong rather than silently failing. Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9187
*	ZTS: Fix vdev_zaps_005_pos on CentOS 6	Brian Behlendorf	2019-08-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	The ancient version of blkid (v2.17.2) used in CentOS 6 will not detect the newly created pool unless it has been written to. Force a pool sync so `zpool import` will detect the newly created pool. Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9199
*	Add more refquota tests	Paul Dagnelie	2019-08-19	4	-2/+137
\| \| \| \| \| \| \| \| \| \| \| \| \|	It used to be possible for zfs receive (and other operations related to clone swap) to bypass refquotas. This can cause a number of issues, and there should be an automated test for it. Added tests for rollback and receive not overriding refquota. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9139
*	Fix out-of-order ZIL txtype lost on hardlinked files	Chunwei Chen	2019-08-13	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should only call zil_remove_async when an object is removed. However, in current implementation, it is called whenever TX_REMOVE is called. In the case of hardlinked file, every unlink will generate TX_REMOVE and causing operations to be dropped even when the object is not removed. We fix this by only calling zil_remove_async when the file is fully unlinked. Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #8769 Closes #9061
*	Introduce getting holds and listing bookmarks through ZCP	Serapheim Dimitropoulos	2019-08-12	8	-3/+313
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consumers of ZFS Channel Programs can now list bookmarks, and get holds from datasets. A minor-refactoring was also applied to distinguish between user and system properties in ZCP. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Ported-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Dan Kimmel <[email protected]> OpenZFS-issue: https://illumos.org/issues/8862 Closes #7902
*	Test cancelling a removal in ZTS	Serapheim Dimitropoulos	2019-08-05	3	-4/+104
\| \| \| \| \| \| \| \| \|	This patch adds a new test that sanity checks cancelling a removal. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #9101
*	Revert "Develop tests for issues #5866 and #8858"	Brian Behlendorf	2019-07-29	10	-156/+2
\| \| \| \| \| \| \| \| \| \|	This reverts commit 693c1fc478cc8118dd0168c4815c0ae3be41c9c3. This change resulted in a kmem leak being observed in existing code which needs to be identified and addressed. Reviewed-by: Paul Zuchowski <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #8978 Closes #9090
*	Develop tests for issues #5866 and #8858	Paul Zuchowski	2019-07-26	10	-2/+156
\| \| \| \| \| \| \| \| \| \| \| \|	Provide zfstest coverage for these two issues which were a panic accessing extended attributes and a problem comparing 64 bit and 32 bit generation numbers. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Issue #5866 Issue #8858 Closes #8978
*	Implement secpolicy_vnode_setid_retain()	Tomohiro Kusumi	2019-07-26	11	-0/+433
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't unconditionally return 0 (i.e. retain SUID/SGID). Test CAP_FSETID capability. https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t which expects SUID/SGID to be dropped on write(2) by non-owner fails without this. Most filesystems make this decision within VFS by using a generic file write for fops. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9035 Closes #9043
*	Fast Clone Deletion	Sara Hartse	2019-07-26	10	-5/+596
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deleting a clone requires finding blocks are clone-only, not shared with the snapshot. This was done by traversing the entire block tree which results in a large performance penalty for sparsely written clones. This is new method keeps track of clone blocks when they are modified in a "Livelist" so that, when it’s time to delete, the clone-specific blocks are already at hand. We see performance improvements because now deletion work is proportional to the number of clone-modified blocks, not the size of the original dataset. Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Sara Hartse <[email protected]> Closes #8416
*	Move some tests to cli_user/zpool_status	Tony Hutter	2019-07-19	10	-10/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The tests in tests/functional/cli_root/zpool_status should all require root. However, linux.run has "user =" specified for those tests, which means they run as a normal user. When I removed that line to run them as root, the following tests did not pass: zpool_status_003_pos zpool_status_-c_disable zpool_status_-c_homedir zpool_status_-c_searchpath These tests need to be run as a normal user. To fix this, move these tests to a new tests/functional/cli_user/zpool_status directory. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #9057
*	Add zfs create dryrun	Mike Gerdts	2019-07-16	4	-2/+338
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds the ability to sanity check zfs create arguments and to see the value of any additional properties that will local to the dataset. For example, automation that may need to adjust quota on a parent filesystem before creating a volume may call `zfs create -nP -V <size> <volume>` to obtain the value of refreservation. This adds the following options to zfs create: - -n dry-run (no-op) - -v verbose - -P parseable (implies verbose) Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Jerry Jelinek <[email protected]> Signed-off-by: Mike Gerdts <[email protected]> Closes #8974
*	Log Spacemap Project	Serapheim Dimitropoulos	2019-07-16	7	-4/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	= Motivation At Delphix we've seen a lot of customer systems where fragmentation is over 75% and random writes take a performance hit because a lot of time is spend on I/Os that update on-disk space accounting metadata. Specifically, we seen cases where 20% to 40% of sync time is spend after sync pass 1 and ~30% of the I/Os on the system is spent updating spacemaps. The problem is that these pools have existed long enough that we've touched almost every metaslab at least once, and random writes scatter frees across all metaslabs every TXG, thus appending to their spacemaps and resulting in many I/Os. To give an example, assuming that every VDEV has 200 metaslabs and our writes fit within a single spacemap block (generally 4K) we have 200 I/Os. Then if we assume 2 levels of indirection, we need 400 additional I/Os and since we are talking about metadata for which we keep 2 extra copies for redundancy we need to triple that number, leading to a total of 1800 I/Os per VDEV every TXG. We could try and decrease the number of metaslabs so we have less I/Os per TXG but then each metaslab would cover a wider range on disk and thus would take more time to be loaded in memory from disk. In addition, after it's loaded, it's range tree would consume more memory. Another idea would be to just increase the spacemap block size which would allow us to fit more entries within an I/O block resulting in fewer I/Os per metaslab and a speedup in loading time. The problem is still that we don't deal with the number of I/Os going up as the number of metaslabs is increasing and the fact is that we generally write a lot to a few metaslabs and a little to the rest of them. Thus, just increasing the block size would actually waste bandwidth because we won't be utilizing our bigger block size. = About this patch This patch introduces the Log Spacemap project which provides the solution to the above problem while taking into account all the aforementioned tradeoffs. The details on how it achieves that can be found in the references sections below and in the code (see Big Theory Statement in spa_log_spacemap.c). Even though the change is fairly constraint within the metaslab and lower-level SPA codepaths, there is a side-change that is user-facing. The change is that VDEV IDs from VDEV holes will no longer be reused. To give some background and reasoning for this, when a log device is removed and its VDEV structure was replaced with a hole (or was compacted; if at the end of the vdev array), its vdev_id could be reused by devices added after that. Now with the pool-wide space maps recording the vdev ID, this behavior can cause problems (e.g. is this entry referring to a segment in the new vdev or the removed log?). Thus, to simplify things the ID reuse behavior is gone and now vdev IDs for top-level vdevs are truly unique within a pool. = Testing The illumos implementation of this feature has been used internally for a year and has been in production for ~6 months. For this patch specifically there don't seem to be any regressions introduced to ZTS and I have been running zloop for a week without any related problems. = Performance Analysis (Linux Specific) All performance results and analysis for illumos can be found in the links of the references. Redoing the same experiments in Linux gave similar results. Below are the specifics of the Linux run. After the pool reached stable state the percentage of the time spent in pass 1 per TXG was 64% on average for the stock bits while the log spacemap bits stayed at 95% during the experiment (graph: sdimitro.github.io/img/linux-lsm/PercOfSyncInPassOne.png). Sync times per TXG were 37.6 seconds on average for the stock bits and 22.7 seconds for the log spacemap bits (related graph: sdimitro.github.io/img/linux-lsm/SyncTimePerTXG.png). As a result the log spacemap bits were able to push more TXGs, which is also the reason why all graphs quantified per TXG have more entries for the log spacemap bits. Another interesting aspect in terms of txg syncs is that the stock bits had 22% of their TXGs reach sync pass 7, 55% reach sync pass 8, and 20% reach 9. The log space map bits reached sync pass 4 in 79% of their TXGs, sync pass 7 in 19%, and sync pass 8 at 1%. This emphasizes the fact that not only we spend less time on metadata but we also iterate less times to convergence in spa_sync() dirtying objects. [related graphs: stock- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGStock.png lsm- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGLSM.png] Finally, the improvement in IOPs that the userland gains from the change is approximately 40%. There is a consistent win in IOPS as you can see from the graphs below but the absolute amount of improvement that the log spacemap gives varies within each minute interval. sdimitro.github.io/img/linux-lsm/StockVsLog3Days.png sdimitro.github.io/img/linux-lsm/StockVsLog10Hours.png = Porting to Other Platforms For people that want to port this commit to other platforms below is a list of ZoL commits that this patch depends on: Make zdb results for checkpoint tests consistent db587941c5ff6dea01932bb78f70db63cf7f38ba Update vdev_is_spacemap_addressable() for new spacemap encoding 419ba5914552c6185afbe1dd17b3ed4b0d526547 Simplify spa_sync by breaking it up to smaller functions 8dc2197b7b1e4d7ebc1420ea30e51c6541f1d834 Factor metaslab_load_wait() in metaslab_load() b194fab0fb6caad18711abccaff3c69ad8b3f6d3 Rename range_tree_verify to range_tree_verify_not_present df72b8bebe0ebac0b20e0750984bad182cb6564a Change target size of metaslabs from 256GB to 16GB c853f382db731e15a87512f4ef1101d14d778a55 zdb -L should skip leak detection altogether 21e7cf5da89f55ce98ec1115726b150e19eefe89 vs_alloc can underflow in L2ARC vdevs 7558997d2f808368867ca7e5234e5793446e8f3f Simplify log vdev removal code 6c926f426a26ffb6d7d8e563e33fc176164175cb Get rid of space_map_update() for ms_synced_length 425d3237ee88abc53d8522a7139c926d278b4b7f Introduce auxiliary metaslab histograms 928e8ad47d3478a3d5d01f0dd6ae74a9371af65e Error path in metaslab_load_impl() forgets to drop ms_sync_lock 8eef997679ba54547f7d361553d21b3291f41ae7 = References Background, Motivation, and Internals of the Feature - OpenZFS 2017 Presentation: youtu.be/jj2IxRkl5bQ - Slides: slideshare.net/SerapheimNikolaosDim/zfs-log-spacemaps-project Flushing Algorithm Internals & Performance Results (Illumos Specific) - Blogpost: sdimitro.github.io/post/zfs-lsm-flushing/ - OpenZFS 2018 Presentation: youtu.be/x6D2dHRjkxw - Slides: slideshare.net/SerapheimNikolaosDim/zfs-log-spacemap-flushing-algorithm Upstream Delphix Issues: DLPX-51539, DLPX-59659, DLPX-57783, DLPX-61438, DLPX-41227, DLPX-59320 DLPX-63385 Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8442
*	Fix ZTS killed processes detection	Attila Fülöp	2019-07-10	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	log_neg_expect was using the wrong exit status to detect if a process got killed by SIGSEGV or SIGBUS, resulting in false positives. Reviewed-by: loli10K <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #9003
*	Fix race in parallel mount's thread dispatching algorithm	Tomohiro Kusumi	2019-07-09	3	-1/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Strategy of parallel mount is as follows. 1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition. 1) #8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of #8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`. 2) #8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of #8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded. Fix a conditional in libzfs_path_contains() to consider above two. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8450 Closes #8833 Closes #8878
*	OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz skip blocks	Mike Gerdts	2019-07-05	4	-2/+332
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a volume is created in a pool with raidz vdevs and volblocksize != 128k, the volume can reference more space than is reserved with the automatically calculated refreservation. There are two deficiencies in vol_volsize_to_reservation that contribute to this: 1) Skip blocks may be added to keep each allocation a multiple of parity + 1. This is the dominating factor when volblocksize is close to 2^ashift. 2) raidz deflation for 128 KB blocks is different for most other block sizes. See "The theory of raidz space accounting" comment in libzfs_dataset.c for a full explanation. Authored by: Mike Gerdts <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Sanjay Nadkarni <[email protected]> Reviewed by: Jerry Jelinek <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Kody Kantor <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Mike Gerdts <[email protected]> Porting Notes: * ZTS: wait for zvols to exist before writing * ZTS: use log_must_busy with {zpool\|zfs} destroy OpenZFS-issue: https://www.illumos.org/issues/9318 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0 Closes #8973
*	nopwrites on dmu_sync-ed blocks can result in a panic	George Wilson	2019-06-28	3	-2/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After device removal, performing nopwrites on a dmu_sync-ed block will result in a panic. This panic can show up in two ways: 1. an attempt to issue an IOCTL in vdev_indirect_io_start() 2. a failed comparison of zio->io_bp and zio->io_bp_orig in zio_done() To resolve both of these panics, nopwrites of blocks on indirect vdevs should be ignored and new allocations should be performed on concrete vdevs. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #8957
*	Add 'zfs umount -u' for encrypted datasets	Tom Caputi	2019-06-28	3	-2/+82
\| \| \| \| \| \| \| \| \| \| \|	This patch adds the ability for the user to unload keys for datasets as they are being unmounted. This is analogous to 'zfs mount -l'. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes: #8917 Closes: #8952
*	-Y option for zdb is valid	Igor K	2019-06-24	1	-1/+1
\| \| \| \| \| \| \| \|	The -Y option was added for ztest to test split block reconstruction. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #8926
*	Remove code for zfs remap	Matthew Ahrens	2019-06-24	14	-387/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "zfs remap" command was disabled by 6e91a72fe3ff8bb282490773bd687632f3e8c79d, because it has little utility and introduced some tricky bugs. This commit removes the code for it, the associated ZFS_IOC_REMAP ioctl, and tests. Note that the ioctl and property will remain, but have no functionality. This allows older software to fail gracefully if it attempts to use these, and avoids a backwards incompatibility that would be introduced if we renumbered the later ioctls/props. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8944