openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Move assert in dump_dir() in zdb	Tom Caputi	2018-12-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	This one line patch moves an assert in the function dump_dir() below an error check that ensures it ran correctly. This ensures zdb dumps the error that actually caused the problem, as opposed to one of its symptoms. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8171
*	Fix 'zpool list -v' alignment	Brian Behlendorf	2018-12-04	1	-52/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The verbose output of 'zpool list' was not correctly aligned due to differences in the vdev name lengths. Minimally update the code the correct the alignment using the same strategy employed by 'zpool status'. Missing dashes were added for the empty defaults columns, and the vdev state is now printed for all vdevs. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7308 Closes #8147
*	Fix ztest deadlock in ztest_zil_remount()	Tom Caputi	2018-12-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a small race condition in ztest_zil_remount() that could result in a deadlock. ztest_device_removal() calls spa_vdev_remove() which may eventually call spa_reset_logs(). If ztest_zil_remount() attempts to call zil_close() while this is happening, it may fail when it asserts !zilog_is_dirty(zilog). This patch simply adds locking to correct the issue. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8154
*	Fix ASSERT in zfs_receive_one()	LOLi	2018-12-04	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit fixes the following ASSERT in zfs_receive_one() when receiving a send stream from a root dataset with the "-e" option: $ sudo zfs snap source@snap $ sudo zfs send source@snap \| sudo zfs recv -e destination/recv chopprefix > drrb->drr_toname ASSERT at libzfs_sendrecv.c:3804:zfs_receive_one() Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8121
*	Fix consistency of ztest_device_removal_active	Tom Caputi	2018-11-28	1	-1/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ztest currently uses the boolean flag ztest_device_removal_active to protect some tests that may not run successfully if they occur at the same time as ztest_device_removal(). Unfortunately, in the event that ztest is in the middle of a device removal when it decides to issue a SIGKILL, the device removal will be automatically restarted (without setting the flag) when the pool is re-imported on the next run. This patch corrects this by ensuring that any in-progress removals are completed before running further tests after the re-import. This patch also makes a few small changes to prevent race conditions involving the creation and destruction of spa->spa_vdev_removal, since this field is not protected by any locks. Some checks that may run concurrently with setting / unsetting this field have been updated to check spa->spa_removing_phys.sr_state instead. The most significant change here is that spa_removal_get_stats() no longer accounts for in-flight work done, since that could result in a NULL pointer dereference. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8105
*	zpool: allow split with whole-disk devices	LOLi	2018-11-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This change allows 'zpool split' to work with whole-disk devices and updates the ZFS Test Suite with a new script to exercise this functionality. Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6643 Closes #8133
*	OpenZFS 8115 - parallel zfs mount	Sebastien Roy	2018-11-15	1	-30/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Porting Notes: * Use thread pools (tpool) API instead of introducing taskq interfaces to libzfs. * Use pthread_mutext for locks as mutex_t isn't available. * Ignore alternative libshare initialization since OpenZFS-7955 is not present on zfsonlinux. Authored by: Sebastien Roy <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Prashanth Sreenivasa <[email protected]> Authored by: Brian Behlendorf <[email protected]> Approved by: Matt Ahrens <[email protected]> Ported-by: Don Brady <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8115 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3f0e2b569 Closes #8092
*	zed: detect and offline physically removed devices	loli10K	2018-11-09	3	-35/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds a new test case to the ZFS Test Suite to verify ZED can detect when a device is physically removed from a running system: the device will be offlined if a spare is not available in the pool. We implement this by using the existing libudev functionality and without relying solely on the FM kernel module capabilities which have been observed to be unreliable with some kernels. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #1537 Closes #7926
*	Add zpool status -s (slow I/Os) and -p (parseable)	Tony Hutter	2018-11-08	1	-8/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new slow I/Os (-s) column to zpool status to show the number of VDEV slow I/Os. This is the number of I/Os that didn't complete in zio_slow_io_ms milliseconds. It also adds a new parsable (-p) flag to display exact values. NAME STATE READ WRITE CKSUM SLOW testpool ONLINE 0 0 0 - mirror-0 ONLINE 0 0 0 - loop0 ONLINE 0 0 0 20 loop1 ONLINE 0 0 0 0 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7756 Closes #6885
*	Fix !zilog_is_dirty() assert during ztest	Tom Caputi	2018-11-07	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ztest occasionally hits an assert that !zilog_is_dirty() during zil_close(). This is caused by an interaction between 2 threads. First, ztest_run() waits for each test thread to complete and closes the associated dataset as soon as the thread joins. At the same time, the ztest_vdev_add_remove() test may attempt to remove the slog, which will open, dirty, and reset the logs on every dataset in the pool (including those of other threads). This patch simply ensures that we always join all of the test threads before closing any datasets. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8094
*	Replay logs before starting ztest workers	Tom Caputi	2018-11-07	1	-11/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch ensures that logs are replayed on all datasets prior to starting ztest workers. This ensures that the call to vdev_offline() a log device in ztest_fault_inject() will not fail due to the log device being required for replay. This patch also fixes a small issue found during testing where spa_keystore_load_wkey() does not check that the dataset specified is an encryption root. This check was present in libzfs, however. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8084
*	ztest: reduce gangblock creation	Brian Behlendorf	2018-11-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to validate the gang block code ztest is configured to artificially force a fraction of large blocks to be written as gang blocks. The default setting chosen for this was to write 25% of all blocks 32k or larger using gang blocks. The confluence of an unrealistically large number of gang blocks, the aggressive fault injection done by ztest, and the split segment reconstruction logic introduced by device removal has resulted in the following type of failure: zdb -bccsv -G -d ... exit code 3 Specifically, zdb was unable to open the pool because it was unable to reconstruct a damaged block. Manual investigation of multiple failures clearly showed that the block could be reconstructed. However, due to the large number of damaged segments (>35) it could not be done in the allotted time. Furthermore, the large number of gang blocks was determined to be the reason for the unrealistically large number of damaged segments. In order to make this situation less likely, this change both increases the forced gang block size to 64k and reduces the frequency to 3% of blocks. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8080
*	Add libzutil for libzfs or libzpool consumers	Don Brady	2018-11-05	16	-141/+78
\| \| \| \| \| \| \| \| \| \| \|	Adds a libzutil for utility functions that are common to libzfs and libzpool consumers (most of what was in libzfs_import.c). This removes the need for utilities to link against both libzpool and libzfs. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #8050
*	zdb -k does not work on Linux when used with -e	Serapheim Dimitropoulos	2018-10-30	1	-11/+18
\| \| \| \| \| \| \| \| \| \| \| \|	This minor bug was introduced with the port of the feature from OpenZFS to ZoL. This patch fixes the issue that was caused by a minor re-ordering from the original code. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8001
*	Added column definitions to arcstat.py	Gregor Kopka	2018-10-29	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	grow: ARC Grow enabled (!arc_no_grow) free: ARC Free memory (arc_sys_free) need: ARC Reclaim need (arc_need_free) Fixed alignment issues (mread had wrong width). Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gregor Kopka <[email protected]> Closes #8058
*	ZTS: Fix auto_replace_001_pos test	Brian Behlendorf	2018-10-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The root cause of these failures is that udev can notify the ZED of newly created partition before its links are created. Handle this by allowing an auto-replace to briefly wait until udev confirms the links exist. Distill this test case down to its essentials so it can be run reliably. What we need to check is that: 1) A new disk, in the same physical location, is automatically brought online when added to the system, 2) It completes the replacement process, and 3) The pool is now ONLINE and healthy. There is no need to remove the scsi_debug module. After exporting the pool the disk can be zeroed, removed, and then re-added to the system as a new disk. Reviewed by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8051
*	Fix flake8 "invalid escape sequence 'x'" warning	Brian Behlendorf	2018-10-24	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From, https://lintlyci.github.io/Flake8Rules/rules/W605.html As of Python 3.6, a backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases. Note 'float_pobj' was simply removed from arcstat.py since it was entirely unused. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8056
*	Fix waiting in ztest_device_removal()	Tom Caputi	2018-10-24	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	spa->spa_vdev_removal is created in a sync task that is initiated via dsl_sync_task_nowait(). Since the task may not run before spa_vdev_remove() returns, we must wait at least 1 txg to ensure that the removal struct has been created. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8010
*	Fix ztest deadman panic with indirect vdev damage	Tom Caputi	2018-10-24	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an issue where ztest's deadman thread would trigger a panic because reconstructing artifically damaged blocks would take too long to reconstruct. This patch simply limits how often ztest inflicts split-block damage and how many segments it can damage when it does. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8010
*	Fix dbgmsg printing in ztest and zdb	Tom Caputi	2018-10-24	2	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch resolves a problem where the -G option in both zdb and ztest would cause the code to call __dprintf() to print zfs_dbgmsg output. This function was not properly wired to add messages to the dbgmsg log as it is in userspace and so the messages were simply dropped. This patch also tries to add some degree of distinction to dprintf() (which now prints directly to stdout) and zfs_dbgmsg() (which adds messages to an internal list that can be dumped with zfs_dbgmsg_print()). In addition, this patch corrects an issue where ztest used a global variable to decide whether to dump the dbgmsg buffer on a crash. This did not work because ztest spins up more instances of itself using execv(), which did not copy the global variable to the new process. The option has been moved to the ztest_shared_opts_t which already exists for interprocess communication. This patch also changes zfs_dbgmsg_print() to use write() calls instead of printf() so that it will not fail when used in a signal handler. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8010
*	Fix random ztest_deadman_thread failures	Tom Caputi	2018-10-24	1	-11/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	The zloop test has been failing in buildbot for the last few weeks with various failures in ztest_deadman_thread(). This is due to the fact that this thread is not stopped when performing pool import / export tests as it should be. This patch simply corrects this. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8010
*	OpenZFS 9682 - page fault in dsl_async_clone_destroy() while opening pool	Serapheim Dimitropoulos	2018-10-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Sara Hartse <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: George Melikov <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9682 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ade2c82828 Closes #8037
*	OpenZFS 9681 - ztest failure in spa_history_log_internal due to spa_rename()	Matthew Ahrens	2018-10-19	1	-77/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed by: Tom Caputi <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9681 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/6aee0ad7 Closes #8041
*	Defer new resilvers until the current one ends	Tom Caputi	2018-10-18	1	-6/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, if a resilver is triggered for any reason while an existing one is running, zfs will immediately restart the existing resilver from the beginning to include the new drive. This causes problems for system administrators when a drive fails while another is already resilvering. In this case, the optimal thing to do to reduce risk of data loss is to wait for the current resilver to end before immediately replacing the second failed drive, which allows the system to operate with two incomplete drives for the minimum amount of time. This patch introduces the resilver_defer feature that essentially does this for the admin without forcing them to wait and monitor the resilver manually. The change requires an on-disk feature since we must mark drives that are part of a deferred resilver in the vdev config to ensure that we do not assume they are done resilvering when an existing resilver completes. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: @mmaybee Signed-off-by: Tom Caputi <[email protected]> Closes #7732
*	zpool: allow sharing of spare device among pools	LOLi	2018-10-17	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \|	ZFS allows, by default, sharing of spare devices among different pools; this commit simply restores this functionality for disk devices and adds an additional tests case to the ZFS Test Suite to prevent future regression. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7999
*	Add types to featureflags in zfs	Paul Dagnelie	2018-10-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The boolean featureflags in use thus far in ZFS are extremely useful, but because they take advantage of the zap layer, more interesting data than just a true/false value can be stored in a featureflag. In redacted send/receive, this is used to store the list of redaction snapshots for a redacted dataset. This change adds the ability for ZFS to store types other than a boolean in a featureflag. The only other implemented type is a uint64_t array. It also modifies the interfaces around dataset features to accomodate the new capabilities, and adds a few new functions to increase encapsulation. This functionality will be used by the Redacted Send/Receive feature. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #7981
*	OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979)	Matthew Ahrens	2018-10-12	1	-5/+251
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects We're leaking the dd_clones objects in dsl_dir_destroy_sync. This bug appears to have been around forever. Thankfully the amount of space typically involved is tiny. In addition this adds a mechanism in ZDB to find objects in the MOS which are leaked (not referenced anywhere). Porting notes: * Added dd_crypto_obj to ZDB MOS object leak tracking Authored by: Matthew Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Matthew Ahrens <[email protected]> OpenZFS-issue: https://illumos.org/issues/9847 Closes #7979
*	Improved error handling for extreme rewinds	Brian Behlendorf	2018-10-12	1	-12/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attemping to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool in import a far less likely, therefore the behavior for call paths was not modified. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7809 Closes #7921
*	OpenZFS 9689 - zfs range lock code should not be zpl-specific	Matt Ahrens	2018-10-11	1	-155/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ZFS range locking code in zfs_rlock.c/h depends on ZPL-specific data structures, specifically znode_t. However, it's also used by the ZVOL code, which uses a "dummy" znode_t to pass to the range locking code. We should clean this up so that the range locking code is generic and can be used equally by ZPL and ZVOL, and also can be used by future consumers that may need to run in userland (libzpool) as well as the kernel. Porting notes: * Added missing sys/avl.h include to sys/zfs_rlock.h. * Removed 'dbuf is within the locked range' ASSERTs from dmu_sync(). This was needed because ztest does not yet use a locked_range_t. * Removed "Approved by:" tag requirement from OpenZFS commit check to prevent needless warnings when integrating changes which has not been merged to illumos. * Reverted free_list range lock changes which were originally needed to defer the cv_destroy() which was called immediately after cv_broadcast(). With d2733258 this should be safe but if not we may need to reintroduce this logic. * Reverts: The following two commits were reverted and squashed in to this change in order to make it easier to apply OpenZFS 9689. - d88895a0, which removed the dummy znode from zvol_state - e3a07cd0, which updated ztest to use range locks * Preserved optimized rangelock comparison function. Preserved the rangelock free list. The cv_destroy() function will block waiting for all processes in cv_wait() to be scheduled and drop their reference. This is done to ensure it's safe to free the condition variable. However, blocking while holding the rl->rl_lock mutex can result in a deadlock on Linux. A free list is introduced to defer the cv_destroy() and kmem_free() until after the mutex is released. Authored by: Matthew Ahrens <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Brad Lewis <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9689 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/680 External-issue: DLPX-58662 Closes #7980
*	Print "(repairing)" in zpool status again	Tony Hutter	2018-10-09	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Historically, zpool status prints "(repairing)" for any drives that have errors during a scrub: NAME STATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/file1 ONLINE 13 0 0 (repairing) /tmp/file2 ONLINE 0 0 0 /tmp/file3 ONLINE 0 0 0 This was accidentally broken in "OpenZFS 9166 - zfs storage pool checkpoint" (d2734cc). This patch adds it back in. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7779 Closes #7978
*	Refcounted DSL Crypto Key Mappings	Tom Caputi	2018-10-03	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since native ZFS encryption was merged, we have been fighting against a series of bugs that come down to the same problem: Key mappings (which must be present during all I/O operations) are created and destroyed based on dataset ownership, but I/Os can have traditionally been allowed to "leak" into the next txg after the dataset is disowned. In the past we have attempted to solve this problem by trying to ensure that datasets are disowned ater all I/O is finished by calling txg_wait_synced(), but we have repeatedly found edge cases that need to be squashed and code paths that might incur a high number of txg syncs. This patch attempts to resolve this issue differently, by adding a reference to the key mapping for each txg it is dirtied in. By doing so, we can remove many of the unnecessary calls to txg_wait_synced() we have added in the past and ensure we don't need to deal with this problem in the future. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7949
*	Prefix all refcount functions with zfs_	Tim Schumacher	2018-10-01	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Recent changes in the Linux kernel made it necessary to prefix the refcount_add() function with zfs_ due to a name collision. To bring the other functions in line with that and to avoid future collisions, prefix the other refcount functions as well. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Schumacher <[email protected]> Closes #7963
*	Refine split block reconstruction	Brian Behlendorf	2018-10-01	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to a flaw in 4589f3ae the number of unique combinations could be calculated incorrectly. This could result in the random combinations reconstruction being used when it would have been possible to check all combinations. This change fixes the unique combinations calculation and simplifies the reconstruction logic by maintaining a per- segment list of unique copies. The vdev_indirect_splits_damage() function was introduced to validate both the enumeration and random reconstruction logic with ztest. It is implemented such it will never make a known recoverable block unrecoverable. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6900 Closes #7934
*	Linux 4.19-rc3+ compat: Remove refcount_t compat	Tim Schumacher	2018-09-26	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	torvalds/linux@59b57717f ("blkcg: delay blkg destruction until after writeback has finished") added a refcount_t to the blkcg structure. Due to the refcount_t compatibility code, zfs_refcount_t was used by mistake. Resolve this by removing the compatibility code and replacing the occurrences of refcount_t with zfs_refcount_t. Reviewed-by: Franz Pletz <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Schumacher <[email protected]> Closes #7885 Closes #7932
*	Zpool iostat: remove latency/queue scaling	Gregor Kopka	2018-09-25	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bandwidth and iops are average per second while _wait are averages per request for latency or, for queue depths, an instantaneous measurement at the end of an interval (according to man zpool). When calculating the first two it makes sense to do x/interval_duration (x being the increase in total bytes or number of requests over the duration of the interval, interval_duration in seconds) to 'scale' from amount/interval_duration to amount/second. But applying the same math for the latter (_wait latencies/queue) is wrong as there is no interval_duration component in the values (these are time/requests to get to average_time/request or already an absulute number). This bug leads to the only correct continuous *_wait figures for both latencies and queue depths from 'zpool iostat -l/q' being with duration=1 as then the wrong math cancels itself (x/1 is a nop). This removes temporal scaling from latency and queue depth figures. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gregor Kopka <[email protected]> Closes #7945 Closes #7694
*	zstreamdump dumps core printing truncated nvlist	LOLi	2018-09-18	1	-3/+5
\| \| \| \| \| \| \| \|	This change prevents zstreamdump from crashing when trying to print invalid nvlist data (DRR_BEGIN record) from a truncated send stream. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7917
*	zpool should detect invalid fs property on create	LOLi	2018-09-13	1	-4/+11
\| \| \| \| \| \| \| \| \| \| \|	This change improve the handling of invalid filesystem properties when specified at pool creation: this is useful when 'zpool create -n' (dry run) is executed to detect invalid fs-level options (-O) before the actual command is run. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7620 Closes #7878
*	Fix 'zfs allow' for create time permissions	LOLi	2018-09-06	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When no permission set is defined for a dataset the create time permissions are incorrectly shown as if they were a permission set. This change simply correct how allow permissions are displayed. This commit also fixes a small manpage formatting issue and adds the "zfs_allow_003_pos" test case to the ZFS Test Suite. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7519 Closes #7860
*	Pool allocation classes	Don Brady	2018-09-05	4	-131/+582
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Håkan Johansson <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Gregor Kopka <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5182
*	Add basic zfs ioc input nvpair validation	Don Brady	2018-09-02	1	-10/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want newer versions of libzfs_core to run against an existing zfs kernel module (i.e. a deferred reboot or module reload after an update). Programmatically document, via a zfs_ioc_key_t, the valid arguments for the ioc commands that rely on nvpair input arguments (i.e. non legacy commands from libzfs_core). Automatically verify the expected pairs before dispatching a command. This initial phase focuses on the non-legacy ioctls. A follow-on change can address the legacy ioctl input from the zfs_cmd_t. The zfs_ioc_key_t for zfs_keys_channel_program looks like: static const zfs_ioc_key_t zfs_keys_channel_program[] = { {"program", DATA_TYPE_STRING, 0}, {"arg", DATA_TYPE_UNKNOWN, 0}, {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL}, {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, }; Introduce four input errors to identify specific input failures (in addition to generic argument value errors like EINVAL, ERANGE, EBADF, and E2BIG). ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #7780
*	ZTS: path cleanup	bernie1995	2018-08-30	1	-2/+4
\| \| \| \| \| \| \| \| \|	Removing hardcoded paths in many scripts. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bernie1995 <[email protected]> Issue #7507 Closes #7843
*	OpenZFS 9403 - assertion failed in arc_buf_destroy()	Tom Caputi	2018-08-29	2	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Assertion failed in arc_buf_destroy() when concurrently reading block with checksum error. Porting notes: * The ability to zinject decompression errors has been added, but this only works at the zio_decompress() level, where we have all of the info we need to match against the user's zinject options. * The decompress_fault test has been added to test the new zinject functionality * We attempted to set zio_decompress_fail_fraction to (1 << 18) in ztest for further test coverage. Although this did uncover a few low priority issues, this unfortuantely also causes ztest to ASSERT in many locations where the code is working correctly since it is designed to fail on IO errors. Developers can manually set this variable with the '-o' option to find and debug issues. Authored by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Matt Ahrens <[email protected]> Ported-by: Tom Caputi <[email protected]> OpenZFS-issue: https://illumos.org/issues/9403 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9 Closes #7822
*	Added metadata/dnode cache info to arc_summary	Rich Ercolani	2018-08-22	2	-3/+60
\| \| \| \| \| \| \|	Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #7815
*	Skip import activity test in more zdb code paths	Olaf Faaland	2018-08-20	1	-16/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since zdb opens the pools read-only, it cannot damage the pool in the event the pool is already imported either on the same host or on another one. If the pool vdev structure is changing while zdb is importing the pool, it may cause zdb to crash. However this is unlikely, and in any case it's a user space process and can simply be run again. For this reason, zdb should disable the multihost activity test on import that is normally run. This commit fixes a few zdb code paths where that had been overlooked. It also adds tests to ensure that several common use cases handle this properly in the future. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Gu Zheng <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7797 Closes #7801
*	Don't modify argv[] in user tools	DeHackEd	2018-08-20	2	-4/+32
\| \| \| \| \| \| \| \| \| \| \| \|	argv[] gets modified during string parsing for input arguments. This is reflected in the live process listing. Don't do that. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7760
*	'zfs holds' scripted mode is not documented	LOLi	2018-08-18	1	-3/+4
\| \| \| \| \| \| \| \| \| \|	This change simply documents the existing "scripted mode" option in both command help and man page. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7798
*	Fix arcstat.py handling of unsupported options	LOLi	2018-08-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This change allows the arcstat.py script to handle unsupported options gracefully and print both error and usage messages when one such option is provided. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7799
*	MMP should not suspend pool in ztest	Olaf Faaland	2018-08-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running ztest, never suspend the pool due to failed or delayed MMP writes. There are many sources of long delays within ztest, such as device opens, closes, etc. which in combination, may delay MMP writes too long and cause MMP to suspend the pool. Some of these delays also affect real pools, and should be fixed. That is being worked separately. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7776
*	Add support for selecting encryption backend	Nathan Lewis	2018-08-02	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Add two new module parameters to icp (icp_aes_impl, icp_gcm_impl) that control the crypto implementation. At the moment there is a choice between generic and aesni (on platforms that support it). - This enables support for AES-NI and PCLMULQDQ-NI on AMD Family 15h (bulldozer) and newer CPUs (zen). - Modify aes_key_t to track what implementation it was generated with as key schedules generated with various implementations are not necessarily interchangable. Reviewed by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Nathaniel R. Lewis <[email protected]> Closes #7102 Closes #7103
*	OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations	Serapheim Dimitropoulos	2018-07-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	= Motivation While dealing with another performance issue (see 126118f) we noticed that we spend a lot of time in various places in the kernel when constructing long nvlists. The problem is that when an nvlist is created with the NV_UNIQUE_NAME set (which is the case most of the time), we do a linear search through the whole list to ensure uniqueness for every entry we add. An example of the above scenario can be seen in the following flamegraph, where more than have the time of the zfsdev_ioctl() is spent on constructing nvlists. Flamegraph: https://sdimitro.github.io/img/flame/sdimitro_snap_unmount3.svg Adding a table to speed up lookups will help situations where we just construct an nvlist (like the scenario above), in addition to regular lookups and removals. = What this patch does In this diff we've implemented a hash-table on top of the nvlist code that converts most nvlist operations from O(# number of entries) to O(1)* (the start is for amortized time as the hash-table grows and shrinks depending on the # of entries - plain lookup is strictly O(1)). = Performance Analysis To analyze the performance improvement I just used the setup from the snapshot deletion issue mentioned above in the Motivation section. Basically I created 10K filesystems with one snapshot each and then I just used the API of libZFS_Core to pass down an nvlist of all the snapshots to have them deleted. The reason I used my own driver program was to have clean performance results of what actually happens in the kernel. The flamegraphs and wall clock times mentioned below were gathered from the start to the end of the driver program's run. Between trials the testpool used was completely destroyed, the system was rebooted and the testpool was completely recreated. The reason for this dance was to get consistent results. == Results (before patch): === Sampling Flamegraphs [Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A.svg [Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2.svg [Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3.svg === Wall clock times (in seconds) ``` [Trial 4] real 5.3 user 0.4 sys 2.3 [Trial 5] real 8.2 user 0.4 sys 2.4 [Trial 6] real 6.0 user 0.5 sys 2.3 ``` == Results (after patch): === Sampling Flamegraphs [Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-Ae.svg [Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2e.svg [Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3e.svg === Wall clock times (in seconds) ``` [Trial 4] real 4.9 user 0.0 sys 0.9 [Trial 5] real 3.8 user 0.0 sys 0.9 [Trial 6] real 3.6 user 0.0 sys 0.9 ``` == Analysis The results between the trials are consistent so in this sections I will only talk about the flamegraph results from trial-1 and the wall-clock results from trial-4. From trial-1 we can see that zfs_dev_ioctl() goes from 2,331 to 996 samples counts. Specifically, the samples from fnvlist_add_nvlist() and spa_history_log_nvl() are almost gone (~500 & ~800 to 5 & 5 samples), leaving zfs_ioc_destroy_snaps() to dominate most samples from zfs_dev_ioctl(). From trial-4 we see that the user time dropped to 0 secods. I believe the consistent 0.4 seconds before my patch was applied was due to my driver program constructing the long nvlist of snapshots so it can pass it to the kernel. As for the system time, the effect there is more clear (2.3 down to 0.9 seconds). Porting Notes: * DATA_TYPE_DONTCARE case added to switch in fm_nvprintr() and zpool_do_events_nvprint(). Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9580 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b5eca7b1 Closes #7748