openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Send / Recv Fixes following b52563	Tom Caputi	2017-08-23	2	-36/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes several issues discovered after the encryption patch was merged: * Fixed a bug where encrypted datasets could attempt to receive embedded data records. * Fixed a bug where dirty records created by the recv code wasn't properly setting the dr_raw flag. * Fixed a typo where a dmu_tx_commit() was changed to dmu_tx_abort() * Fixed a few error handling bugs unrelated to the encryption patch in dmu_recv_stream() Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6512 Closes #6524 Closes #6545
*	Fix zfs_ioc_pool_sync should not use fnvlist	Chunwei Chen	2017-08-21	1	-3/+9
\| \| \| \| \| \| \| \| \|	Use fnvlist on user input would allow user to easily panic zfs. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #6529
*	vdev_mirror: kstat observables for preferred vdev	Gvozden Neskovic	2017-08-21	2	-6/+78
\| \| \| \| \| \|	Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #6461
*	vdev_mirror: load balancing fixes	Gvozden Neskovic	2017-08-21	2	-34/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vdev_queue: - Track the last position of each vdev, including the io size, in order to detect linear access of the following zio. - Remove duplicate `vq_lastoffset` vdev_mirror: - Correctly calculate the zio offset (signedness issue) - Deprecate `vdev_queue_register_lastoffset()` - Add `VDEV_LABEL_START_SIZE` to zio offset of leaf vdevs Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #6461
*	Fix range locking in ZIL commit codepath	LOLi	2017-08-21	3	-12/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6238 Closes #6315 Closes #6356 Closes #6477
*	Fix remounting snapshots read-write	LOLi	2017-08-17	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \|	It's not enough to preserve/restore MS_RDONLY on the superblock flags to avoid remounting a snapshot read-write: be explicit about our intentions to the VFS layer so the readonly bit is updated correctly in do_remount_sb(). Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6510 Closes #6515
*	Retire legacy test infrastructure	Brian Behlendorf	2017-08-15	4	-1309/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Removed zpios kmod, utility, headers and man page. * Removed unused scripts zpios-profile/, zpios-test/, zpool-config/, smb.sh, zpios-sanity.sh, zpios-survey.sh, zpios.sh, and zpool-create.sh. Removed zfs-script-config.sh.in. When building 'make' generates a common.sh with in-tree path information from the common.sh.in template. This file and sourced by the test scripts and used for in-tree testing, it is not included in the packages. When building packages 'make install' uses the same template to create a new common.sh which is appropriate for the packaging. * Removed unused functions/variables from scripts/common.sh.in. Only minimal path information and configuration environment variables remain. * Removed unused scripts from scripts/ directory. * Remaining shell scripts in the scripts directory updated to cleanly pass shellcheck and added to checked scripts. * Renamed tests/test-runner/cmd/ to tests/test-runner/bin/ to match install location name. * Removed last traces of the --enable-debug-dmu-tx configure options which was retired some time ago. Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6509
*	Add corruption failure option to zinject(8)	Don Brady	2017-08-14	2	-7/+49
\| \| \| \| \| \| \| \| \| \| \|	Added a 'corrupt' error option that will flip a bit in the data after a read operation. This is useful for generating checksum errors at the device layer (in a mirror config for example). It is also used to validate the diagnosis of checksum errors from the zfs diagnosis engine. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #6345
*	Native Encryption for ZFS on Linux	Tom Caputi	2017-08-14	42	-875/+8531
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #494 Closes #5769
*	Fix NULL pointer when O_SYNC read in snapshot	Chunwei Chen	2017-08-11	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	When doing read on a file open with O_SYNC, it will trigger zil_commit. However for snapshot, there's no zil, so we shouldn't be doing that. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #6478 Closes #6494
*	Allow longer SPA names in stats	gaurkuma	2017-08-11	2	-15/+21
\| \| \| \| \| \| \| \| \| \| \| \|	The pool name can be 256 chars long. Today, in /proc/spl/kstat/zfs/ the name is limited to < 32 characters. This change is to allows bigger pool names. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: gaurkuma <[email protected]> Closes #6481
*	Simplify threads, mutexs, cvs and rwlocks	Brian Behlendorf	2017-08-11	6	-12/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Simplify threads, mutexs, cvs and rwlocks * Update the zk_thread_create() function to use the same trick as Illumos. Specifically, cast the new pthread_t to a void pointer and return that as the kthread_t . This avoids the issues associated with managing a wrapper structure and is safe as long as the callers never attempt to dereference it. Update all function prototypes passed to pthread_create() to match the expected prototype. We were getting away this with before since the function were explicitly cast. * Replaced direct zk_thread_create() calls with thread_create() for code consistency. All consumers of libzpool now use the proper wrappers. * The mutex_held() calls were converted to MUTEX_HELD(). * Removed all mutex_owner() calls and retired the interface. Instead use MUTEX_HELD() which provides the same information and allows the implementation details to be hidden. In this case the use of the pthread_equals() function. * The kthread_t, kmutex_t, krwlock_t, and krwlock_t types had any non essential fields removed. In the case of kthread_t and kcondvar_t they could be directly typedef'd to pthread_t and pthread_cond_t respectively. * Removed all extra ASSERTS from the thread, mutex, rwlock, and cv wrapper functions. In practice, pthreads already provides the vast majority of checks as long as we check the return code. Removing this code from our wrappers help readability. * Added TS_JOINABLE state flag to pass to request a joinable rather than detached thread. This isn't a standard thread_create() state but it's the least invasive way to pass this information and is only used by ztest. TEST_ZTEST_TIMEOUT=3600 Chunwei Chen <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4547 Closes #5503 Closes #5523 Closes #6377 Closes #6495
*	zio_dva_throttle_done() should allow zinjected ZIO	sanjeevbagewadi	2017-08-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If fault injection is enabled, the ZIO_FLAG_IO_RETRY could be set by zio_handle_device_injection() to generate the FMA events and update stats. Hence, ignore the flag and process such zios. A better fix would be to add another flag in the zio_t to indicate that the zio is failed because of a zinject rule. However, considering the fact that we do this in debug bits, we could do with the crude check using the global flag zio_injection_enabled which is set to 1 when zinject records are added. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sanjeev Bagewadi <[email protected]> Closes #6383 Closes #6384
*	Add libtpool (thread pools)	Brian Behlendorf	2017-08-09	9	-97/+133
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenZFS provides a library called tpool which implements thread pools for user space applications. Porting this library means the zpool utility no longer needs to borrow the kernel mutex and taskq interfaces from libzpool. This code was updated to use the tpool library which behaves in a very similar fashion. Porting libtpool was relatively straight forward and minimal modifications were needed. The core changes were: * Fully convert the library to use pthreads. * Updated signal handling. * lmalloc/lfree converted to calloc/free * Implemented portable pthread_attr_clone() function. Finally, update the build system such that libzpool.so is no longer linked in to zfs(8), zpool(8), etc. All that is required is libzfs to which the zcommon soures were added (which is the way it always should have been). Removing the libzpool dependency resulted in several build issues which needed to be resolved. * Moved zfeature support to module/zcommon/zfeature_common.c * Moved ratelimiting to to module/zfs/zfs_ratelimit.c * Moved get_system_hostid() to lib/libspl/gethostid.c * Removed use of cmn_err() in zcommon source * Removed dprintf_setup() call from zpool_main.c and zfs_main.c * Removed highbit() and lowbit() * Removed unnecessary library dependencies from Makefiles * Removed fletcher-4 kstat in user space * Added sha2 support explicitly to libzfs * Added highbit64() and lowbit64() to zpool_util.c Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6442
*	zv_suspend_lock in zvol_open()/zvol_release()	Boris Protopopov	2017-08-09	1	-23/+41
\| \| \| \| \| \| \|	Acquire zv_suspend_lock on first open and last close only. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Boris Protopopov <[email protected]> Closes #6342
*	Add debug log entries for failed receive records	Ned Bass	2017-08-08	1	-7/+100
\| \| \| \| \| \| \| \| \| \| \|	Log contents of a receive record if an error occurs while writing it out to the pool. This may help determine the cause when backup streams are rejected as invalid. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #6465
*	Fix dnode allocation race	Brian Behlendorf	2017-08-08	5	-66/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When performing concurrent object allocations using the new multi-threaded allocator and large dnodes it's possible to allocate overlapping large dnodes. This case should have been handled by detecting an error returned by dnode_hold_impl(). But that logic only checked the returned dnp was not-NULL, and the dnp variable was not reset to NULL when retrying. Resolve this issue by properly checking the return value of dnode_hold_impl(). Additionally, it was possible that dnode_hold_impl() would misreport a dnode as free when it was in fact in use. This could occurs for two reasons: * The per-slot zrl_lock must be held over the entire critical section which includes the alloc/free until the new dnode is assigned to children_dnodes. Additionally, all of the zrl_lock's in the range must be held to protect moving dnodes. * The dn->dn_ot_type cannot be solely relied upon to check the type. When allocating a new dnode its type will be DMU_OT_NONE after dnode_create(). Only latter when dnode_allocate() is called will it transition to the new type. This means there's a window when allocating where it can mistaken for a free dnode. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Ned Bass <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6414 Closes #6439
*	Use SET_ERROR for constant non-zero return codes	Ned Bass	2017-08-02	18	-46/+46
\| \| \| \| \| \| \| \| \| \| \| \| \|	Update many return and assignment statements to follow the convention of using the SET_ERROR macro when returning a hard-coded non-zero value from a function. This aids debugging by recording the error codes in the debug log. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #6441
*	Only record zio->io_delay on reads and writes	Tony Hutter	2017-08-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While investigating https://github.com/zfsonlinux/zfs/issues/6425 I noticed that ioctl ZIOs were not setting zio->io_delay correctly. They would set the start time in zio_vdev_io_start(), but never set the end time in zio_vdev_io_done(), since ioctls skip it and go straight to zio_done(). This was causing spurious "delayed IO" events to appear, which would eventually get rate-limited and displayed as "Missed events" messages in zed. To get around the problem, this patch only sets zio->io_delay for read and write ZIOs, since that's all we care about anyway. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #6425 Closes #6440
*	Fix volmode=none property behavior at import time	LOLi	2017-07-31	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	At import time spa_import() calls zvol_create_minors() directly: with the current implementation we have no way to avoid device node creation when volmode=none. Fix this by enforcing volmode=none directly in zvol_alloc(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6426
*	zfs promote\|rename .../%recv should be an error	LOLi	2017-07-28	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we are in the middle of an incremental 'zfs receive', the child .../%recv will exist. If we run 'zfs promote' .../%recv, it will "work", but then zfs gets confused about the status of the new dataset. Attempting to do this promote should be an error. Similarly renaming .../%recv datasets should not be allowed. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #4843 Closes #6339
*	OpenZFS 7915 - checks in l2arc_evict could use some cleaning up	Andriy Gapon	2017-07-28	1	-15/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Andriy Gapon <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Prakash Surya <[email protected]> Approved by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7915 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/836a00c Closes #6375
*	OpenZFS 8373 - TXG_WAIT in ZIL commit path	Andriy Gapon	2017-07-28	1	-1/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Andriy Gapon <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8373 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7f04961 Closes #6403
*	OpenZFS 8508 - Mounting a zpool on 32-bit platforms panics	Giuseppe Di Natale	2017-07-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Justin Hibbits <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8508 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/15fc257 Closes #6404
*	Add line info and SET_ERROR() to ZFS debug log	Ned Bass	2017-07-25	1	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Redefine the SET_ERROR macro in terms of __dprintf() so the error return codes get logged as both tracepoint events (if tracepoints are enabled) and as ZFS debug log entries. This also allows us to use the same definition of SET_ERROR() in kernel and user space. Define a new debug flag ZFS_DEBUG_SET_ERROR=512 that may be bitwise or'd into zfs_flags. Setting this flag enables both dprintf() and SET_ERROR() messages in the debug log. That is, setting ZFS_DEBUG_SET_ERROR and ZFS_DEBUG_DPRINTF\|ZFS_DEBUG_SET_ERROR are equivalent (this was done for sake of simplicity). Leaving ZFS_DEBUG_SET_ERROR unset suppresses the SET_ERROR() messages which helps avoid cluttering up the logs. To enable SET_ERROR() logging, run: echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable echo 512 > /sys/module/zfs/parameters/zfs_flags Remove the zfs_set_error_class tracepoints event class since SET_ERROR() now uses __dprintf(). This sacrifices a bit of granularity when selecting individual tracepoint events to enable but it makes the code simpler. Include file, function, and line number information in debug log entries. The information is now added to the message buffer in __dprintf() and as a result the zfs_dprintf_class tracepoints event class was changed from a 4 parameter interface to a single parameter. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #6400
*	Some additional send stream validity checking	Ned Bass	2017-07-25	1	-8/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check in the DMU whether an object record in a send stream being received contains an unsupported dnode slot count, and return an error if it does. Failure to catch an unsupported dnode slot count would result in a panic when the SPA attempts to increment the reference count for the large_dnode feature and the pool has the feature disabled. This is not normally an issue for a well-formed send stream which would have the DMU_BACKUP_FEATURE_LARGE_DNODE flag set if it contains large dnodes, so it will be rejected as unsupported if the required feature is disabled. This change adds a missing object record field validation. Add missing stream feature flag checks in dmu_recv_resume_begin_check(). Consolidate repetitive comment blocks in dmu_recv_begin_check(). Update zstreamdump to print the dnode slot count (dn_slots) for an object record when running in verbose mode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #6396
*	Fix 'zpool clear' on suspended pools	Brian Behlendorf	2017-07-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'zpool clear' should be able to resume I/O on suspended, but otherwise healthy, pools. 4a283c7 accidentally introduced a new code path where we call txg_wait_synced() on the suspended pool before we had the chance to resume I/O via zio_resume(): this results in the 'zpool clear' command hanging indefinitely, waiting for a TXG that cannot be synced. Fix this by avoiding the call to txg_wait_synced(). Reviewed-by: George Melikov <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6399
*	Report MMP_STATE_NO_HOSTID immediately	Olaf Faaland	2017-07-25	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	There is no need to perform the activity check before detecting that the user must set the system hostid, because the pool's multihost property is on, but spa_get_hostid() returned 0. The initial call to vdev_uberblock_load() provided the information required. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6388
*	Add callback for zfs_multihost_interval	Olaf Faaland	2017-07-25	1	-1/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a callback to wake all running mmp threads when zfs_multihost_interval is changed. This is necessary when the interval is changed from a very large value to a significantly lower one, while pools are imported that have the multihost property enabled. Without this commit, the mmp thread does not wake up and detect the new interval until after it has waited the old multihost interval time. A user monitoring mmp writes via the provided kstat would be led to believe that the changed setting did not work. Added a test in the ZTS under mmp to verify the new functionality is working. Added a test to ztest which starts and stops mmp threads, and calls into the code to signal sleeping mmp threads, to test for deadlocks or similar locking issues. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6387
*	Release SCL_STATE in map_write_done()	Olaf Faaland	2017-07-25	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The config lock must be held for the duration of the MMP write. Since the I/Os are executed via map_nowait(), the done function is the only place where we know the write has completed. Since SCL_STATE is taken as reader, overlapping I/Os do not create a deadlock. The refcount is simply increased when new I/Os are queued and decreased when I/Os complete. Test case added which exercises the probe IO call path to verify the fix and prevent a regression. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6394
*	Revert Fix vdev_probe() call wrt SCL_STATE_ALL	Olaf Faaland	2017-07-25	1	-1/+1
\| \| \| \| \| \| \| \| \|	This reverts commit cc9c6bc, which has been causing intermittent test failures on buildbot. A correct fix for this locking issue has been applied in a separate patch. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]>
*	Fix buffer overflow in dsl_dataset_name()	LOLi	2017-07-24	1	-1/+8
\| \| \| \| \| \| \| \| \| \|	If we're creating a pool with version >= SPA_VERSION_DSL_SCRUB (v11) we need to account for additional space needed by the origin dataset which will also be snapshotted: "poolname"+"/"+"$ORIGIN"+"@"+"$ORIGIN". Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6374
*	Use correct macro for hz in mmp.c	Olaf Faaland	2017-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 379ca9c Multi-modifier protection (MMP) used HZ to convert nanoseconds to ticks for use with cv_timedwait() and ddi_get_lbolt(). The correct macro is hz, which is defined within the SPL for kernel space, and within zfs_context.h for user space. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6357 Closes #6360
*	Fix coverity defects: CID 165755	Giuseppe Di Natale	2017-07-24	2	-3/+3
\| \| \| \| \| \| \| \| \|	CID 165755: Division or modulo by zero (DIVIDE_BY_ZERO) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6352
*	Linux 4.13 compat: bio->bi_status and blk_status_t	Brian Behlendorf	2017-07-23	1	-5/+6
\| \| \| \| \| \| \| \| \|	Commit torvalds/linux@4e4cbee9. The bio->bi_error field was replaced with bio->bi_status which is an enum that describes all possible error types. Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6351
*	Fix vdev_probe() call outside SCL_STATE_ALL lock	Brian Behlendorf	2017-07-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an IO fails then zio_vdev_io_done() can call vdev_probe() to determine the health of the vdev. This is safe as long as the original zio was submitted with zio_wait() and holds the SCL_STATE_ALL lock over the operation. If zio_no_wait() was used then the done callback will submit the probe IO outside the SCL_STATE_ALL lock and hit this ASSERT in zio_create() ASSERT(!vd \|\| spa_config_held(spa, SCL_STATE_ALL, RW_READER)); Resolve the issue by only allowing vdev_probe() to be called when there's a waiter indicating the caller is using zio_wait(). This assumes that caller is still holding SCL_STATE_ALL. This issue isn't MMP specific but was surfaced when testing. Without this patch it can be reproduced by running: zpool set multihost on <pool> zinject -d <vdev> -e io -T write -f 50 <pool> -L uber Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #745 Closes #6279
*	Multi-modifier protection (MMP)	Olaf Faaland	2017-07-13	11	-60/+1001
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add multihost=on\|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Ned Bass <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #745 Closes #6279
*	OpenZFS 6939 - add sysevents to zfs core for commands	Dave Eddy	2017-07-12	6	-49/+182
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Dave Eddy <[email protected]> Reviewed by: Patrick Mooney <[email protected]> Reviewed by: Joshua M. Clulow <[email protected]> Reviewed by: Josh Wilsdon <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Alan Somers <[email protected]> Reviewed by: Andrew Stormont <[email protected]> Approved by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/6939 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce1577b Closes #6328
*	Add port of FreeBSD 'volmode' property	LOLi	2017-07-12	3	-13/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The volmode property may be set to control the visibility of ZVOL block devices. This allow switching ZVOL between three modes: full - existing fully functional behaviour (default) dev - hide partitions on ZVOL block devices none - not exposing volumes outside ZFS Additionally the new zvol_volmode module parameter can be used to control the default behaviour. This functionality can be used, for instance, on "backup" pools to avoid cluttering /dev with unneeded zd* devices. Original-patch-by: mav <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: loli10K <[email protected]> Signed-off-by: loli10K <[email protected]> FreeBSD-commit: https://github.com/freebsd/freebsd/commit/dd28e6bb Closes #1796 Closes #3438 Closes #6233
*	OpenZFS 5428 - provide fts(), reallocarray(), and strtonum()	Yuri Pankov	2017-07-08	6	-13/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Yuri Pankov <[email protected]> Reviewed by: Robert Mustacchi <[email protected]> Approved by: Joshua M. Clulow <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Porting Notes: * All hunks unrelated to ZFS were dropped. OpenZFS-issue: https://www.illumos.org/issues/5428 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4585130 Closes #6326
*	OpenZFS 8126 - ztest assertion failed in dbuf_dirty due to dn_nlevels changing	Matthew Ahrens	2017-07-07	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sync thread is concurrently modifying dn_phys->dn_nlevels while dbuf_dirty() is trying to assert something about it, without holding the necessary lock. We need to move this assertion further down in the function, after we have acquired the dn_struct_rwlock. Authored by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8126 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0ef125d Closes #6314
*	OpenZFS 8067 - zdb should be able to dump literal embedded block pointer	Matthew Ahrens	2017-07-07	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8067 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8173085 Closes #6319
*	Fix 'zpool clear' on readonly pools	LOLi	2017-07-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Illumos 4080 inadvertently allows 'zpool clear' on readonly pools: fix this by reintroducing a check (POOL_CHECK_READONLY) in zfs_ioc_clear registration code. Signed-off-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6306
*	Implemented zpool scrub pause/resume	Alek P	2017-07-06	4	-34/+166
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there is no way to pause a scrub. Pausing may be useful when the pool is busy with other I/O to preserve bandwidth. This patch adds the ability to pause and resume scrubbing. This is achieved by maintaining a persistent on-disk scrub state. While the state is 'paused' we do not scrub any more blocks. We do however perform regular scan housekeeping such as freeing async destroyed and deadlist blocks while paused. Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #6167
*	Reschedule processes on -ERESTARTSYS	Arkadiusz Bubała	2017-07-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	On the single core machine the system may hang when the spa_namespare_lock acquisition fails in the zvol_first_open function. It returns -ERESTARTSYS error what causes the endless loop in __blkdev_get function. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Arkadiusz Bubała <[email protected]> Closes #6283 Closes #6312
*	Clang fixes	alaviss	2017-07-05	2	-10/+10
\| \| \| \| \| \| \| \| \|	Clang doesn't support `/` as comment in assembly, this patch replaces them with `#`. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Leorize <[email protected]> Closes #6311
*	OpenZFS 8378 - crash due to bp in-memory modification of nopwrite block	Matthew Ahrens	2017-07-04	3	-32/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which we then nopwrite against. zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr to zgd_bp, dbuf_write_ready() could change db_blkptr, and dbuf_write_done() could remove the dirty record. dmu_sync() then sees the stale BP and that the dbuf it not dirty, so it is eligible for nop-writing. The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the db_mtx. We could still see a stale db_blkptr, but if it is stale then the dirty record will still exist and thus we won't attempt to nopwrite. OpenZFS-issue: https://www.illumos.org/issues/8378 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3127742 Closes #6293
*	OpenZFS 7600 - zfs rollback should pass target snapshot to kernel	Andriy Gapon	2017-07-04	2	-6/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Andriy Gapon <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> The existing kernel-side code only provides a method to rollback to a latest snapshot, whatever it happens to be at the time when the rollback is actually done. That could be unsafe or confusing in environments where concurrent DSL changes are possible as the resulting state could correspond to a newer or older snapshot than the originally requested one. This change allows to amend that method such that the rollback is performed only when the latest snapshot has a specific name. That is, if a new snapshot is concurrently created or the target snapshot is destroyed, then no rollback is done and EXDEV error is returned. New libzfs_core function lzc_rollback_to() is provided for the new functionality. libzfs is changed to use lzc_rollback_to() to implement zfs rollback command. Perhaps we should return different errors to distinguish the case where the desired snapshot exists but it's not the latest snapshot and the case where the desired snapshot does not exist. OpenZFS-issue: https://www.illumos.org/issues/7600 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3d645eb Closes #6292
*	OpenZFS 7910 - l2arc_write_buffers() may write beyond target_sz	Andriy Gapon	2017-07-04	1	-44/+44
\| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Andriy Gapon <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7910 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb6af4b Closes #6291
*	OpenZFS 8377 - Panic in bookmark deletion	Matthew Ahrens	2017-06-30	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Matthew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> The problem is that when dsl_bookmark_destroy_check() is executed from open context (the pre-check), it fills in dbda_success based on the existence of the bookmark. But the bookmark (or containing filesystem as in this case) can be destroyed before we get to syncing context. When we re-run dsl_bookmark_destroy_check() in syncing context, it will not add the deleted bookmark to dbda_success, intending for dsl_bookmark_destroy_sync() to not process it. But because the bookmark is still in dbda_success from the open-context call, we do try to destroy it. The fix is that dsl_bookmark_destroy_check() should not modify dbda_success when called from open context. OpenZFS-issue: https://www.illumos.org/issues/8377 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b0b6fe3 Closes #6286