openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Allow 'zpool events' filtering by pool name	LOLi	2017-10-26	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Additionally add four new tests: * zpool_events_clear: verify 'zpool events -c' functionality * zpool_events_cliargs: verify command line options and arguments * zpool_events_follow: verify 'zpool events -f' * zpool_events_poolname: verify events filtering by pool name Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #3285 Closes #6762
*	Added no_scrub_restart flag to zpool reopen	Arkadiusz Bubała	2017-10-26	2	-22/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added -n flag to zpool reopen that allows a running scrub operation to continue if there is a device with Dirty Time Log. By default if a component device has a DTL and zpool reopen is executed all running scan operations will be restarted. Added functional tests for `zpool reopen` Tests covers following scenarios: * `zpool reopen` without arguments, * `zpool reopen` with pool name as argument, * `zpool reopen` while scrubbing, * `zpool reopen -n` while scrubbing, * `zpool reopen -n` while resilvering, * `zpool reopen` with bad arguments. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Arkadiusz Bubała <[email protected]> Closes #6076 Closes #6746
*	arcstat: flush stdout / outfile after each line	Fabian-Gruenbichler	2017-10-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Otherwise, if arcstat gets interrupted before the desired number of iterations is reached, the output file will be empty (both if set via '-o' or via shell redirection). Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Fabian Grünbichler <[email protected]> Closes #6775
*	Ensure arc_size_break is filled in arc_summary.py	Giuseppe Di Natale	2017-10-23	1	-22/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Use mfu_size and mru_size pulled from the arcstats kstat file to calculate the mfu and mru percentages for arc size breakdown. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: AndCycle <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #5526 Closes #6770
*	Correct flake8 errors after STYLE builder update	Giuseppe Di Natale	2017-10-23	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fix new flake8 errors related to bare excepts and ambiguous variable names due to a STYLE builder update. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6776
*	Use ashift=12 by default on SSDSC2BW48 disks	adisbladis	2017-10-23	1	-0/+1
\| \| \| \| \| \| \| \| \|	Currently the 480GB models of this disk do not use ashift=12 by default. SSDSC2BW48 is also optimized for 4k blocks. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: adisbladis <[email protected]> Closes #6774
*	Fix coverity defects: CID 161388	Tobin Harding	2017-10-17	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	CID 161388: Resource Leak (REASOURCE_LEAK) Jump to errout so that file descriptor gets closed before returning from function. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tobin C. Harding <[email protected]> Closes #6755
*	Fix coverity defects: 147480, 147584	Tobin Harding	2017-10-16	2	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CID 147480: Logically dead code (DEADCODE) Remove non-null check and subsequent function call. Add ASSERT to future proof the code. usage label is only jumped to before `zhp` is initialized. CID 147584: Out-of-bounds access (OVERRUN) Subtract length of current string from buffer length for `size` argument to `snprintf`. Starting address for the write is the start of the buffer + the current string length. We need to subtract this string length else risk a buffer overflow. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tobin C. Harding <[email protected]> Closes #6745
*	Fixes for #6639	Tom Caputi	2017-10-11	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Several issues were uncovered by running stress tests with zfs encryption and raw sends in particular. The issues and their associated fixes are as follows: * arc_read_done() has the ability to chain several requests for the same block of data via the arc_callback_t struct. In these cases, the ARC would only use the first request's dsobj from the bookmark to decrypt the data. This is problematic because the first request might be a prefetch zio which is able to handle the key not being loaded, while the second might use a different key that it is sure will work. The fix here is to pass the dsobj with each individual arc_callback_t so that each request can attempt to decrypt the data separately. * DRR_FREE and DRR_FREEOBJECT records in a send file were not having their transactions properly tagged as raw during raw sends, which caused a panic when the dbuf code attempted to decrypt these blocks. * traverse_prefetch_metadata() did not properly set ZIO_FLAG_SPECULATIVE when issuing prefetch IOs. * Added a few asserts and code cleanups to ensure these issues are more detectable in the future. Signed-off-by: Tom Caputi <[email protected]>
*	Encryption patch follow-up	Tom Caputi	2017-10-11	3	-31/+124
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* PBKDF2 implementation changed to OpenSSL implementation. * HKDF implementation moved to its own file and tests added to ensure correctness. * Removed libzfs's now unnecessary dependency on libzpool and libicp. * Ztest can now create and test encrypted datasets. This is currently disabled until issue #6526 is resolved, but otherwise functions as advertised. * Several small bug fixes discovered after enabling ztest to run on encrypted datasets. * Fixed coverity defects added by the encryption patch. * Updated man pages for encrypted send / receive behavior. * Fixed a bug where encrypted datasets could receive DRR_WRITE_EMBEDDED records. * Minor code cleanups / consolidation. Signed-off-by: Tom Caputi <[email protected]>
*	Use bitwise '&' instead of logical '&&'	Tobin Harding	2017-10-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make two instances of the same change. Change bitwise AND (&) to logical AND (&&). Currently the code uses a bitwise AND between two boolean values. In the first instance; The first operand is a flag that has been bitwise combined with a bit mask to get a boolean value as to whether a file has group write permissions set. The second operand used is a struct member that is intended as a boolean flag not a bit mask. In the second instance the argument is the same except with world write permissions instead of group write (S_IWOTH, S_IWGRP). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: Tobin C. Harding <[email protected]> Closes #6684 Closes #6722
*	vdev_id: extension for new scsi topology	Simon Guest	2017-09-27	1	-2/+116
\| \| \| \| \| \| \| \| \| \|	On systems with SCSI rather than SAS disk topology, this change enables the vdev_id script to match against the block device path, and therefore create a vdev alias in /dev/disk/by-vdev. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Simon Guest <[email protected]> Closes #6592
*	Fix some ZFS Test Suite issues	LOLi	2017-09-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add 'zfs bookmark' coverage (zfs_bookmark_cliargs) * Add OpenZFS 8166 coverage (zpool_scrub_offline_device) * Fix "busy" zfs_mount_remount failures * Fix bootfs_003_pos, bootfs_004_neg, zdb_005_pos local cleanup * Update usage of $KEEP variable, add get_all_pools() function * Enable history_008_pos and rsend_019_pos (non-32bit builders) * Enable zfs_copies_005_neg, update local cleanup * Fix zfs_send_007_pos (large_dnode + OpenZFS 8199) * Fix rollback_003_pos (use dataset name, not mountpoint, to unmount) * Update default_raidz_setup() to work properly with more than 3 disks * Use $TEST_BASE_DIR instead of hardcoded (/var)/tmp for file VDEVs * Update usage of /dev/random to /dev/urandom Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Issue #6086 Closes #5658 Closes #6143 Closes #6421 Closes #6627 Closes #6632
*	Fix ZTS MMP tests and ztest -M behavior	Olaf Faaland	2017-09-23	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Quote "$MMP_IMPORT_MSG" when it is passed as an argument, as it is a multi-word string. Some tests were passing when they should not have, because the grep was only testing for the first word. Correct the message expected when no hostid is set and the test attempts to enable multihost. It did not match the actual output in that situation. Disable ztest_reguid() when ztest is invoked with the -M option. If ztest performs a reguid, a concurrent import attempt may fail with the error "one or more devices is currently unavailable" if the guid sum is calculated on the original device guids but compared against the guid sum ztest wrote based on the new device guids. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6666
*	Remove FRU and LIBTOPO Support	David Quigley	2017-09-18	2	-205/+10
\| \| \| \| \| \| \| \| \|	FRU and LIBTOPO support are illumos only features that will not be ported to Linux and make the code more complicated than necessary. This commit makes way for further cleanups of the zed/FMA code. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: David Quigley <[email protected]> Closes #6641
*	ZTEST: Always enable asserts	David Quigley	2017-09-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The build for ztest always enabled debug information but does not enable asserts unless --enable-debug is used. This will always enable asserts in the ztest code. Reviewed-by: George Melikov <[email protected]> Reviewed-by Brian Behlendorf <[email protected]> Signed-off-by: David Quigley <[email protected]> Closes #6640
*	Add -vnP support to 'zfs send' for bookmarks	LOLi	2017-09-08	1	-15/+4
\| \| \| \| \| \| \| \| \|	This leverages the functionality introduced in cf7684b to expose verbose, dry-run and parsable 'zfs send' options for bookmarks. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #3666 Closes #6601
*	Improved dnode allocation and dmu_hold_impl()	Olaf Faaland	2017-09-05	1	-7/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactor dmu_object_alloc_dnsize() and dnode_hold_impl() to simplify the code, fix errors introduced by commit dbeb879 (PR #6117) interacting badly with large dnodes, and improve performance. * When allocating a new dnode in dmu_object_alloc_dnsize(), update the percpu object ID for the core's metadnode chunk immediately. This eliminates most lock contention when taking the hold and creating the dnode. * Correct detection of the chunk boundary to work properly with large dnodes. * Separate the dmu_hold_impl() code for the FREE case from the code for the ALLOCATED case to make it easier to read. * Fully populate the dnode handle array immediately after reading a block of the metadnode from disk. Subsequently the dnode handle array provides enough information to determine which dnode slots are in use and which are free. * Add several kstats to allow the behavior of the code to be examined. * Verify dnode packing in large_dnode_008_pos.ksh. Since the test is purely creates, it should leave very few holes in the metadnode. * Add test large_dnode_009_pos.ksh, which performs concurrent creates and deletes, to complement existing test which does only creates. With the above fixes, there is very little contention in a test of about 200,000 racing dnode allocations produced by tests 'large_dnode_008_pos' and 'large_dnode_009_pos'. name type data dnode_hold_dbuf_hold 4 0 dnode_hold_dbuf_read 4 0 dnode_hold_alloc_hits 4 3804690 dnode_hold_alloc_misses 4 216 dnode_hold_alloc_interior 4 3 dnode_hold_alloc_lock_retry 4 0 dnode_hold_alloc_lock_misses 4 0 dnode_hold_alloc_type_none 4 0 dnode_hold_free_hits 4 203105 dnode_hold_free_misses 4 4 dnode_hold_free_lock_misses 4 0 dnode_hold_free_lock_retry 4 0 dnode_hold_free_overflow 4 0 dnode_hold_free_refcount 4 57 dnode_hold_free_txg 4 0 dnode_allocate 4 203154 dnode_reallocate 4 0 dnode_buf_evict 4 23918 dnode_alloc_next_chunk 4 4887 dnode_alloc_race 4 0 dnode_alloc_next_block 4 18 The performance is slightly improved for concurrent creates with 16+ threads, and unchanged for low thread counts. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #5396 Closes #6522 Closes #6414 Closes #6564
*	Add support for DMU_OTN_* types in dbufstat.py	LOLi	2017-08-22	1	-4/+29
\| \| \| \| \| \| \|	Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6535
*	Fix range locking in ZIL commit codepath	LOLi	2017-08-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zillog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6238 Closes #6315 Closes #6356 Closes #6477
*	Retire legacy test infrastructure	Brian Behlendorf	2017-08-15	6	-1296/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Removed zpios kmod, utility, headers and man page. * Removed unused scripts zpios-profile/, zpios-test/, zpool-config/, smb.sh, zpios-sanity.sh, zpios-survey.sh, zpios.sh, and zpool-create.sh. Removed zfs-script-config.sh.in. When building 'make' generates a common.sh with in-tree path information from the common.sh.in template. This file and sourced by the test scripts and used for in-tree testing, it is not included in the packages. When building packages 'make install' uses the same template to create a new common.sh which is appropriate for the packaging. * Removed unused functions/variables from scripts/common.sh.in. Only minimal path information and configuration environment variables remain. * Removed unused scripts from scripts/ directory. * Remaining shell scripts in the scripts directory updated to cleanly pass shellcheck and added to checked scripts. * Renamed tests/test-runner/cmd/ to tests/test-runner/bin/ to match install location name. * Removed last traces of the --enable-debug-dmu-tx configure options which was retired some time ago. Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6509
*	vdev_id: implement slot numbering by port id	sckobras	2017-08-14	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	With HPE hardware and hpsa-driven SAS adapters, only a single phy is reported, but no individual per-port phys (ie. no phy* entry below port_dir), which breaks topology detection in the current sas_handler code. Instead, slot information can be derived directly from the port number. This change implements a new slot keyword "port" similar to "id" and "lun", and assumes a default phy/port of 0 if no individual phy entry can be found. It allows to use the "sas_direct" topology with current HPE Dxxxx and Apollo 45xx JBODs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Daniel Kobras <[email protected]> Closes #6484
*	Add corruption failure option to zinject(8)	Don Brady	2017-08-14	1	-10/+24
\| \| \| \| \| \| \| \| \| \| \|	Added a 'corrupt' error option that will flip a bit in the data after a read operation. This is useful for generating checksum errors at the device layer (in a mirror config for example). It is also used to validate the diagnosis of checksum errors from the zfs diagnosis engine. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #6345
*	Native Encryption for ZFS on Linux	Tom Caputi	2017-08-14	7	-81/+525
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #494 Closes #5769
*	Simplify threads, mutexs, cvs and rwlocks	Brian Behlendorf	2017-08-11	2	-32/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Simplify threads, mutexs, cvs and rwlocks * Update the zk_thread_create() function to use the same trick as Illumos. Specifically, cast the new pthread_t to a void pointer and return that as the kthread_t . This avoids the issues associated with managing a wrapper structure and is safe as long as the callers never attempt to dereference it. Update all function prototypes passed to pthread_create() to match the expected prototype. We were getting away this with before since the function were explicitly cast. * Replaced direct zk_thread_create() calls with thread_create() for code consistency. All consumers of libzpool now use the proper wrappers. * The mutex_held() calls were converted to MUTEX_HELD(). * Removed all mutex_owner() calls and retired the interface. Instead use MUTEX_HELD() which provides the same information and allows the implementation details to be hidden. In this case the use of the pthread_equals() function. * The kthread_t, kmutex_t, krwlock_t, and krwlock_t types had any non essential fields removed. In the case of kthread_t and kcondvar_t they could be directly typedef'd to pthread_t and pthread_cond_t respectively. * Removed all extra ASSERTS from the thread, mutex, rwlock, and cv wrapper functions. In practice, pthreads already provides the vast majority of checks as long as we check the return code. Removing this code from our wrappers help readability. * Added TS_JOINABLE state flag to pass to request a joinable rather than detached thread. This isn't a standard thread_create() state but it's the least invasive way to pass this information and is only used by ztest. TEST_ZTEST_TIMEOUT=3600 Chunwei Chen <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4547 Closes #5503 Closes #5523 Closes #6377 Closes #6495
*	Add libtpool (thread pools)	Brian Behlendorf	2017-08-09	15	-67/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenZFS provides a library called tpool which implements thread pools for user space applications. Porting this library means the zpool utility no longer needs to borrow the kernel mutex and taskq interfaces from libzpool. This code was updated to use the tpool library which behaves in a very similar fashion. Porting libtpool was relatively straight forward and minimal modifications were needed. The core changes were: * Fully convert the library to use pthreads. * Updated signal handling. * lmalloc/lfree converted to calloc/free * Implemented portable pthread_attr_clone() function. Finally, update the build system such that libzpool.so is no longer linked in to zfs(8), zpool(8), etc. All that is required is libzfs to which the zcommon soures were added (which is the way it always should have been). Removing the libzpool dependency resulted in several build issues which needed to be resolved. * Moved zfeature support to module/zcommon/zfeature_common.c * Moved ratelimiting to to module/zfs/zfs_ratelimit.c * Moved get_system_hostid() to lib/libspl/gethostid.c * Removed use of cmn_err() in zcommon source * Removed dprintf_setup() call from zpool_main.c and zfs_main.c * Removed highbit() and lowbit() * Removed unnecessary library dependencies from Makefiles * Removed fletcher-4 kstat in user space * Added sha2 support explicitly to libzfs * Added highbit64() and lowbit64() to zpool_util.c Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6442
*	Fix zpool events scripted mode tab separator	Sen Haerens	2017-08-03	1	-3/+6
\| \| \| \| \| \| \| \| \|	Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Sen Haerens <[email protected]> Closes #6444 Closes #6445
*	Some additional send stream validity checking	Ned Bass	2017-07-25	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check in the DMU whether an object record in a send stream being received contains an unsupported dnode slot count, and return an error if it does. Failure to catch an unsupported dnode slot count would result in a panic when the SPA attempts to increment the reference count for the large_dnode feature and the pool has the feature disabled. This is not normally an issue for a well-formed send stream which would have the DMU_BACKUP_FEATURE_LARGE_DNODE flag set if it contains large dnodes, so it will be rejected as unsupported if the required feature is disabled. This change adds a missing object record field validation. Add missing stream feature flag checks in dmu_recv_resume_begin_check(). Consolidate repetitive comment blocks in dmu_recv_begin_check(). Update zstreamdump to print the dnode slot count (dn_slots) for an object record when running in verbose mode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #6396
*	Add callback for zfs_multihost_interval	Olaf Faaland	2017-07-25	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a callback to wake all running mmp threads when zfs_multihost_interval is changed. This is necessary when the interval is changed from a very large value to a significantly lower one, while pools are imported that have the multihost property enabled. Without this commit, the mmp thread does not wake up and detect the new interval until after it has waited the old multihost interval time. A user monitoring mmp writes via the provided kstat would be led to believe that the changed setting did not work. Added a test in the ZTS under mmp to verify the new functionality is working. Added a test to ztest which starts and stops mmp threads, and calls into the code to signal sleeping mmp threads, to test for deadlocks or similar locking issues. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6387
*	Skip activity check for zhack RO import	Olaf Faaland	2017-07-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	"zhack feature stat" performs a read-only import, so the MMP activity check is not necessary. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6388 Closes #6389
*	Add zgenhostid utility script	Olaf Faaland	2017-07-25	4	-3/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Turning the multihost property on requires that a hostid be set to allow ZFS to determine when a foreign system is attemping to import a pool. The error message instructing the user to set a hostid refers to genhostid(1). Genhostid(1) is not available on SUSE Linux. This commit adds a script modeled after genhostid(1) for those users. Zgenhostid checks for an /etc/hostid file; if it does not exist, it creates one and stores a value. If the user has provided a hostid as an argument, that value is used. Otherwise, a random hostid is generated and stored. This differs from the CENTOS 6/7 versions of genhostid, which overwrite the /etc/hostid file even though their manpages state otherwise. A man page for zgenhostid is added. The one for genhostid is in (1), but I put zgenhostid in (8) because I believe it's more appropriate. The mmp tests are modified to use zgenhostid to set the hostid instead of using the spl_hostid module parameter. zgenhostid will not replace an existing /etc/hostid file, so new mmp_clear_hostid calls are required. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6358 Closes #6379
*	Fix don't zero_label when replace with spare	Chunwei Chen	2017-07-24	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	When replacing a disk with non-wholedisk spare, we shouldn't zero_label it. The wholedisk case already skip it. In fact, zero_label function will fail saying device busy because it's already opened exclusively, but since there's no error checking, the replace command will succeed, causing great confusion. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #6369
*	Restrict zpool iostat/status -c to search path	Giuseppe Di Natale	2017-07-24	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	zpool iostat/status -c is supposed to be restricted by its search path, but currently isn't. To prevent arbitrary scripts from being executed, disallow '/' from commands. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ned Bass <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6353 Closes #6359
*	Fix coverity defects: CID 165757	George Melikov	2017-07-14	1	-0/+1
\| \| \| \| \| \| \| \|	CID 165757: Control flow issues (MISSING_BREAK) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #6348
*	Multi-modifier protection (MMP)	Olaf Faaland	2017-07-13	4	-175/+268
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add multihost=on\|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Ned Bass <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #745 Closes #6279
*	OpenZFS 8067 - zdb should be able to dump literal embedded block pointer	Matthew Ahrens	2017-07-07	1	-3/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8067 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8173085 Closes #6319
*	Implemented zpool scrub pause/resume	Alek P	2017-07-06	1	-15/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there is no way to pause a scrub. Pausing may be useful when the pool is busy with other I/O to preserve bandwidth. This patch adds the ability to pause and resume scrubbing. This is achieved by maintaining a persistent on-disk scrub state. While the state is 'paused' we do not scrub any more blocks. We do however perform regular scan housekeeping such as freeing async destroyed and deadlist blocks while paused. Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #6167
*	OpenZFS 8378 - crash due to bp in-memory modification of nopwrite block	Matthew Ahrens	2017-07-04	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which we then nopwrite against. zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr to zgd_bp, dbuf_write_ready() could change db_blkptr, and dbuf_write_done() could remove the dirty record. dmu_sync() then sees the stale BP and that the dbuf it not dirty, so it is eligible for nop-writing. The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the db_mtx. We could still see a stale db_blkptr, but if it is stale then the dirty record will still exist and thus we won't attempt to nopwrite. OpenZFS-issue: https://www.illumos.org/issues/8378 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3127742 Closes #6293
*	Clean up large dnode code	Matthew Ahrens	2017-06-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Resolves issues discovered when porting to OpenZFS. * Lint warnings. * Made dnode_move_impl() large dnode aware. This functionality is currently unused on Linux. Reviewed-by: Ned Bass <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #6262
*	GCC 7.1 fixes	Tony Hutter	2017-06-28	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	GCC 7.1 with will warn when we're not checking the snprintf() return code in cases where the buffer could be truncated. This patch either checks the snprintf return code (where applicable), or simply disables the warnings (ztest.c). Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #6253
*	Dashes for zero latency values in zpool iostat -p	Tony Hutter	2017-06-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This prints dashes instead of zeros for zero latency values in 'zpool iostat -p'. You'll get zero latencies reported when the disk is idle, but technically a zero latency is invalid, since you can't measure the latency of doing nothing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #6210
*	Inject zinject(8) a percentage amount of dev errs	Don Brady	2017-06-16	1	-10/+38
\| \| \| \| \| \| \| \| \| \| \|	In the original form of device error injection, it was an all or nothing situation. To help simulate intermittent error conditions, you can now specify a real number percentage value. This is also very useful for our ZFS fault diagnosis testing and for injecting intermittent errors during load testing. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #6227
*	Add missing \n for "invalid optionusage" output	kpande	2017-06-09	1	-1/+1
\| \| \| \| \| \| \| \| \|	Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Jack Draak <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #6203
*	OpenZFS 7578 - Fix/improve some aspects of ZIL writing	Giuseppe Di Natale	2017-06-09	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- After some ZIL changes 6 years ago zil_slog_limit got partially broken due to zl_itx_list_sz not updated when async itx'es upgraded to sync. Actually because of other changes about that time zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables. - Original idea of zil_slog_limit was to reduce chance of SLOG abuse by single heavy logger, that increased latency for other (more latency critical) loggers, by pushing heavy log out into the main pool instead of SLOG. Beside huge latency increase for heavy writers, this implementation caused double write of all data, since the log records were explicitly prepared for SLOG. Since we now have I/O scheduler, I've found it can be much more efficient to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG. - Existing ZIL implementation had problem with space efficiency when it has to write large chunks of data into log blocks of limited size. In some cases efficiency stopped to almost as low as 50%. In case of ZIL stored on spinning rust, that also reduced log write speed in half, since head had to uselessly fly over allocated but not written areas. This change improves the situation by offloading problematic operations from z_log_write() to zil_lwb_commit(), which knows real situation of log blocks allocation and can split large requests into pieces much more efficiently. Also as side effect it removes one of two data copy operations done by ZIL code WR_COPIED case. - While there, untangle and unify code of z_log_write() functions. Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing block boundary, that may also improve efficiency if ZPL is made to do that. Sponsored by: iXsystems, Inc. Authored by: Alexander Motin <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Steven Hartland <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7578 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aeb13ac Closes #6191
*	Add MS_MANDLOCK mount failure message	Brian Behlendorf	2017-06-07	1	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit torvalds/linux@9e8925b6 allowed for kernels to be built without support for mandatory locking (MS_MANDLOCK). This will result in 'zfs mount' failing when the nbmand=on property is set if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING. Unfortunately we can not reliably detect prior to the mount(2) system call if the kernel was built with this support. The best we can do is check if the mount failed with EPERM and if we passed 'mand' as a mount option and then print a more useful error message. e.g. filesystem 'tank/fs' has the 'nbmand=on' property set, this mount option may be disabled in your kernel. Use 'zfs set nbmand=off' to disable this option and try to mount the filesystem again. Additionally, switch the default error message case to use strerror() to produce a more human readable message. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4729 Closes #6199
*	Allow add of raidz and mirror with same redundancy	Håkan Johansson	2017-06-05	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow new members to be added to a pool mixing raidz and mirror vdevs without giving -f, as long as they have matching redundancy. This case was missed in #5915, which only handled zpool create. Add zfstest zpool_add_010_pos.ksh, with test of zpool create followed by zpool add of mixed raidz and mirror vdevs. Add some more mixed raidz and mirror cases to zpool_create_006_pos.ksh. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Haakan Johansson <[email protected]> Issue #5915 Closes #6181
*	zpool iostat/status -c improvements	Giuseppe Di Natale	2017-06-05	26	-38/+313
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Users can now provide their own scripts to be run with 'zpool iostat/status -c'. User scripts should be placed in ~/.zpool.d to be included in zpool's default search path. Provide a script which can be used with 'zpool iostat\|status -c' that will return the type of device (hdd, sdd, file). Provide a script to get various values from smartctl when using 'zpool iostat/status -c'. Allow users to define the ZPOOL_SCRIPTS_PATH environment variable which can be used to override the default 'zpool iostat/status -c' search path. Allow the ZPOOL_SCRIPTS_ENABLED environment variable to enable or disable 'zpool status/iostat -c' functionality. Use the new smart script to provide the serial command. Install /etc/sudoers.d/zfs file which contains the sudoer rule for smartctl as a sample. Allow 'zpool iostat/status -c' tests to run in tree. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6121 Closes #6153
*	Enable remaining tests	Brian Behlendorf	2017-05-22	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable most of the remaining test cases which were previously disabled. The required fixes are as follows: * cache_001_pos - No changes required. * cache_010_neg - Updated to use losetup under Linux. Loopback cache devices are allowed, ZVOLs as cache devices are not. Disabled until all the builders pass reliably. * cachefile_001_pos, cachefile_002_pos, cachefile_003_pos, cachefile_004_pos - Set set_device_dir path in cachefile.cfg, updated CPATH1 and CPATH2 to reference unique files. * zfs_clone_005_pos - Wait for udev to create volumes. * zfs_mount_007_pos - Updated mount options to expected Linux names. * zfs_mount_009_neg, zfs_mount_all_001_pos - No changes required. * zfs_unmount_005_pos, zfs_unmount_009_pos, zfs_unmount_all_001_pos - Updated to expect -f to not unmount busy mount points under Linux. * rsend_019_pos - Observed to occasionally take a long time on both 32-bit systems and the kmemleak builder. * zfs_written_property_001_pos - Switched sync(1) to sync_pool. * devices_001_pos, devices_002_neg - Updated create_dev_file() helper for Linux. * exec_002_neg.ksh - Fixed mmap_exec.c to preserve errno. Updated test case to expect EPERM from Linux as described by mmap(2). * grow_pool_001_pos - Adding missing setup.ksh and cleanup.ksh scripts from OpenZFS. * grow_replicas_001_pos.ksh - Added missing $SLICE_* variables. * history_004_pos, history_006_neg, history_008_pos - Fixed by previous commits and were not enabled. No changes required. * zfs_allow_010_pos - Added missing spaces after assorted zfs commands in delegate_common.kshlib. * inuse_* - Illumos dump device tests skipped. Remaining test cases updated to correctly create required partitions. * large_files_001_pos - Fixed largest_file.c to accept EINVAL as well as EFBIG as described in write(2). * link_count_001 - Added nproc to required commands. * umountall_001 - Updated to use umount -a. * online_offline_001_* - Pull in OpenZFS change to file_trunc.c to make the '-c 0' option run the test in a loop. Included online_offline.cfg file in all test cases. * rename_dirs_001_pos - Updated to use the rename_dir test binary, pkill restricted to exact matches and total runtime reduced. * slog_013_neg, write_dirs_002_pos - No changes required. * slog_013_pos.ksh - Updated to use losetup under Linux. * slog_014_pos.ksh - ZED will not be running, manually degrade the damaged vdev as expected. * nopwrite_varying_compression, nopwrite_volume - Forced pool sync with sync_pool to ensure up to date property values. * Fixed typos in ZED log messages. Refactored zed_* helper functions to resolve all-syslog exit=1 errors in zedlog. * zfs_copies_005_neg, zfs_get_004_pos, zpool_add_004_pos, zpool_destroy_001_pos, largest_pool_001_pos, clone_001_pos.ksh, clone_001_pos, - Skip until layering pools on zvols is solid. * largest_pool_001_pos - Limited to 7eb pool, maximum supported size in 8eb-1 on Linux. * zpool_expand_001_pos, zpool_expand_003_neg - Requires additional support from the ZED, updated skip reason. * zfs_rollback_001_pos, zfs_rollback_002_pos - Properly cleanup busy mount points under Linux between test loops. * privilege_001_pos, privilege_003_pos, rollback_003_pos, threadsappend_001_pos - Skip with log_unsupported. * snapshot_016_pos - No changes required. * snapshot_008_pos - Increased LIMIT from 512K to 2M and added sync_pool to avoid false positives. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6128
*	Implemented zpool sync command	Alek P	2017-05-19	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \|	This addition will enable us to sync an open TXG to the main pool on demand. The functionality is similar to 'sync(2)' but 'zpool sync' will return when data has hit the main storage instead of potentially just the ZIL as is the case with the 'sync(2)' cmd. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #6122
*	Force fault a vdev with 'zpool offline -f'	Tony Hutter	2017-05-19	1	-8/+23
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a '-f' option to 'zpool offline' to fault a vdev instead of bringing it offline. Unlike the OFFLINE state, the FAULTED state will trigger the FMA code, allowing for things like autoreplace and triggering the slot fault LED. The -f faults persist across imports, unless they were set with the temporary (-t) flag. Both persistent and temporary faults can be cleared with zpool clear. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #6094