aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* OpenZFS 8056 - zfs send size estimate is inaccurate for some zvolsPaul Dagnelie2017-06-091-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Paul Dagnelie <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> The send size estimate for a zvol can be too low, if the size of the record headers (dmu_replay_record_t's) is a significant portion of the size. This is typically the case when the data is highly compressible, especially with embedded blocks. The problem is that dmu_adjust_send_estimate_for_indirects() assumes that blocks are the size of the "recordsize" property (128KB). However, for zvols, the blocks are the size of the "volblocksize" property (8KB). Therefore, we estimate that there will be 16x less record headers than there really will be. The fix is to check the type of the object set (whether it is a zvol or not) and pick the appropriate property. In addition, while we are at it, we also add the size of the BEGIN and END records to the estimate. OpenZFS-issue: https://www.illumos.org/issues/8056 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/faf09cd Closes #6205
* OpenZFS 8156 - dbuf_evict_notify() does not need dbuf_evict_lockMatthew Ahrens2017-06-091-11/+7
| | | | | | | | | | | | | | | | | | | | | Authored by: Matthew Ahrens <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do the eviction itself (because the evict thread is not able to keep up). This can result in massive lock contention. It isn't necessary to hold the lock, because if we make the wrong choice occasionally, nothing bad will happen. This commit results in a ~60% performance improvement for ARC-cached sequential reads. OpenZFS-issue: https://www.illumos.org/issues/8156 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f73e5d9 Closes #6204
* OpenZFS 8199 - multi-threaded dmu_object_alloc()Matthew Ahrens2017-06-094-71/+135
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dmu_object_alloc() is single-threaded, so when multiple threads are creating files in a single filesystem, they spend a lot of time waiting for the os_obj_lock. To improve performance of multi-threaded file creation, we must make dmu_object_alloc() typically not grab any filesystem-wide locks. The solution is to have a "next object to allocate" for each CPU. Each of these "next object"s is in a different block of the dnode object, so that concurrent allocation holds dnodes in different dbufs. When a thread's "next object" reaches the end of a chunk of objects (by default 4 blocks worth -- 128 dnodes), it will be reset to the per-objset os_obj_next, which will be increased by a chunk of objects (128). Only when manipulating the os_obj_next will we need to grab the os_obj_lock. This decreases lock contention dramatically, because each thread only needs to grab the os_obj_lock briefly, once per 128 allocations. This results in a 70% performance improvement to multi-threaded object creation (where each thread is creating objects in its own directory), from 67,000/sec to 115,000/sec, with 8 CPUs. Work sponsored by Intel Corp. Authored by: Matthew Ahrens <[email protected]> Reviewed-by: Ned Bass <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Matthew Ahrens <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8199 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/374 Closes #4703 Closes #6117
* OpenZFS 7578 - Fix/improve some aspects of ZIL writingGiuseppe Di Natale2017-06-0910-128/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - After some ZIL changes 6 years ago zil_slog_limit got partially broken due to zl_itx_list_sz not updated when async itx'es upgraded to sync. Actually because of other changes about that time zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables. - Original idea of zil_slog_limit was to reduce chance of SLOG abuse by single heavy logger, that increased latency for other (more latency critical) loggers, by pushing heavy log out into the main pool instead of SLOG. Beside huge latency increase for heavy writers, this implementation caused double write of all data, since the log records were explicitly prepared for SLOG. Since we now have I/O scheduler, I've found it can be much more efficient to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG. - Existing ZIL implementation had problem with space efficiency when it has to write large chunks of data into log blocks of limited size. In some cases efficiency stopped to almost as low as 50%. In case of ZIL stored on spinning rust, that also reduced log write speed in half, since head had to uselessly fly over allocated but not written areas. This change improves the situation by offloading problematic operations from z*_log_write() to zil_lwb_commit(), which knows real situation of log blocks allocation and can split large requests into pieces much more efficiently. Also as side effect it removes one of two data copy operations done by ZIL code WR_COPIED case. - While there, untangle and unify code of z*_log_write() functions. Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing block boundary, that may also improve efficiency if ZPL is made to do that. Sponsored by: iXsystems, Inc. Authored by: Alexander Motin <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Steven Hartland <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7578 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aeb13ac Closes #6191
* OpenZFS 8155 - simplify dmu_write_policy handling of pre-compressed buffersMatthew Ahrens2017-06-075-26/+18
| | | | | | | | | | | | | | | | | | | | | | | Authored by: Matthew Ahrens <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> When writing pre-compressed buffers, arc_write() requires that the compression algorithm used to compress the buffer matches the compression algorithm requested by the zio_prop_t, which is set by dmu_write_policy(). This makes dmu_write_policy() and its callers a bit more complicated. We simplify this by making arc_write() trust the caller to supply the type of pre-compressed buffer that it wants to write, and override the compression setting in the zio_prop_t. OpenZFS-issue: https://www.illumos.org/issues/8155 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b55ff58 Closes #6200
* Add MS_MANDLOCK mount failure messageBrian Behlendorf2017-06-071-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | Commit torvalds/linux@9e8925b6 allowed for kernels to be built without support for mandatory locking (MS_MANDLOCK). This will result in 'zfs mount' failing when the nbmand=on property is set if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING. Unfortunately we can not reliably detect prior to the mount(2) system call if the kernel was built with this support. The best we can do is check if the mount failed with EPERM and if we passed 'mand' as a mount option and then print a more useful error message. e.g. filesystem 'tank/fs' has the 'nbmand=on' property set, this mount option may be disabled in your kernel. Use 'zfs set nbmand=off' to disable this option and try to mount the filesystem again. Additionally, switch the default error message case to use strerror() to produce a more human readable message. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4729 Closes #6199
* Skip tests that are slow on 32-bit buildersGiuseppe Di Natale2017-06-063-1/+13
| | | | | | | | | | zpool_create_024_pos, zvol_misc_002_pos, write_dirs_002_pos are slow on the buildbot 32-bit builder. Skip the test cases for now on 32-bit builders. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6195
* Reduce async_destroy_001_pos memory requirementsBrian Behlendorf2017-06-062-11/+5
| | | | | | | | | | | | The number of blocks which can be freed per TXG is controlled by the zfs_free_max_blocks module option (defaults to 100,000). Both speed up this test case and reduce the memory requirements by only creating 4 TXGs worth of blocks to be freed. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5479 Closes #6192
* Allow add of raidz and mirror with same redundancyHÃ¥kan Johansson2017-06-055-4/+192
| | | | | | | | | | | | | | | Allow new members to be added to a pool mixing raidz and mirror vdevs without giving -f, as long as they have matching redundancy. This case was missed in #5915, which only handled zpool create. Add zfstest zpool_add_010_pos.ksh, with test of zpool create followed by zpool add of mixed raidz and mirror vdevs. Add some more mixed raidz and mirror cases to zpool_create_006_pos.ksh. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Haakan Johansson <[email protected]> Issue #5915 Closes #6181
* Linux 4.9 compat: fix zfs_ctldir xattr handlingLOLi2017-06-051-0/+3
| | | | | | | | | Since torvalds/linux@d0a5b99 IOP_XATTR is used to indicate the inode has xattr support: clear it for the ctldir inodes to avoid EIO errors. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6189
* zpool iostat/status -c improvementsGiuseppe Di Natale2017-06-0543-61/+812
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users can now provide their own scripts to be run with 'zpool iostat/status -c'. User scripts should be placed in ~/.zpool.d to be included in zpool's default search path. Provide a script which can be used with 'zpool iostat|status -c' that will return the type of device (hdd, sdd, file). Provide a script to get various values from smartctl when using 'zpool iostat/status -c'. Allow users to define the ZPOOL_SCRIPTS_PATH environment variable which can be used to override the default 'zpool iostat/status -c' search path. Allow the ZPOOL_SCRIPTS_ENABLED environment variable to enable or disable 'zpool status/iostat -c' functionality. Use the new smart script to provide the serial command. Install /etc/sudoers.d/zfs file which contains the sudoer rule for smartctl as a sample. Allow 'zpool iostat/status -c' tests to run in tree. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6121 Closes #6153
* Fix "snapdev" property issuesLOLi2017-06-025-64/+248
| | | | | | | | | | | | | | | | | | | | When inheriting the "snapdev" property to we don't always call zfs_prop_set_special(): this prevents device nodes from being created in certain situations. Because "snapdev" is the only *special* property that is also inheritable we need to call zfs_prop_set_special() even when we're not reverting it to the received value ('zfs inherit -S'). Additionally, fix a NULL pointer dereference accidentally introduced in 5559ba0 that can be triggered when setting the "snapdev" property to the value "hidden" twice. Finally, add a new test case "zvol_misc_snapdev" to the ZFS Test Suite. Reviewed by: Boris Protopopov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6131 Closes #6175 Closes #6176
* Fix import wrong spare/l2 device when path changeChunwei Chen2017-06-011-6/+0
| | | | | | | | | | | | | | | If, for example, your aux device was /dev/sdc, but now the aux device is removed and /dev/sdc points to other device. zpool import will still use that device and corrupt it. The problem is that the spa_validate_aux in spa_import, rather than validate the on-disk label, it would actually write label to disk. We remove them since spa_load_{spares,l2cache} seems to do everything we need and they would actually validate on-disk label. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #6158
* Fix import finding spare/l2cache when path changesChunwei Chen2017-06-011-2/+16
| | | | | | | | | | | | | | | | When spare or l2cache device path changes, zpool import will not fix up their paths like normal vdev. The issue is that when you supply a pool name argument to zpool import, it will use it to filter out device which doesn't have the pool name in the label. Since spare and l2cache device never have that in the label, they'll always get filtered out. We fix this by making sure we never filter out a spare or l2cache device. Reviewed by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #6158
* Retire filebench testingGiuseppe Di Natale2017-06-011-13/+0
| | | | | | | | | We no longer perform automated filebench testing. Remove references to it for the automated testing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6186
* Fix memory leak in zvol_set_volsize()LOLi2017-05-311-1/+2
| | | | | | | | | | | Move kmem_free() so it's called for every error path: this is preferred over making `dmu_object_info_t doi` local to accommodate older kernels with limited stacks. Reviewed by: Boris Protopopov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6177
* Explain reason for Signed-off-by in CONTRIBUTINGkpande2017-05-311-2/+6
| | | | | | | | Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Haakan T Johansson <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #6183
* Fix ida leak in zvol_create_minor_implBoris Protopopov2017-05-261-0/+7
| | | | | | | | | | Added missing ida_simple_remove() in the error handling path. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Boris Protopopov <[email protected]> Closes #6159 Closes #6172
* Don't dirty bpobj if it has no entriesAlek P2017-05-261-0/+4
| | | | | | | | | | | | | | | In certain cases (dsl_scan_sync() is one), we may end up calling bpobj_iterate() on an empty bpobj. Even though we don't end up modifying the bpobj it still gets dirtied, causing unneeded writes to the pool. This patch adds an early bail from bpobj_iterate_impl() if bpobj is empty to prevent unneeded writes. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #6164
* Revert "Fix "snapdev" property inheritance behaviour"Brian Behlendorf2017-05-265-230/+59
| | | | | | | | | | | This reverts commit 959f56b99366c8727647b5b19fb3d47555c96cf3. An issue was uncovered by the new zvol_misc_snapdev test case which needs to be investigated and resolved. Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6174 Issue #6131
* OpenZFS 8077 - zfs-tests suite fails zpool_get_002_posYuri Pankov2017-05-251-19/+48
| | | | | | | | | | | | | | | | | Authored by: Yuri Pankov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Approved by: Richard Lowe <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: bunder2015 <[email protected]> Porting Notes: * Also corrected a quoting mistake found in our copy OpenZFS-issue: https://www.illumos.org/issues/8077 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/481467d Closes #6163
* OpenZFS 8076 - zfs-tests suite fails rootpool_002_negYuri Pankov2017-05-251-11/+12
| | | | | | | | | | | | | | Authored by: Yuri Pankov <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Prakash Surya <[email protected]> Approved by: Richard Lowe <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: bunder2015 <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8076 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ab3407e Closes #6162
* OpenZFS 8071 - zfs-tests: 7290 missed some casesYuri Pankov2017-05-253-13/+10
| | | | | | | | | | | | | | Authored by: Yuri Pankov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Approved by: Richard Lowe <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: bunder2015 <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8071 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e84991e Closes #6161
* OpenZFS 8070 - Add some ZFS commentsAlan Somers2017-05-252-0/+7
| | | | | | | | | | | | | | Authored by: Alan Somers <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: bunder2015 <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8070 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/40713f2 Closes #6160
* Fix "snapdev" property inheritance behaviourLOLi2017-05-255-59/+230
| | | | | | | | | | | When inheriting the "snapdev" property to we don't always call zfs_prop_set_special(): this prevents device nodes from being created in certain situations. Because "snapdev" is the only *special* property that is also inheritable we need to call zfs_prop_set_special() even when we're not reverting it to the received value ('zfs inherit -S'). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6131
* OpenZFS 8072 - zfs-tests: several test cases incorrectly spell TESTPOOLYuri Pankov2017-05-255-19/+17
| | | | | | | | | | | | | Authored by: Yuri Pankov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Approved by: Richard Lowe <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8072 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/56e4733 Closes #6137
* config: allow --with-linux without --with-linux-objChunwei Chen2017-05-251-1/+2
| | | | | | | | | | Don't use `uname -r` to determine kernel build directory when the user specified kernel source with --with-linux. Otherwise, the user is forced to use --with-linux-obj even if they are the same directory, which is very counterintuitive. Signed-off-by: Chunwei Chen <[email protected]> Requires-spl: refs/pull/617/head
* Improve gitignoreChunwei Chen2017-05-252-0/+5
| | | | | | | Ignore .*.d and exclude Makefile.in in module/ Also, ignore *.patch and *.orig files Signed-off-by: Chunwei Chen <[email protected]>
* Linux 4.12 compat: fix super_setup_bdi_name() callLOLi2017-05-252-6/+6
| | | | | | | | | | Provide a format parameter to super_setup_bdi_name() so we don't create duplicate names in '/devices/virtual/bdi' sysfs namespace which would prevent us from mounting more than one ZFS filesystem at a time. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6147
* Retire zconfig.shBrian Behlendorf2017-05-223-701/+0
| | | | | | | | All of the test coverage provided by this script is now handled as part of the ZFS Test Suite. Remove it. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6128
* Add zpool events testsBrian Behlendorf2017-05-2218-12/+487
| | | | | | | | | | | | | | | | | | * events_001_pos - Verify the expected events are generated when invoking the various zpool sub-commands. These events must appear in `zpool event` and be consumed by the ZED. * events_002_pos - Verify the ZED consumes events which were generated while it wasn't running when it is started. Additionally, verify that events are only processed once. As part of this change the default.cfg used by the test suite was changed to a default.cfg.in file. This was needed so the install location of all zed scripts, not only the enabled ones, could be reliably determined. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6128
* Enable xattr testsBrian Behlendorf2017-05-2217-186/+387
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updated the xattr_common.ksh helper functions to use the attr command on Linux to manipulate xattrs. Added an xattr.cfg file and reworked the user/group functionality to be consist with the existing delegate test cases. The intent of each test case was preserved. * xattr_001_pos, xattr_002_neg - Updated to verity xattr=on and xattr=sa sytle xattrs. * xattr_003_neg - Use user_run helper instead of su. * xattr_004_pos - Updated to work with ext2 xattrs. * xattr_007_neg - Updated to use attr instead of runat. * xattr_008_pos, xattr_009_neg8_pos, xattr_010_neg - Test cases disables since they aren't applicable to Linux. * xattr_011_pos - Updated to expected behavior from GNU versions of the tested utilities. * xattr_012_pos - Updated to use xattrtest to create many small xattrs instead of a single large one. * xattr_013_pos - Updated to use attr instead of runat. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6128
* Enable remaining testsBrian Behlendorf2017-05-2285-354/+799
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable most of the remaining test cases which were previously disabled. The required fixes are as follows: * cache_001_pos - No changes required. * cache_010_neg - Updated to use losetup under Linux. Loopback cache devices are allowed, ZVOLs as cache devices are not. Disabled until all the builders pass reliably. * cachefile_001_pos, cachefile_002_pos, cachefile_003_pos, cachefile_004_pos - Set set_device_dir path in cachefile.cfg, updated CPATH1 and CPATH2 to reference unique files. * zfs_clone_005_pos - Wait for udev to create volumes. * zfs_mount_007_pos - Updated mount options to expected Linux names. * zfs_mount_009_neg, zfs_mount_all_001_pos - No changes required. * zfs_unmount_005_pos, zfs_unmount_009_pos, zfs_unmount_all_001_pos - Updated to expect -f to not unmount busy mount points under Linux. * rsend_019_pos - Observed to occasionally take a long time on both 32-bit systems and the kmemleak builder. * zfs_written_property_001_pos - Switched sync(1) to sync_pool. * devices_001_pos, devices_002_neg - Updated create_dev_file() helper for Linux. * exec_002_neg.ksh - Fixed mmap_exec.c to preserve errno. Updated test case to expect EPERM from Linux as described by mmap(2). * grow_pool_001_pos - Adding missing setup.ksh and cleanup.ksh scripts from OpenZFS. * grow_replicas_001_pos.ksh - Added missing $SLICE_* variables. * history_004_pos, history_006_neg, history_008_pos - Fixed by previous commits and were not enabled. No changes required. * zfs_allow_010_pos - Added missing spaces after assorted zfs commands in delegate_common.kshlib. * inuse_* - Illumos dump device tests skipped. Remaining test cases updated to correctly create required partitions. * large_files_001_pos - Fixed largest_file.c to accept EINVAL as well as EFBIG as described in write(2). * link_count_001 - Added nproc to required commands. * umountall_001 - Updated to use umount -a. * online_offline_001_* - Pull in OpenZFS change to file_trunc.c to make the '-c 0' option run the test in a loop. Included online_offline.cfg file in all test cases. * rename_dirs_001_pos - Updated to use the rename_dir test binary, pkill restricted to exact matches and total runtime reduced. * slog_013_neg, write_dirs_002_pos - No changes required. * slog_013_pos.ksh - Updated to use losetup under Linux. * slog_014_pos.ksh - ZED will not be running, manually degrade the damaged vdev as expected. * nopwrite_varying_compression, nopwrite_volume - Forced pool sync with sync_pool to ensure up to date property values. * Fixed typos in ZED log messages. Refactored zed_* helper functions to resolve all-syslog exit=1 errors in zedlog. * zfs_copies_005_neg, zfs_get_004_pos, zpool_add_004_pos, zpool_destroy_001_pos, largest_pool_001_pos, clone_001_pos.ksh, clone_001_pos, - Skip until layering pools on zvols is solid. * largest_pool_001_pos - Limited to 7eb pool, maximum supported size in 8eb-1 on Linux. * zpool_expand_001_pos, zpool_expand_003_neg - Requires additional support from the ZED, updated skip reason. * zfs_rollback_001_pos, zfs_rollback_002_pos - Properly cleanup busy mount points under Linux between test loops. * privilege_001_pos, privilege_003_pos, rollback_003_pos, threadsappend_001_pos - Skip with log_unsupported. * snapshot_016_pos - No changes required. * snapshot_008_pos - Increased LIMIT from 512K to 2M and added sync_pool to avoid false positives. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6128
* Fix LZ4_uncompress_unknownOutputSize caused panicFeng Sun2017-05-191-8/+19
| | | | | | | | | | | | | | | | Sync with kernel patches for lz4 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/lib/lz4 4a3a99 lz4: add overrun checks to lz4_uncompress_unknownoutputsize() d5e7ca LZ4 : fix the data abort issue bea2b5 lib/lz4: Pull out constant tables 99b7e9 lz4: fix system halt at boot kernel on x86_64 Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Feng Sun <[email protected]> Closes #5975 Closes #5973
* Implemented zpool sync commandAlek P2017-05-1920-70/+397
| | | | | | | | | | | This addition will enable us to sync an open TXG to the main pool on demand. The functionality is similar to 'sync(2)' but 'zpool sync' will return when data has hit the main storage instead of potentially just the ZIL as is the case with the 'sync(2)' cmd. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #6122
* Force fault a vdev with 'zpool offline -f'Tony Hutter2017-05-1913-29/+266
| | | | | | | | | | | | | This patch adds a '-f' option to 'zpool offline' to fault a vdev instead of bringing it offline. Unlike the OFFLINE state, the FAULTED state will trigger the FMA code, allowing for things like autoreplace and triggering the slot fault LED. The -f faults persist across imports, unless they were set with the temporary (-t) flag. Both persistent and temporary faults can be cleared with zpool clear. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #6094
* Fixed small memory leak in ereport handlingTom Caputi2017-05-181-6/+6
| | | | | | | | | | | One pre-check in zfs_ereport_start() was being called after the nvlists were being allocated. This simply corrects that issue. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6140
* Fix large dnode send stream flag conflictBrian Behlendorf2017-05-181-1/+2
| | | | | | | | | | | | | | | | | | Bit 21 of the send stream flags was inadvertently used for two different features under concurrent development. To avoid any future compatibility problems the large dnode flag is being switched to bit 23 which is unused. The large dnode feature has only been present in pre-releases of ZoL and dnodesize defaults to legacy which is compatible with existing OpenZFS implementations. Users with dnodesize=auto needing to use zfs send/recv must update ZoL on both the source and destination systems. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Ned Bass <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6139
* Compatibilty with glibc-2.23Justin Lecher2017-05-161-0/+1
| | | | | | | | | | In glibc-2.23 <sys/sysmacros.h> isn't automatically included in <sys/types.h> [1], so we need ot explicitely include it. https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Justin Lecher <[email protected]> Closes #6132
* Introduce zv_state_lockBoris Protopopov2017-05-161-71/+124
| | | | | | | | | | | | The lock is designed to protect internal state of zvol_state_t and to avoid taking spa_namespace_lock (e.g. in dmu_objset_own() code path) while holding zvol_stat_lock. Refactor the code accordingly. Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3484 Closes #6065 Closes #6134
* Revert commit 1ee159f4Boris Protopopov2017-05-161-2/+29
| | | | | | | | | | | | Fix lock order inversion with zvol_open() as it did not account for use of zvols as vdevs. The latter use cases resulted in the lock order inversion deadlocks that involved spa_namespace_lock and bdev->bd_mutex. Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6065 Issue #6134
* Skip spurious resilver IO on raidz vdevIsaac Huang2017-05-1210-33/+125
| | | | | | | | | | | | | | | | On a raidz vdev, a block that does not span all child vdevs, excluding its skip sectors if any, may not be affected by a child vdev outage or failure. In such cases, the block does not need to be resilvered. However, current resilver algorithm simply resilvers all blocks on a degraded raidz vdev. Such spurious IO is not only wasteful, but also adds the risk of overwriting good data. This patch eliminates such spurious IOs. Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Isaac Huang <[email protected]> Closes #5316
* Enable additional test casesBrian Behlendorf2017-05-1153-252/+416
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable additional test cases, in most cases this required a few minor modifications to the test scripts. In a few cases a real bug was uncovered and fixed. And in a handful of cases where pools are layered on pools the test case will be skipped until this is supported. Details below for each test case. * zpool_add_004_pos - Skip test on Linux until adding zvols to pools is fully supported and deadlock free. * zpool_add_005_pos.ksh - Skip dumpadm portion of the test which isn't relevant for Linux. The find_vfstab_dev, find_mnttab_dev, and save_dump_dev functions were updated accordingly for Linux. Add O_EXCL to the in-use check to prevent the -f (force) option from working for mounted filesystems and improve the resulting error. * zpool_add_006_pos - Update test case such that it doesn't depend on nested pools. Switch to truncate from mkfile to reduce space requirements and speed up the test case. * zpool_clear_001_pos - Speed up test case by filling filesystem to 25% capacity. * zpool_create_002_pos, zpool_create_004_pos - Use sparse files for file vdevs in order to avoid increasing the partition size. * zpool_create_006_pos - 6ba1ce9 allows raidz+mirror configs with similar redundancy. Updating the valid_args and forced_args cases. * zpool_create_008_pos - Disable overlapping partition portion. * zpool_create_011_neg - Fix to correctly create the extra partition. Modified zpool_vdev.c to use fstat64_blk() wrapper which includes the st_size even for block devices. * zpool_create_012_neg - Updated to properly find swap devices. * zpool_create_014_neg, zpool_create_015_neg - Updated to use swap_setup() and swap_cleanup() wrappers which do the right thing on Linux and Illumos. Removed '-n' option which succeeds under Linux due to differences in the in-use checks. * zpool_create_016_pos.ksh - Skipped test case isn't useful. * zpool_create_020_pos - Added missing / to cleanup() function. Remove cache file prior to test to ensure a clean environment and avoid false positives. * zpool_destroy_001_pos - Removed test case which creates a pool on a zvol. This is more likely to deadlock under Linux and has never been completely supported on any platform. * zpool_destroy_002_pos - 'zpool destroy -f' is unsupported on Linux. Mount point must not be busy in order to unmount them. * zfs_destroy_001_pos - Handle EBUSY error which can occur with volumes when racing with udev. * zpool_expand_001_pos, zpool_expand_003_neg - Skip test on Linux until adding zvols to pools is fully supported and deadlock free. The test could be modified to use loop-back devices but it would be preferable to use the test case as is for improved coverage. * zpool_export_004_pos - Updated test case to such that it doesn't depend on nested pools. Normal file vdev under /var/tmp are fine. * zpool_import_all_001_pos - Updated to skip partition 1, which is known as slice 2, on Illumos. This prevents overwriting the default TESTPOOL which was causing the failure. * zpool_import_002_pos, zpool_import_012_pos - No changes needed. * zpool_remove_003_pos - No changes needed * zpool_upgrade_002_pos, zpool_upgrade_004_pos - Root cause addressed by upstream OpenZFS commit 3b7f360. * zpool_upgrade_007_pos - Disabled in test case due to known failure. Opened issue https://github.com/zfsonlinux/zfs/issues/6112 * zvol_misc_002_pos - Updated to to use ext2. * zvol_misc_001_neg, zvol_misc_003_neg, zvol_misc_004_pos, zvol_misc_005_neg, zvol_misc_006_pos - Moved to skip list, these test case could be updated to use Linux's crash dump facility. * zvol_swap_* - Updated to use swap_setup/swap_cleanup helpers. File creation switched from /tmp to /var/tmp. Enabled minimal useful tests for Linux, skip test cases which aren't applicable. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3484 Issue #5634 Issue #2437 Issue #5202 Issue #4034 Closes #6095
* OpenZFS 8063 - verify that we do not attempt to access inactive txgMatthew Ahrens2017-05-108-26/+53
| | | | | | | | | | | | | | | | | | | | | Authored by: Matthew Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: George Melikov <[email protected]> A standard practice in ZFS is to keep track of "per-txg" state. Any of the 3 active TXG's (open, quiescing, syncing) can have different values for this state. We should assert that we do not attempt to modify other (inactive) TXG's. Porting Notes: - ASSERTV added to txg_sync_waiting() for unused variable. OpenZFS-issue: https://www.illumos.org/issues/8063 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/01acb46 Closes #6109
* OpenZFS 8166 - zpool scrub thinks it repaired offline deviceMatthew Ahrens2017-05-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Matthew Ahrens <[email protected]> If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". OpenZFS-issue: https://www.illumos.org/issues/8166 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/372 Closes #5806 Closes #6103
* Add missing arc_free_cksum() to arc_release()Tom Caputi2017-05-101-0/+4
| | | | | | | | | | | | | | | | | | The arc layer tracks checksums of its data in the arc header so that it can ensure that buffers haven't changed when they're not supposed to. This checksum is only maintained while there is an uncompressed buffer still attached to the header. Unfortunately there is a missing call to arc_free_cksum() in arc_release() that can trigger ASSERTs. This has not been a common issue because the checksums are only maintained for debug builds and triggering the bug requires writing a block (and therefore calling arc_release()) while a compressed buffer is still being used on a debug build. This simply corrects the issue. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6105
* Linux 4.12 compat: CURRENT_TIME removedBrian Behlendorf2017-05-107-11/+45
| | | | | | | | Linux 4.9 added current_time() as the preferred interface to get the filesystem time. CURRENT_TIME was retired in Linux 4.12. Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6114
* Add property overriding (-o|-x) to 'zfs receive'LOLi2017-05-099-89/+901
| | | | | | | | | | | | | | | | | | | | This allows users to specify "-o property=value" to override and "-x property" to exclude properties when receiving a zfs send stream. Both native and user properties can be specified. This is useful when using zfs send/receive for periodic backup/replication because it lets users change properties such as canmount, mountpoint, or compression without modifying the source. References: https://www.illumos.org/issues/2745 https://www.illumos.org/issues/3753 Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #1350 Closes #5349
* Make createtxg and guid properties publicChristian Schwarz2017-05-094-5/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document the existence of `createtxg` and `guid` native properties in man pages and zfs command output. One of the great features of ZFS is incremental replication of snapshots, possibly between pools on different machines. Shell scripts are commonly used to auomate this procedure. They have to find the most recent common snapshot between both sides and then perform incremental send & recv. Currently, scripts rely on the sorting order of `zfs list`, which defaults to `createtxg`, and the assumption that snapshot names on either side do not change. By making `createtxg` and `guid` part of the public ZFS interface, scripts are enabled to use a) `createtxg` to determine the logical & temporal order of snapshots (the creation property is not an equivalent substitute since multiple snapshots may be created within one second) b) `guid` to uniquely identify a snapshot, independent of its current display name This has the potential of making scripts safer and correct. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #6102
* Fix NULL pointer dereference in 'zfs create'LOLi2017-05-091-1/+3
| | | | | | | | | | | A race condition between 'zpool export' and 'zfs create' can crash the latter: this is because we never check libzfs`zpool_open() return value in libzfs`zfs_create(). Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6096