summaryrefslogtreecommitdiffstats
path: root/tests/zfs-tests
Commit message (Collapse)AuthorAgeFilesLines
* Clean up lib dependenciesArvind Sankar2020-07-107-20/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | libzutil is currently statically linked into libzfs, libzfs_core and libzpool. Avoid the unnecessary duplication by removing it from libzfs and libzpool, and adding libzfs_core to libzpool. Remove a few unnecessary dependencies: - libuutil from libzfs_core - libtirpc from libspl - keep only libcrypto in libzfs, as we don't use any functions from libssl - librt is only used for clock_gettime, however on modern systems that's in libc rather than librt. Add a configure check to see if we actually need librt - libdl from raidz_test Add a few missing dependencies: - zlib to libefi and libzfs - libuuid to zpool, and libuuid and libudev to zed - libnvpair uses assertions, so add assert.c to provide aok and libspl_assertf Sort the LDADD for programs so that libraries that satisfy dependencies come at the end rather than the beginning of the linker command line. Revamp the configure tests for libaries to use FIND_SYSTEM_LIBRARY instead. This can take advantage of pkg-config, and it also avoids polluting LIBS. List all the required dependencies in the pkgconfig files, and move the one for libzfs_core into the latter's directory. Install pkgconfig files in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on FreeBSD, instead of /usr/share/pkgconfig, as the more correct location for library .pc files. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538
* Move libspl_assertf into .c fileArvind Sankar2020-07-102-3/+3
| | | | | | | | | | | | Variadic functions cannot be inlined. libspl_assertf ends up being duplicated in every file that uses it. Fix this by moving the function into a new assert.c. Also move the definition of aok into the new file instead of zone.c. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538
* ZTS: Make bc conditional use compatible with new BSD bcRyan Moeller2020-07-091-2/+3
| | | | | | | | | | | | | | | | FreeBSD recently replaced the GNU bc and dc in the base system with BSD licensed versions. They are supposed to be compatible with all the features present in the GNU versions, but it turns out they are picky about `if` statements having a corresponding `else`. ZTS uses `echo "if ($x > $y) 1" | bc` in a few places, which causes tests to fail unexpectedly with the new bc. Change the two expressions in ZTS to `if ($x > $y) 1 else 0` for compatibility with the new BSD bc. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10551
* Add device rebuild featureBrian Behlendorf2020-07-0325-172/+1105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: John Poduska <[email protected]> Co-authored-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10349
* Add block histogram to zdbRobert Novak2020-06-262-0/+273
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block histogram tracks the changes to psize, lsize and asize both in the count of the number of blocks (by blocksize) and the total length of all of the blocks for that blocksize. It also keeps a running total of the cumulative size of all of the blocks up to each size to help determine the size of caching SSDs to be added to zfs hardware deployments. The block history counts and lengths are summarized in bins which are powers of two. Even rows with counts of zero are printed. This change is accessed by specifying one of two options: zdb -bbb pool zdb -Pbbb pool The first version prints the table in fixed size columns. The second prints in "parseable" output that can be placed into a CSV file. Fixed Column, nicenum output sample: block psize lsize asize size Count Length Cum. Count Length Cum. Count Length Cum. 512: 3.50K 1.75M 1.75M 3.43K 1.71M 1.71M 3.41K 1.71M 1.71M 1K: 3.65K 3.67M 5.43M 3.43K 3.44M 5.15M 3.50K 3.51M 5.22M 2K: 3.45K 6.92M 12.3M 3.41K 6.83M 12.0M 3.59K 7.26M 12.5M 4K: 3.44K 13.8M 26.1M 3.43K 13.7M 25.7M 3.49K 14.1M 26.6M 8K: 3.42K 27.3M 53.5M 3.41K 27.3M 53.0M 3.44K 27.6M 54.2M 16K: 3.43K 54.9M 108M 3.50K 56.1M 109M 3.42K 54.7M 109M 32K: 3.44K 110M 219M 3.41K 109M 218M 3.43K 110M 219M 64K: 3.41K 218M 437M 3.41K 218M 437M 3.44K 221M 439M 128K: 3.41K 437M 874M 3.70K 474M 911M 3.41K 437M 876M 256K: 3.41K 874M 1.71G 3.41K 874M 1.74G 3.41K 874M 1.71G 512K: 3.41K 1.71G 3.41G 3.41K 1.71G 3.45G 3.41K 1.71G 3.42G 1M: 3.41K 3.41G 6.82G 3.41K 3.41G 6.86G 3.41K 3.41G 6.83G 2M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 4M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 8M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 16M: 0 0 6.82G 0 0 6.86G 0 0 6.83G Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Robert E. Novak <[email protected]> Closes: #9158 Closes #10315
* Fixes for make distArvind Sankar2020-06-267-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10501
* pam: implement a zfs_key pam modulefelixdoerre2020-06-248-0/+222
| | | | | | | | | | | | | | | | | Implements a pam module for automatically loading zfs encryption keys for home datasets. The pam module: - loads a zfs key and mounts the dataset when a session opens. - unmounts the dataset and unloads the key when the session closes. - when the user is logged on and changes the password, the module changes the encryption key. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: @jengelh <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Felix Dörre <[email protected]> Closes #9886 Closes #9903
* Add zfs_multihost_interval tunable handler for FreeBSDRyan Moeller2020-06-231-1/+1
| | | | | | | | | | This tunable required a handler to be implemented for ZFS_MODULE_PARAM_CALL. Add the handler so the tunable can be declared in common code. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10490
* Clarify comments in config/*.m4, vdev_geom.c, zfs_allow_*.kshMatthew Ahrens2020-06-222-8/+9
| | | | | | | | | Rephrase comments to be more clear. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10481
* Remove dead codeArvind Sankar2020-06-181-14/+0
| | | | | | | | | Delete unused functions. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10470
* Mark functions as staticArvind Sankar2020-06-183-3/+3
| | | | | | | | | | | Mark functions used only in the same translation unit as static. This only includes functions that do not have a prototype in a header file either. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10470
* linux: add basic fallocate(mode=0/2) compatibilityadilger2020-06-186-0/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement semi-compatible functionality for mode=0 (preallocation) and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL. Since ZFS does COW and snapshots, preallocating blocks for a file cannot guarantee that writes to the file will not run out of space. Even if the first overwrite was guaranteed, it would not handle any later overwrite of blocks due to COW, so strict compliance is futile. Instead, make a best-effort check that at least enough free space is currently available in the pool (with a bit of margin), then create a sparse file of the requested size and continue on with life. This does not handle all cases (e.g. several fallocate() calls before writing into the files when the filesystem is nearly full), which would require a more complex mechanism to be implemented, probably based on a modified version of dmu_prealloc(), but is usable as-is. A new module option zfs_fallocate_reserve_percent is used to control the reserve margin for any single fallocate call. By default, this is 110% of the requested preallocation size, so an additional 10% of available space is reserved for overhead to allow the application a good chance of finishing the write when the fallocate() succeeds. If the heuristics of this basic fallocate implementation are not desirable, the old non-functional behavior of returning EOPNOTSUPP for calls can be restored by setting zfs_fallocate_reserve_percent=0. The parameter of zfs_statvfs() is changed to take an inode instead of a dentry, since no dentry is available in zfs_fallocate_common(). A few tests from @behlendorf cover basic fallocate functionality. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Arshad Hussain <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Signed-off-by: Andreas Dilger <[email protected]> Issue #326 Closes #10408
* Remove unnecessary references to slaveryMatthew Ahrens2020-06-101-11/+11
| | | | | | | | | | | | | | | | | | | | | | The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10435
* Fix typosAndrea Gelmini2020-06-0915-18/+18
| | | | | | | | | Correct various typos in the comments and tests. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #10423
* File incorrectly zeroed when receiving incremental stream that toggles -LMatthew Ahrens2020-06-092-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: By increasing the recordsize property above the default of 128KB, a filesystem may have "large" blocks. By default, a send stream of such a filesystem does not contain large WRITE records, instead it decreases objects' block sizes to 128KB and splits the large blocks into 128KB blocks, allowing the large-block filesystem to be received by a system that does not support the `large_blocks` feature. A send stream generated by `zfs send -L` (or `--large-block`) preserves the large block size on the receiving system, by using large WRITE records. When receiving an incremental send stream for a filesystem with large blocks, if the send stream's -L flag was toggled, a bug is encountered in which the file's contents are incorrectly zeroed out. The contents of any blocks that were not modified by this send stream will be lost. "Toggled" means that the previous send used `-L`, but this incremental does not use `-L` (-L to no-L); or that the previous send did not use `-L`, but this incremental does use `-L` (no-L to -L). Changes: This commit addresses the problem with several changes to the semantics of zfs send/receive: 1. "-L to no-L" incrementals are rejected. If the previous send used `-L`, but this incremental does not use `-L`, the `zfs receive` will fail with this error message: incremental send stream requires -L (--large-block), to match previous receive. 2. "no-L to -L" incrementals are handled correctly, preserving the smaller (128KB) block size of any already-received files that used large blocks on the sending system but were split by `zfs send` without the `-L` flag. 3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`. This feature indicates that we can correctly handle "no-L to -L" incrementals. This flag is currently not set on any send streams. In the future, we intend for incremental send streams of snapshots that have large blocks to use `-L` by default, and these streams will also have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams from the default use of `zfs send` won't encounter the bug mentioned above, because they can't be received by software with the bug. Implementation notes: To facilitate accessing the ZPL's generation number, `zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and restructured to fill in a struct with ZPL-specific info including owner and generation. In the "no-L to -L" case, if this is a compressed send stream (from `zfs send -cL`), large WRITE records that are being written to small (128KB) blocksize files need to be decompressed so that they can be written split up into multiple blocks. The zio pipeline will recompress each smaller block individually. A new test case, `send-L_toggle`, is added, which tests the "no-L to -L" case and verifies that we get an error for the "-L to no-L" case. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #6224 Closes #10383
* ZTS: Fix add-o_ashift.kshIgor K2020-06-091-1/+1
| | | | | | | | | Use option '-o' after action for compatibility Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #10426
* Trim L2ARCGeorge Amanakis2020-06-094-4/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam D. Moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #9713 Closes #9789 Closes #10224
* mkfile: include missing headersalaviss2020-06-051-0/+2
| | | | | | | | Without these headers, compilation fails on musl libc with offset_t being undeclared and MIN being implictly declared. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Hiếu Lê <[email protected]> Closes #10406
* ZTS: Retry export/destroy when busy in zpool_import_012Ryan Moeller2020-05-271-2/+2
| | | | | | | | | | | | | It can take a moment for the NFS server to give up the mountpoint after unsharing a filesystem. Use log_must_busy to retry export/destroy a few times after switching off sharenfs. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10380
* ZTS: Fix zfs_mount.kshlib cleanupBrian Behlendorf2020-05-231-1/+1
| | | | | | | | | | Update cleanup_filesystem to use destroy_dataset when performing cleanup. This ensures the destroy is retried if the pool is busy preventing occasional failures. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10358
* mount: use the mount syscall directlyfelixdoerre2020-05-201-1/+1
| | | | | | | | | | | | | | | | | | | | Allow zfs datasets to be mounted on Linux without relying on the invocation of an external processes. This is the same behavior which is implemented for FreeBSD. Use of the libmount library was originally considered because it provides functionality to properly lock and update the /etc/mtab file. However, these days /etc/mtab is typically a symlink to /proc/self/mounts so there's nothing to updated. Therefore, we call mount(2) directly and avoid any additional dependencies. If required the legacy behavior can be enabled by setting the ZFS_MOUNT_HELPER environment variable. This may be needed in environments where SELinux in enabled and the zfs binary does not have mount permission. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Felix Dörre <[email protected]> #10294
* Small program that converts a dataset id and an object id to a pathPaul Dagnelie2020-05-206-1/+164
| | | | | | | | | Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10204
* ZTS: zpool_split_indirect deletes zfstest log fileJohn Wren Kennedy2020-05-141-1/+2
| | | | | | | | | | | | | The cleanup routine for this test attempts to remove some temporary files with `rm -f $VDEV_*`, but VDEV_ is undefined. As a result, all files in the current working directory (/var/tmp/test_results/current) get removed instead. This includes the complete log file of all tests. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: George Amanakis <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: John Kennedy <[email protected]> Closes #10324
* Resilver restarts unnecessarily when it encounters errorsJohn Poduska2020-05-133-1/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When a resilver finishes, vdev_dtl_reassess is called to hopefully excise DTL_MISSING (amongst other things). If there are errors during the resilver, they are tracked in DTL_SCRUB, as spelled out in the block comment in vdev.c. DTL_SCRUB is in-core only, so it can only be used if the pool was online for the whole resilver. This state is tracked with the spa_scrub_started flag, which only gets set when the scan is initialized. Unfortunately, this flag gets cleared right before vdev_dtl_reassess gets called, so if there are any errors during the scan, DTL_MISSING will never get excised and the resilver will just continually restart. This fix simply moves clearing that flag until after the call to vdev_dtl_reasses. In addition, if a pool is imported and already has scn_errors > 0, this change will restart the resilver immediately instead of doing the rest of the scan and then restarting it from the beginning. On the other hand, if scn_errors == 0 at import, then no errors have been encountered so far, so the spa_scrub_started flag can be safely set. A test has been added to verify that resilver does not restart when relevant DTL's are available. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Signed-off-by: John Poduska <[email protected]> Closes #10291
* Fixed LDADD library links in Makefiles for cross compilation buildsPetros Koutoupis2020-05-093-1/+6
| | | | | | | | | | | When building on native dev system, there are no issues but when cross-compiling for target system, some linker errors are observed. The only way to avoid these errors is by adjusting the Makefile.am of those various components to add the library dependencies. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Petros Koutoupis <[email protected]> Closes #10304
* ZTS: refreserv_005_pos.kshBrian Behlendorf2020-05-081-3/+3
| | | | | | | | | | When recursively destroying the dataset it's possible for the dataset volume to be open by an unrelated process, like blkid. Use the destroy_dataset() which will retry when this occurs. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10305
* Improvements on persistent L2ARCGeorge Amanakis2020-05-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Functional changes: We implement refcounts of log blocks and their aligned size on the cache device along with two corresponding arcstats. The refcounts are reflected in the header of the device and provide valuable information as to whether log blocks are accounted for correctly. These are dynamically adjusted as log blocks are committed/evicted. zdb also uses this information in the device header and compares it to the corresponding values as reported by dump_l2arc_log_blocks() which emulates l2arc_rebuild(). If the refcounts saved in the device header report higher values, zdb exits with an error. For this feature to work correctly there should be no active writes on the device. This is also employed in the tests of persistent L2ARC. We extend the structure of the cache device header by adding the two new variables mirroring the refcounts after the existing variables to preserve backward compatibility in terms of persistent L2ARC. 1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which reflect the total aligned size of log blocks on the device. This is also reflected in the header of the cache device as "dh_lb_asize". 2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count" which reflect the total number of L2ARC log blocks present on cache devices. It is also reflected in the header of the cache device as "dh_lb_count". In l2arc_rebuild_vdev() if the amount of committed log entries in a log block is 0 and the device header is valid we update the device header. This will facilitate trimming of the whole device in this case when TRIM for L2ARC is implemented. Improve loop protection in l2arc_rebuild() by using the starting offset of the payload of each log block instead of the starting offset of the log block. If the zio in l2arc_write_buffers() fails, restore the lbps array in the header of the device to its previous state in l2arc_write_done(). If l2arc_rebuild() ends the rebuild process without restoring any L2ARC log blocks in ARC and without any other error, this means that the lbps array in the header is pointing to non-existent or invalid log blocks. Reset the device header in this case. In l2arc_rebuild() change the zfs_dbgmsg messages to spa_history_log_internal() making them user visible with zpool history command. Non-functional changes: Make the first test in persistent L2ARC use `zdb -lll` to increase coverage in `zdb.c`. Rename psize with asize when referring to log blocks, since L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also rename dh_log_blk_entries to dh_log_entries to make it clear that it is a mirror of l2ad_log_entries. Added comments for both changes. Fix inaccurate comments for example in l2arc_log_blk_restore(). Add asserts at the end in l2arc_evict() and l2arc_write_buffers(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10228
* Add support for boot environment data to be stored in the labelPaul Dagnelie2020-05-071-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modern bootloaders leverage data stored in the root filesystem to enable some of their powerful features. GRUB specifically has a grubenv file which can store large amounts of configuration data that can be read and written at boot time and during normal operation. This allows sysadmins to configure useful features like automated failover after failed boot attempts. Unfortunately, due to the Copy-on-Write nature of ZFS, the standard behavior of these tools cannot handle writing to ZFS files safely at boot time. We need an alternative way to store data that allows the bootloader to make changes to the data. This work is very similar to work that was done on Illumos to enable similar functionality in the FreeBSD bootloader. This patch is different in that the data being stored is a raw grubenv file; this file can store arbitrary variables and values, and the scripting provided by grub is powerful enough that special structures are not required to implement advanced behavior. We repurpose the second padding area in each label to store the grubenv file, protected by an embedded checksum. We add two ioctls to get and set this data, and libzfs_core and libzfs functions to access them more easily. There are no direct command line interfaces to these functions; these will be added directly to the bootloader utilities. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10009
* Enable splitting mirrors with indirect vdevsGeorge Amanakis2020-05-062-1/+70
| | | | | | | | | | | When a top-level vdev is removed from a pool it is converted to an indirect vdev. Until now splitting such mirrored pools was not possible with zpool split. This patch enables handling of indirect vdevs and splitting of those pools with zpool split. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10283
* ZTS: Count CKSUM for all vdevs in verify_poolRyan Moeller2020-04-301-1/+6
| | | | | | | | | | | The verify_pool function should detect checksum errors on any vdev, but it was only checking at the root of the pool. Accumulate the errors for all vdevs to obtain the correct count. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10271
* Add more sanity testing for zdb input argsSara Hartse2020-04-283-10/+125
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: sara hartse <[email protected]> Closes #10243
* zfs_create: round up volume size to multiple of bsalex2020-04-242-1/+19
| | | | | | | | | | | Round up the volume size requested in `zfs create -V size` to the next higher multiple of the volblocksize. Updates the man page and adds a test to verify the new behavior. Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: puffi <[email protected]> Signed-off-by: Alex John <[email protected]> Closes #8541 Closes #10196
* Remove deduplicated send/receive codeMatthew Ahrens2020-04-2310-99/+129
| | | | | | | | | | | | | | | | | | | | | | | | | Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Issue #7887 Issue #10117 Issue #10156 Closes #10212
* Persistent L2ARC minor fixesGeorge Amanakis2020-04-171-2/+2
| | | | | | | | | | | Minor fixes on persistent L2ARC improving code readability and fixing a typo in zdb.c when byte-swapping a log block. It also improves the pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log blocks on the cache device. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam D. Moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10210
* Update FreeBSD tunablesRyan Moeller2020-04-151-1/+1
| | | | | | | | Remove some obsolete legacy compat, rename some misnamed, and add some missing tunables for FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10203
* Add FreeBSD support to OpenZFSMatthew Macy2020-04-143-10/+16
| | | | | | | | | | | | | | | | | | Add the FreeBSD platform code to the OpenZFS repository. As of this commit the source can be compiled and tested on FreeBSD 11 and 12. Subsequent commits are now required to compile on FreeBSD and Linux. Additionally, they must pass the ZFS Test Suite on FreeBSD which is being run by the CI. As of this commit 1230 tests pass on FreeBSD and there are no unexpected failures. Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #898 Closes #8987
* ZTS: Fix and change testcase cache_010_negalex2020-04-133-28/+28
| | | | | | | | | | | | | | | | Commit 379ca9c removed the requirement on aux devices to be block devices only but the test case cache_010_neg was not updated, making it fail consistently. This change changes the test to check that cache devices _can_ be anything that presents a block interface. The testcase is renamed to cache_010_pos and the exceptions for known failure removed from the test runner. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: Richard Elling <[email protected]> Signed-off-by: Alex John <[email protected]> Closes #10172
* Add `zstream redup` command to convert deduplicated send streamsMatthew Ahrens2020-04-103-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deduplicated send and receive is deprecated. To ease migration to the new dedup-send-less world, the commit adds a `zstream redup` utility to convert deduplicated send streams to normal streams, so that they can continue to be received indefinitely. The new `zstream` command also replaces the functionality of `zstreamdump`, by way of the `zstream dump` subcommand. The `zstreamdump` command is replaced by a shell script which invokes `zstream dump`. The way that `zstream redup` works under the hood is that as we read the send stream, we build up a hash table which maps from `<GUID, object, offset> -> <file_offset>`. Whenever we see a WRITE record, we add a new entry to the hash table, which indicates where in the stream file to find the WRITE record for this block. (The key is `drr_toguid, drr_object, drr_offset`.) For entries other than WRITE_BYREF, we pass them through unchanged (except for the running checksum, which is recalculated). For WRITE_BYREF records, we change them to WRITE records. We find the referenced WRITE record by looking in the hash table (for the record with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading the record header and payload from the specified offset in the stream file. This is why the stream can not be a pipe. The found WRITE record replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`, and `drr_offset` fields changed to be the same as the WRITE_BYREF's (i.e. we are writing the same logical block, but with the data supplied by the previous WRITE record). This algorithm requires memory proportional to the number of WRITE records (same as `zfs send -D`), but the size per WRITE record is relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to "redup". Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10124 Closes #10156
* Persistent L2ARCGeorge Amanakis2020-04-1014-0/+965
| | | | | | | | | | | | | | | | | | | | | This commit makes the L2ARC persistent across reboots. We implement a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot. This significantly eases the impact a reboot has on read performance on systems with large caches. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Saso Kiselkov <[email protected]> Co-authored-by: Jorgen Lundman <[email protected]> Co-authored-by: George Amanakis <[email protected]> Ported-by: Yuxuan Shui <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #925 Closes #1823 Closes #2672 Closes #3744 Closes #9582
* ZTS: Fix non-portable date formatRyan Moeller2020-04-062-18/+17
| | | | | | | | | | | | | | The delegate tests use `date(1)` to generate snapshot names, using the format '%F-%T-%N' to get nanosecond resolution (since multiple snapshots may be taken in the same second). '%N' is not portable, and causes tests to fail on FreeBSD. Since the only purpose these timestamps serve is to create a unique name, simply use $RANDOM instead. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10170
* Add 'zfs wait' commandPaul Dagnelie2020-04-017-0/+201
| | | | | | | | | | | | | | | | | | | | | | Add a mechanism to wait for delete queue to drain. When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones. This change duplicates much of the zpool wait related logic into a zfs wait command, which can be used to wait until the internal deleteq has been drained. Additional wait activities may be added in the future. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9707
* Reset l2ad_hand and l2ad_first in l2arc_evictGeorge Amanakis2020-03-314-1/+128
| | | | | | | | | | | | | | | | | | | | Increasing l2arc_write_size or l2arc_write_boost can result in l2arc_write_buffers() not having enough space to perform its writes and panic zio_write_phys(). Instead of resetting l2ad_hand to l2ad_start at the end of l2arc_write_buffers() and not taking into account a possible user-mediated increase of l2arc_write_max, we do this in l2arc_evict(), right after l2arc_write_size() has run. If there is not enough space to evict (ie we will exceed l2ad_end) we evict to the end of the device, reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making sure in l2arc_write_size() that l2ad_start + size does not exceed l2ad_end. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10154
* ZTS: Skip udev actions in zvol_misc when not LinuxRyan Moeller2020-03-311-3/+3
| | | | | | | | | | | udev is only used on Linux. Skip udev_wait and udev_cleanup when not on Linux. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10165
* ZTS: Wait for free space between quota testsRyan Moeller2020-03-265-7/+15
| | | | | | | | And in removal tests, sync the specific pool we are waiting on. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10146
* Avoid core dump on invalid redaction bookmarkRyan Moeller2020-03-182-3/+3
| | | | | | | | | | | | | | | | libzfs aborts and dumps core on EINVAL from the kernel when trying to do a redacted send with a bookmark that is not a redaction bookmark. Move redacted bookmark validation into libzfs. Check if the bookmark given for redactions is actually a redaction bookmark. Print an error message and exit gracefully if it is not. Don't abort on EINVAL in zfs_send_one. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10138
* Fix cstyle warningsBrian Behlendorf2020-03-171-1/+2
| | | | | | | Fix minor cstyle warnings accidentally introduced by 7145123b. Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10143
* Separate warning for incomplete and corrupt streamsPaul Dagnelie2020-03-171-1/+1
| | | | | | | | | | This change adds a separate return code to zfs_ioc_recv that is used for incomplete streams, in addition to the existing return code for streams that contain corruption. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10122
* Add option for forcible unmounting dataset while receiving snapshot.Mariusz Zaborski2020-03-172-0/+86
| | | | | | | | | | | | | | | | | | | | | | | | Currently when the dataset is in use we can't receive snapshots. zfs send test/1@asd | zfs recv -FM test/2 cannot unmount '/test/2': Device busy This commits add option 'M' which attempts to forcibly unmount the dataset. Thanks to this we can enforce receiving snapshots in a single step. Note that this functionality is not supported on Linux because the VFS will prevent active mounted filesystems from being unmounted, even with the force option. This is the intended VFS behavior. Test cases were added to verify the expected behavior based on the platform. Discussed-with: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> External-issue: https://reviews.freebsd.org/D22306 Closes #9904
* ZTS: Use default_cleanup_noexit where neededRyan Moeller2020-03-173-3/+7
| | | | | | | | | And add log_pass appropriately. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10136
* ZTS: Test boundary conditions in alloc_class_012Ryan Moeller2020-03-121-39/+55
| | | | | | | | | | | | | | | Issue #9142 describes an error in the checks for device removal that can prevent removal of special allocation class vdevs in some situations. Enhance alloc_class/alloc_class_012_pos to check situations where this bug occurs. Update zts-report with knowledge of issue #9142. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10116 Issue #9142