aboutsummaryrefslogtreecommitdiffstats
path: root/cmd
Commit message (Collapse)AuthorAgeFilesLines
* Add support to decode a resume tokentony-zfs2020-07-234-1/+86
| | | | | | | | | | | | Adding a new subcommand to zstream called token. This now allows users to decode a resume token to retrieve the toname field. This can be useful for tools that need this information. The syntax works as follows zstream token <resume_token>. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Signed-off-by: Tony Perkins <[email protected]> Closes #10558
* FreeBSD: Add legacy arc_min and arc_maxRyan Moeller2020-07-191-1/+1
| | | | | | | | | These tunables were renamed from vfs.zfs.arc_min and vfs.zfs.arc_max to vfs.zfs.arc.min and vfs.zfs.arc.max. Add legacy compat tunables for the old names. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10579
* Extend zdb to print inconsistencies in livelists and metaslabsMatthew Ahrens2020-07-142-57/+599
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Livelists and spacemaps are data structures that are logs of allocations and frees. Livelists entries are block pointers (blkptr_t). Spacemaps entries are ranges of numbers, most often used as to track allocated/freed regions of metaslabs/vdevs. These data structures can become self-inconsistent, for example if a block or range can be "double allocated" (two allocation records without an intervening free) or "double freed" (two free records without an intervening allocation). ZDB (as well as zfs running in the kernel) can detect these inconsistencies when loading livelists and metaslab. However, it generally halts processing when the error is detected. When analyzing an on-disk problem, we often want to know the entire set of inconsistencies, which is not possible with the current behavior. This commit adds a new flag, `zdb -y`, which analyzes the livelist and metaslab data structures and displays all of their inconsistencies. Note that this is different from the leak detection performed by `zdb -b`, which checks for inconsistencies between the spacemaps and the tree of block pointers, but assumes the spacemaps are self-consistent. The specific checks added are: Verify livelists by iterating through each sublivelists and: - report leftover FREEs - report double ALLOCs and double FREEs - record leftover ALLOCs together with their TXG [see Cross Check] Verify spacemaps by iterating over each metaslab and: - iterate over spacemap and then the metaslab's entries in the spacemap log, then report any double FREEs and double ALLOCs Verify that livelists are consistenet with spacemaps. The space referenced by livelists (after using the FREE's to cancel out corresponding ALLOCs) should be allocated, according to the spacemaps. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Sara Hartse <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-66031 Closes #10515
* Centralize variable substitutionArvind Sankar2020-07-149-42/+22
| | | | | | | | | | | | A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10559
* Remove dependency on sharetab file and refactor sharing logicGeorge Wilson2020-07-132-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | == Motivation and Context The current implementation of 'sharenfs' and 'sharesmb' relies on the use of the sharetab file. The use of this file is os-specific and not required by linux or freebsd. Currently the code must maintain updates to this file which adds complexity and presents a significant performance impact when sharing many datasets. In addition, concurrently running 'zfs sharenfs' command results in missing entries in the sharetab file leading to unexpected failures. == Description This change removes the sharetab logic from the linux and freebsd implementation of 'sharenfs' and 'sharesmb'. It still preserves an os-specific library which contains the logic required for sharing NFS or SMB. The following entry points exist in the vastly simplified libshare library: - sa_enable_share -- shares a dataset but may not commit the change - sa_disable_share -- unshares a dataset but may not commit the change - sa_is_shared -- determine if a dataset is shared - sa_commit_share -- notify NFS/SMB subsystem to commit the shares - sa_validate_shareopts -- determine if sharing options are valid The sa_commit_share entry point is provided as a performance enhancement and is not required. The sa_enable_share/sa_disable_share may commit the share as part of the implementation. Libshare provides a framework for both NFS and SMB but some operating systems may not fully support these protocols or all features of the protocol. NFS Operation: For linux, libshare updates /etc/exports.d/zfs.exports to add and remove shares and then commits the changes by invoking 'exportfs -r'. This file, is automatically read by the kernel NFS implementation which makes for better integration with the NFS systemd service. For FreeBSD, libshare updates /etc/zfs/exports to add and remove shares and then commits the changes by sending a SIGHUP to mountd. SMB Operation: For linux, libshare adds and removes files in /var/lib/samba/usershares by calling the 'net' command directly. There is no need to commit the changes. FreeBSD does not support SMB. == Performance Results To test sharing performance we created a pool with an increasing number of datasets and invoked various zfs actions that would enable and disable sharing. The performance testing was limited to NFS sharing. The following tests were performed on an 8 vCPU system with 128GB and a pool comprised of 4 50GB SSDs: Scale testing: - Share all filesystems in parallel -- zfs sharenfs=on <dataset> & - Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> & Functional testing: - share each filesystem serially -- zfs share -a - unshare each filesystem serially -- zfs unshare -a - reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool> For 'zfs sharenfs=on' scale testing we saw an average reduction in time of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time of 83.36%. Functional testing also shows a huge improvement: - zfs share -- 97.97% reduction in time - zfs unshare -- 96.47% reduction in time - zfs inhert -r sharenfs -- 99.01% reduction in time Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Bryant G. Ly <[email protected]> Signed-off-by: George Wilson <[email protected]> External-Issue: DLPX-68690 Closes #1603 Closes #7692 Closes #7943 Closes #10300
* Unconditionally enable debugging for libzpoolSerapheim Dimitropoulos2020-07-104-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | We already enable -DDEBUG unconditionally (meaning regardless of this is a debug build or a performance build) for zdb and ztest as they are mostly used for development and debugging. This patch enables -DDEBUG for libzpool extending the debugging checks for zdb, ztest, and a couple of other test utilities. In addition to passing -DDEBUG we also enable -DZFS_DEBUG so all assertion checks work s expected. We do so not only in libzpool but in every utility that links to it, even if the utility doesn't directly use any functionality wrapped in ZFS_DEBUG macro definitions. The reason is that these utilities may still include headers that contain structs that have more fields when ZFS_DEBUG is defined. This can be a problem as enabling that flag for libzpool but not for zdb can lead into random problems (e.g. segmentation faults) as zdb may be have an incorrect view of a struct passed to it by libzpool. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #10549
* Use abs_top_builddir when referencing librariesArvind Sankar2020-07-1011-33/+33
| | | | | | | | | | | | | | | | | | | | | libtool stores absolute paths in the dependency_libs component of the .la files. If the Makefile for a dependent library refers to the libraries by relative path, some libraries end up duplicated on the link command line. As an example, libzfs specifies libzfs_core, libnvpair and libuutil as dependencies to be linked in. The .la file for libzfs_core also specifies libnvpair, but using an absolute path, with the result that libnvpair is present twice in the linker command line for producing libzfs. While the only thing this causes is to slightly slow down the linking, we can avoid it by using absolute paths everywhere, including for convenience libraries just for consistency. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538
* Add config.rpath for AM_GNU_GETTEXTArvind Sankar2020-07-103-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | Commit e8864b1b28c2 ("config: libintl/libiconv for gettext() detection") added an empty config.rpath with a comment that the real one doesn't work with libtool. However, an empty config.rpath doesn't really work: eg. on FreeBSD, where libintl is in /usr/local/lib, configure thinks that gettext doesn't exist and NLS should be disabled, which currently isn't supported in the source, and hence requires manual workaround to directly link -lintl without relying on configure. config.rpath is essential to let it be detected either in --prefix or using --with-libintl-prefix. I also don't see the mentioned issue with libtool flags applied to compilation, it seems to work fine to pass LTLIBINTL to libtool. It's unnecessary to include LTLIBICONV as the configure test will automatically append that to LTLIBINTL if it is necessary to link with libiconv. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538
* Clean up lib dependenciesArvind Sankar2020-07-1010-26/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | libzutil is currently statically linked into libzfs, libzfs_core and libzpool. Avoid the unnecessary duplication by removing it from libzfs and libzpool, and adding libzfs_core to libzpool. Remove a few unnecessary dependencies: - libuutil from libzfs_core - libtirpc from libspl - keep only libcrypto in libzfs, as we don't use any functions from libssl - librt is only used for clock_gettime, however on modern systems that's in libc rather than librt. Add a configure check to see if we actually need librt - libdl from raidz_test Add a few missing dependencies: - zlib to libefi and libzfs - libuuid to zpool, and libuuid and libudev to zed - libnvpair uses assertions, so add assert.c to provide aok and libspl_assertf Sort the LDADD for programs so that libraries that satisfy dependencies come at the end rather than the beginning of the linker command line. Revamp the configure tests for libaries to use FIND_SYSTEM_LIBRARY instead. This can take advantage of pkg-config, and it also avoids polluting LIBS. List all the required dependencies in the pkgconfig files, and move the one for libzfs_core into the latter's directory. Install pkgconfig files in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on FreeBSD, instead of /usr/share/pkgconfig, as the more correct location for library .pc files. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538
* Add device rebuild featureBrian Behlendorf2020-07-035-62/+343
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: John Poduska <[email protected]> Co-authored-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10349
* Style fixesBrian Behlendorf2020-06-271-0/+1
| | | | | | | * Fix cstyle issue in shrinker.h which exceeded 80 columns. * Silence shellcheck warning in zpool.d/smart script. Signed-off-by: Brian Behlendorf <[email protected]>
* Make zstreamdump output the size of the payload for BEGIN recordsAllan Jude2020-06-271-0/+2
| | | | | | | | | This is helpful for determining the size of the nvlist of snapshots and properties Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #10505
* arcstat: add 'avail', fix 'free'Matthew Ahrens2020-06-261-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The meaning of the `free` field is currently `zfs_arc_sys_free`, which is the target amount of memory to leave free for the system, and is constant after booting. This commit changes the meaning of `free` to arc_free_memory(), the amount of memory that the ARC considers to be free. It also adds a new arcstat field `avail`, which tracks `arc_available_memory()`. Since `avail` can be negative, it also updates the arcstat script to pretty-print negative values. example output: $ arcstat -f time,miss,arcsz,c,grow,need,free,avail 1 time miss arcsz c grow need free avail 15:03:02 39K 114G 114G 0 0 2.4G 407M 15:03:03 42K 114G 114G 0 0 2.1G 120M 15:03:04 40K 114G 114G 0 0 1.8G -177M 15:03:05 24K 113G 112G 0 0 1.7G -269M 15:03:06 29K 111G 110G 0 0 1.6G -385M 15:03:07 27K 110G 108G 0 0 1.4G -535M 15:03:08 13K 108G 108G 0 0 2.2G 239M 15:03:09 33K 107G 107G 0 0 1.3G -639M 15:03:10 16K 105G 102G 0 0 2.6G 704M 15:03:11 7.2K 102G 102G 0 0 5.1G 3.1G 15:03:12 42K 103G 102G 0 0 4.8G 2.8G Reviewed-by: George Melikov <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10494
* Add block histogram to zdbRobert Novak2020-06-261-0/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block histogram tracks the changes to psize, lsize and asize both in the count of the number of blocks (by blocksize) and the total length of all of the blocks for that blocksize. It also keeps a running total of the cumulative size of all of the blocks up to each size to help determine the size of caching SSDs to be added to zfs hardware deployments. The block history counts and lengths are summarized in bins which are powers of two. Even rows with counts of zero are printed. This change is accessed by specifying one of two options: zdb -bbb pool zdb -Pbbb pool The first version prints the table in fixed size columns. The second prints in "parseable" output that can be placed into a CSV file. Fixed Column, nicenum output sample: block psize lsize asize size Count Length Cum. Count Length Cum. Count Length Cum. 512: 3.50K 1.75M 1.75M 3.43K 1.71M 1.71M 3.41K 1.71M 1.71M 1K: 3.65K 3.67M 5.43M 3.43K 3.44M 5.15M 3.50K 3.51M 5.22M 2K: 3.45K 6.92M 12.3M 3.41K 6.83M 12.0M 3.59K 7.26M 12.5M 4K: 3.44K 13.8M 26.1M 3.43K 13.7M 25.7M 3.49K 14.1M 26.6M 8K: 3.42K 27.3M 53.5M 3.41K 27.3M 53.0M 3.44K 27.6M 54.2M 16K: 3.43K 54.9M 108M 3.50K 56.1M 109M 3.42K 54.7M 109M 32K: 3.44K 110M 219M 3.41K 109M 218M 3.43K 110M 219M 64K: 3.41K 218M 437M 3.41K 218M 437M 3.44K 221M 439M 128K: 3.41K 437M 874M 3.70K 474M 911M 3.41K 437M 876M 256K: 3.41K 874M 1.71G 3.41K 874M 1.74G 3.41K 874M 1.71G 512K: 3.41K 1.71G 3.41G 3.41K 1.71G 3.45G 3.41K 1.71G 3.42G 1M: 3.41K 3.41G 6.82G 3.41K 3.41G 6.86G 3.41K 3.41G 6.83G 2M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 4M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 8M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 16M: 0 0 6.82G 0 0 6.86G 0 0 6.83G Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Robert E. Novak <[email protected]> Closes: #9158 Closes #10315
* Fixes for make distArvind Sankar2020-06-262-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10501
* Fix check for sed --in-placeArvind Sankar2020-06-242-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The test added in commit 4313a5b4c51e ("Detect if sed supports --in-place") doesn't work at least on my system (autoconfig-2.69). The issue is that SED has already been found and cached before this function is evaluated, with the result that the test is completely skipped. ... checking for a sed that does not truncate output... /usr/bin/sed ... checking for sed --in-place... (cached) /usr/bin/sed The first test is executed by libtool.m4. This looks to have been around in libtool for at least 15 years or so, not sure why this was not encountered at the time of the original commit. Fix this by caching the value of the ac_inplace flag rather than the path to SED. Also use $SED and add AC_REQUIRE to ensure that we use the sed that was located by the standard configure test. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10493
* Add trim_finish notify script for ZEDKevin P. Fleming2020-06-242-1/+39
| | | | | | | | | | | Allow users to configure notifications when TRIM operations are completed on pools. Unlike resilver_finish and scrub_finish, the trim_finish event is generated for each vdev in the pool which was trimmed, so the script will generate a notification for each one. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Kevin P. Fleming <[email protected]> Closes #10491
* zed additional featuresJorgen Lundman2020-06-225-12/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds two features to zed, that macOS desires. The first is that when you unload the kernel module, zed would enter into a cpubusy loop calling zfs_events_next() repeatedly. We now look for ENODEV, returned by kernel, so zed can exit gracefully. Second feature is -I (idle) (alas -P persist was taken) is for the deamon to; 1; if started without ZFS kernel module, stick around waiting for it. 2; if kernel module is unloaded, go back to 1. This is due to daemons in macOS is started by launchctl, and is expected to stick around. Currently, the busy loop only exists when errno is ENODEV. This is to ensure that functionality that upstream expects is not changed. It did not care about errors before, and it still does not. (with the exception of ENODEV). However, it is probably better that all errors (ERESTART notwithstanding) exits the loop, and the issues complaining about zed taking all CPU will go away. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #10476
* Remove unnecessary terminology from error-injection in ztestSerapheim Dimitropoulos2020-06-221-13/+15
| | | | | | | | | Rephrase error-injection comment in ztest to be more clear. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Sara Hartse <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #10482
* zfs allow/unallow should work with numeric uid/gidAndriy Gapon2020-06-191-6/+13
| | | | | | | | | | | | | And that should work even (especially) if there is no matching user or group name. The change is originally by Xin Lin <[email protected]>. Original-patch-by: Xin Li <[email protected]> Reviewed-by: Yuri Pankov <[email protected]> Reviewed-by: Andy Stormont <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #9792 Closes #10280
* Add include files for prototypesArvind Sankar2020-06-184-0/+4
| | | | | | | | | | Include the header with prototypes in the file that provides definitions as well, to catch any mismatch between prototype and definition. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10470
* Remove dead codeArvind Sankar2020-06-181-10/+0
| | | | | | | | | Delete unused functions. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10470
* Mark functions as staticArvind Sankar2020-06-189-28/+28
| | | | | | | | | | | Mark functions used only in the same translation unit as static. This only includes functions that do not have a prototype in a header file either. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10470
* Avoid adding new primitives in zpool waitJorgen Lundman2020-06-181-11/+22
| | | | | | | | | | zpool wait brought in sem_init() and family, which is a primitive set not previously used in Open ZFS. It also happens to be deprecated on macOS. Replace with phtread API calls. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #10468
* Remove refences to blacklist/whitelistMatthew Ahrens2020-06-162-3/+3
| | | | | | | | | | | | | | | | | | | | These terms reinforce the incorrect notion that black is bad and white is good. Replace this language with more specific terms which are also more clear and don't rely on metaphor. Specifically: * When vdevs are specified on the command line, they are the "selected" vdevs. * Entries in /dev/ which should not be considered as possible disks are "excluded" devices. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10457
* Remove unnecessary references to slaveryMatthew Ahrens2020-06-102-9/+6
| | | | | | | | | | | | | | | | | | | | | | The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10435
* Fix VPATH builds for user configArvind Sankar2020-06-101-1/+1
| | | | | | | | | | cmd/zpool and lib/libzutil Makefile's use -I., which won't work with a VPATH build. Replace it with -I$(srcdir) instead. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10379 Closes #10421
* File incorrectly zeroed when receiving incremental stream that toggles -LMatthew Ahrens2020-06-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: By increasing the recordsize property above the default of 128KB, a filesystem may have "large" blocks. By default, a send stream of such a filesystem does not contain large WRITE records, instead it decreases objects' block sizes to 128KB and splits the large blocks into 128KB blocks, allowing the large-block filesystem to be received by a system that does not support the `large_blocks` feature. A send stream generated by `zfs send -L` (or `--large-block`) preserves the large block size on the receiving system, by using large WRITE records. When receiving an incremental send stream for a filesystem with large blocks, if the send stream's -L flag was toggled, a bug is encountered in which the file's contents are incorrectly zeroed out. The contents of any blocks that were not modified by this send stream will be lost. "Toggled" means that the previous send used `-L`, but this incremental does not use `-L` (-L to no-L); or that the previous send did not use `-L`, but this incremental does use `-L` (no-L to -L). Changes: This commit addresses the problem with several changes to the semantics of zfs send/receive: 1. "-L to no-L" incrementals are rejected. If the previous send used `-L`, but this incremental does not use `-L`, the `zfs receive` will fail with this error message: incremental send stream requires -L (--large-block), to match previous receive. 2. "no-L to -L" incrementals are handled correctly, preserving the smaller (128KB) block size of any already-received files that used large blocks on the sending system but were split by `zfs send` without the `-L` flag. 3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`. This feature indicates that we can correctly handle "no-L to -L" incrementals. This flag is currently not set on any send streams. In the future, we intend for incremental send streams of snapshots that have large blocks to use `-L` by default, and these streams will also have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams from the default use of `zfs send` won't encounter the bug mentioned above, because they can't be received by software with the bug. Implementation notes: To facilitate accessing the ZPL's generation number, `zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and restructured to fill in a struct with ZPL-specific info including owner and generation. In the "no-L to -L" case, if this is a compressed send stream (from `zfs send -cL`), large WRITE records that are being written to small (128KB) blocksize files need to be decompressed so that they can be written split up into multiple blocks. The zio pipeline will recompress each smaller block individually. A new test case, `send-L_toggle`, is added, which tests the "no-L to -L" case and verifies that we get an error for the "-L to no-L" case. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #6224 Closes #10383
* Trim L2ARCGeorge Amanakis2020-06-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam D. Moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #9713 Closes #9789 Closes #10224
* ztest: Fix spa_open() ENOENT failuresBrian Behlendorf2020-06-061-142/+154
| | | | | | | | | | | | | | The pool may not be imported when the previous pass is terminated. In which case, spa_open() will return ENOENT to indicate the pool is not currently imported. Refactor to code slightly to handle this case by importing the pool and then retrying the spa_open(). The ztest_import() function was moved before ztest_run() and the import logic split in to a small internal helper function. The ztest_freeze() function was also moved but no changes were made. Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10407
* ztest: Fix ztest_run_zdb() failureBrian Behlendorf2020-05-291-2/+2
| | | | | | | | | | | | | | | | | | | It's possible for ztest to be killed while the pool is exported which results in an empty cache file. This is a valid state to test, but the validation check performed by ztest_run_zdb() depends on the pool being in the cache file. If it's not the following error is printed. zdb -bccsv -G -d -Y -U /tmp/zloop-run/zpool.cache ztest zdb: can't open '/tmp/zloop-run': No such file or directory Resolve these failures by removing the dependency on the cache file. Functionally, we only care that the pool can be imported and that the zdb verification passes. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10385
* Revert "Let zfs mount all tolerate in-progress mounts"Brian Behlendorf2020-05-261-18/+1
| | | | | | | | | | | | | | | This reverts commit a9cd8bf which introduced a segfault when running `zfs mount -a` multiple times when there are mountpoints which are not empty. This segfault is now seen frequently by the CI after the mount code was updated to directly call mount(2). The original reason this logic was added is described in #8881. Since then the systemd `zfs-share.target` has been updated to run "After" the `zfs-mount.server` which should avoid this issue. Reviewed-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9560 Closes #10364
* mount: use the mount syscall directlyfelixdoerre2020-05-202-237/+2
| | | | | | | | | | | | | | | | | | | | Allow zfs datasets to be mounted on Linux without relying on the invocation of an external processes. This is the same behavior which is implemented for FreeBSD. Use of the libmount library was originally considered because it provides functionality to properly lock and update the /etc/mtab file. However, these days /etc/mtab is typically a symlink to /proc/self/mounts so there's nothing to updated. Therefore, we call mount(2) directly and avoid any additional dependencies. If required the legacy behavior can be enabled by setting the ZFS_MOUNT_HELPER environment variable. This may be needed in environments where SELinux in enabled and the zfs binary does not have mount permission. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Felix Dörre <[email protected]> #10294
* Small program that converts a dataset id and an object id to a pathPaul Dagnelie2020-05-204-1/+107
| | | | | | | | | Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10204
* Fix gcc 10.1 stringop-truncation errorGeorge Amanakis2020-05-191-3/+3
| | | | | | | | | As we do not expect the destination of these strncpy calls to be NULL terminated, substitute them with memcpy. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10346
* Fix outdated comment headerAJ Jordan2020-05-111-8/+13
| | | | | | | Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: AJ Jordan <[email protected]> Closes #10288
* Fix inconsistent capitalization in `arcstat -v`AJ Jordan2020-05-111-11/+11
| | | | | | | Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: AJ Jordan <[email protected]> Closes #10288
* Fixed LDADD library links in Makefiles for cross compilation buildsPetros Koutoupis2020-05-096-0/+6
| | | | | | | | | | | When building on native dev system, there are no issues but when cross-compiling for target system, some linker errors are observed. The only way to avoid these errors is by adjusting the Makefile.am of those various components to add the library dependencies. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Petros Koutoupis <[email protected]> Closes #10304
* Improvements on persistent L2ARCGeorge Amanakis2020-05-071-37/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Functional changes: We implement refcounts of log blocks and their aligned size on the cache device along with two corresponding arcstats. The refcounts are reflected in the header of the device and provide valuable information as to whether log blocks are accounted for correctly. These are dynamically adjusted as log blocks are committed/evicted. zdb also uses this information in the device header and compares it to the corresponding values as reported by dump_l2arc_log_blocks() which emulates l2arc_rebuild(). If the refcounts saved in the device header report higher values, zdb exits with an error. For this feature to work correctly there should be no active writes on the device. This is also employed in the tests of persistent L2ARC. We extend the structure of the cache device header by adding the two new variables mirroring the refcounts after the existing variables to preserve backward compatibility in terms of persistent L2ARC. 1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which reflect the total aligned size of log blocks on the device. This is also reflected in the header of the cache device as "dh_lb_asize". 2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count" which reflect the total number of L2ARC log blocks present on cache devices. It is also reflected in the header of the cache device as "dh_lb_count". In l2arc_rebuild_vdev() if the amount of committed log entries in a log block is 0 and the device header is valid we update the device header. This will facilitate trimming of the whole device in this case when TRIM for L2ARC is implemented. Improve loop protection in l2arc_rebuild() by using the starting offset of the payload of each log block instead of the starting offset of the log block. If the zio in l2arc_write_buffers() fails, restore the lbps array in the header of the device to its previous state in l2arc_write_done(). If l2arc_rebuild() ends the rebuild process without restoring any L2ARC log blocks in ARC and without any other error, this means that the lbps array in the header is pointing to non-existent or invalid log blocks. Reset the device header in this case. In l2arc_rebuild() change the zfs_dbgmsg messages to spa_history_log_internal() making them user visible with zpool history command. Non-functional changes: Make the first test in persistent L2ARC use `zdb -lll` to increase coverage in `zdb.c`. Rename psize with asize when referring to log blocks, since L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also rename dh_log_blk_entries to dh_log_entries to make it clear that it is a mirror of l2ad_log_entries. Added comments for both changes. Fix inaccurate comments for example in l2arc_log_blk_restore(). Add asserts at the end in l2arc_evict() and l2arc_write_buffers(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10228
* Add support for boot environment data to be stored in the labelPaul Dagnelie2020-05-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modern bootloaders leverage data stored in the root filesystem to enable some of their powerful features. GRUB specifically has a grubenv file which can store large amounts of configuration data that can be read and written at boot time and during normal operation. This allows sysadmins to configure useful features like automated failover after failed boot attempts. Unfortunately, due to the Copy-on-Write nature of ZFS, the standard behavior of these tools cannot handle writing to ZFS files safely at boot time. We need an alternative way to store data that allows the bootloader to make changes to the data. This work is very similar to work that was done on Illumos to enable similar functionality in the FreeBSD bootloader. This patch is different in that the data being stored is a raw grubenv file; this file can store arbitrary variables and values, and the scripting provided by grub is powerful enough that special structures are not required to implement advanced behavior. We repurpose the second padding area in each label to store the grubenv file, protected by an embedded checksum. We add two ioctls to get and set this data, and libzfs_core and libzfs functions to access them more easily. There are no direct command line interfaces to these functions; these will be added directly to the bootloader utilities. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10009
* Fix column width calculation issue with certain terminal widthsPhilip Pokorny2020-05-061-4/+30
| | | | | | | | | | | | If the reported terminal width is 0 or less than 42, the signed variable width was set to a negative number that was then assigned to the unsigned column width becoming a huge number. Add comments and change logic to better explain what's happening. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Philip Pokorny <[email protected]> Closes #10247
* zdb: Fix ignored zfs_arc_max tuningRyan Moeller2020-04-301-1/+2
| | | | | | | | | | | | | | Running zdb -l $disk shows a warning that zfs_arc_max is being ignored. zdb sets zfs_arc_max below zfs_arc_min, which causes the value to be ignored by arc_tuning_update(). Set zfs_arc_min to the bare minimum in zdb, which is below zfs_arc_max. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10269
* zfs_create: round up volume size to multiple of bsalex2020-04-241-0/+25
| | | | | | | | | | | Round up the volume size requested in `zfs create -V size` to the next higher multiple of the volblocksize. Updates the man page and adds a test to verify the new behavior. Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: puffi <[email protected]> Signed-off-by: Alex John <[email protected]> Closes #8541 Closes #10196
* Fix unitialized variable in `zstream redup` commandBrian Behlendorf2020-04-231-1/+1
| | | | | | | | | | | | | Fix uninitialized variable in `zstream redup` command. The compiler may determine the 'stream_offset' variable can be uninitialized because not all rdt_lookup() exit paths set it. This should never happen in practice as documented by the assert, but initialize it regardless to resolve the warning. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10241 Closes #10244
* Remove deduplicated send/receive codeMatthew Ahrens2020-04-231-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Issue #7887 Issue #10117 Issue #10156 Closes #10212
* Don't attempt trimming "hole" vdevsNiklas Haas2020-04-211-1/+2
| | | | | | | | | | | | | On zpools containing hole vdevs (e.g. removed log devices), the `zpool trim` (and presumably `zpool initialize`) commands will attempt calling their respective functions on "hole", which fails, as this is not a real vdev. Avoid this by removing HOLE vdevs in zpool_collect_leaves. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Niklas Haas <[email protected]> Closes #10227
* Persistent L2ARC minor fixesGeorge Amanakis2020-04-171-2/+1
| | | | | | | | | | | Minor fixes on persistent L2ARC improving code readability and fixing a typo in zdb.c when byte-swapping a log block. It also improves the pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log blocks on the cache device. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam D. Moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10210
* Fix SC2086 note in zpool.d/smartRyan Moeller2020-04-141-1/+1
| | | | | | | | | | ./cmd/zpool/zpool.d/smart:78:32: note: Double quote to prevent globbing and word splitting. [SC2086] Reported by latest shellcheck on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10194
* Add FreeBSD support to OpenZFSMatthew Macy2020-04-144-2/+126
| | | | | | | | | | | | | | | | | | Add the FreeBSD platform code to the OpenZFS repository. As of this commit the source can be compiled and tested on FreeBSD 11 and 12. Subsequent commits are now required to compile on FreeBSD and Linux. Additionally, they must pass the ZFS Test Suite on FreeBSD which is being run by the CI. As of this commit 1230 tests pass on FreeBSD and there are no unexpected failures. Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #898 Closes #8987
* Fix allocation errors, detected using ASANJoao Carlos Mendes Luis2020-04-131-2/+3
| | | | | | | | | | | | The test for VDEV_TYPE_INDIRECT is done after a memory allocation, and could return from function without freeing it. Since we don't need that allocation yet, just postpone it. Add a missing free() when buffer is no longer needed. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: João Carlos Mendes Luís <[email protected]> Closes #10193