aboutsummaryrefslogtreecommitdiffstats
path: root/man/man8
Commit message (Collapse)AuthorAgeFilesLines
...
* zpool(8): fix pool-wi[sd]e typoнаб2020-11-161-1/+1
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11202
* zgenhostid: accept hostid arguments equal to zero.Érico Rolim2020-11-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A common usage pattern for zgenhostid, including in the ZFS dracut module, is running it as: zgenhostid $(hostid) However, zgenhostid only accepted hostid arguments greater than 0, which meant that, when the output of hostid(1) was "00000000", zgenhostid would error out, even though 0 is a possible return value for the gethostid(3) function used by hostid(1): - On current musl libc, gethostid(3) is a stub that always returns 0. - On glibc, gethostid(3) will return 0 if /etc/hostid exists but is smaller than 4 bytes. In these cases, it makes more sense for zgenhostid to treat a value of 0 as other parts of the zfs codebase do, meaning that a hostid value couldn't be determined; therefore, it should attempt to generate a random value to write into /etc/hostid. The manpage and usage output have been updated to reflect this. Whitespace has also been fixed in the usage output. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Georgy Yakovlev <[email protected]> Reviewed-by: Andrew J. Hesford <[email protected]> Signed-off-by: Érico Rolim <[email protected]> Closes #11174 Closes #11189
* Distributed Spare (dRAID) FeatureBrian Behlendorf2020-11-133-4/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <[email protected]> Co-authored-by: Mark Maybee <[email protected]> Co-authored-by: Don Brady <[email protected]> Co-authored-by: Matthew Ahrens <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10102
* zpoolprops.8: clarify vdev expansion rulesнаб2020-11-101-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | Remove reference to EFI(?), explain that the new space is beyond the GPT for whole-disk vdevs, and add section noting how it behaves with partition vdevs in terms of how the user is most likely to encounter it ‒ the previous phrasing was confusing and seemed to indicate that "zpool online -e" will be able to claim GPT[whatever, ZFS, free space, whatever] into GPT[whatever, ZFS, whatever] but that's not the case, as it'll only be able to do so after manually resizing the ZFS partition to include the free space beforehand, i.e.: GPT[whatever, ZFS, free space, whatever] GPT[whatever, [ZFS + free space], potentially left-overs, whatever] # zpool online -e GPT[whatever, ZFS, whatever] Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11158
* Properly format NAME subsection of zfs/zpool subcommandsxtouqh2020-10-2253-250/+251
| | | | | | | | | | | | | Use proper names (i.e. zfs-allow and zpool-add) in NAME subsections of zfs/zpool subcommands instead of current "pretty-printed" ones as makewhatis utilities (or some implementations of it, namely the one from mandoc suite used in FreeBSD) look not only at the document title but also in NAME subsection, adding zfs(8)/zpool(8) to search results which is not correct. (Common sense and other utilities splitting subcommands in multiple man pages, e.g. git, do the same.) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: xtouqh <[email protected]> Closes #11086
* Cross-platform acltypeRyan Moeller2020-10-131-3/+14
| | | | | | | | | | | | | | | | | The acltype property is currently hidden on FreeBSD and does not reflect the NFSv4 style ZFS ACLs used on the platform. This makes it difficult to observe that a pool imported from FreeBSD on Linux has a different type of ACL that is being ignored, and vice versa. Add an nfsv4 acltype and expose the property on FreeBSD. Make the default acltype nfsv4 on FreeBSD. Setting acltype to an unhanded style is treated the same as setting it to off. The ACLs will not be removed, but they will be ignored. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10520
* Add zpool_influxdb commandRichard Elling2020-10-092-1/+95
| | | | | | | | | | | | | | | A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB time-series database. Examples are given on how to integrate with the telegraf statistics aggregator, a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Richard Elling <[email protected]> Closes #10786
* Linux: Initialize zp in zfs_setattr_dirRyan Moeller2020-10-091-1/+1
| | | | | | | | | | | | | | | | The value of zp is used without having been initialized under some conditions. Initialize the pointer to NULL. Add a regression test case using chown in acl/posix. However, this is not enough because the setup sets xattr=sa, which means zfs_setattr_dir will not be called. Create a second group of acl tests in acl/posix-sa duplicating the acl/posix tests with symlinks, and remove xattr=sa from the original acl/posix tests. This provides more coverage for the default xattr=on code. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10043 Closes #11025
* Replace ZFS on Linux references with OpenZFSBrian Behlendorf2020-10-082-3/+2
| | | | | | | | | | | | | This change updates the documentation to refer to the project as OpenZFS instead ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11007
* Note that keys must be loaded for 'zpool remove'grodik2020-09-281-0/+2
| | | | | | | | | | | | | | | The error returned by `zpool remove` when the encryption keys aren't loaded isn't very helpful. Furthermore, the man pages make no mention that the keys need to be loaded. This change doesn't resolve the error message but it does update the man page to mention this requirement. Authored-by: grodik <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10939 Closes #10948
* zfs userspace: use zfs_path_to_zhandle so argument can be a pathAllan Jude2020-09-251-7/+8
| | | | | | | | | Change zfs userspace subcommand to use zfs_path_to_zhandle() so that the provided dataset can be a path (/usr) or a dataset (rpool/usr). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #8915
* Rename acltype=posixacl to acltype=posixRyan Moeller2020-09-161-3/+6
| | | | | | | Prefer acltype=off|posix, retaining the old names as aliases. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10918
* cmd/zgenhostid: replace with simple c implementationGeorgy Yakovlev2020-09-161-20/+51
| | | | | | | | | | | | | | | | | | It was discovered that dracut scripts and zgenhostid always generate little-endian /etc/hostid. This commit provides simple endianess-aware binary and updates the scripts to use it. New features include: -f flag to force overwrite. -o flag to write to different file (for dracut) accepting both 0x01234567 and 01234567 values as input Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Georgy Yakovlev <[email protected]> Closes #10887 Closes #10925
* zpoolprops.8: fix raidz par[i]ty typoнаб2020-09-151-1/+1
| | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #10923
* man: Cross-reference zfs-load-key(8) for ENCRYPTION mentionRyan Moeller2020-09-041-3/+4
| | | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Submitted-by: Harry Schmalzbauer Signed-off-by: Ryan Moeller <[email protected]> Closes #10866
* man: Add `zfs rename -r` to zfs-rename(8) SYNOPSISRyan Moeller2020-09-041-3/+6
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10866
* Add 'zfs rename -u' to rename without remountingRyan Moeller2020-09-011-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10839
* Add the Xr's to the SEE ALSO as wellJohn-Mark Gurney2020-08-262-3/+70
| | | | | | | | | | | | | There are a ton of zfs-* and zpool-* man pages. This adds them to the SEE ALSO section so that people can more quickly look through what all the options are, now that the pages have been split. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: John-Mark Gurney <[email protected]> Closes #10589
* man: Canonicalize .TH usageRyan Moeller2020-08-248-9/+9
| | | | | | | | | | | | | * Use all caps for document title. * Remove section name as it can be inferred from the section number. * Name "OpenZFS" as the document source. * Bump modification date. While here, fixed trailing whitespace reported by igor. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10792
* Improve documentation of zpool import -d/-c vs -sChris McDonough2020-08-231-3/+4
| | | | | | | | | | | Specify that, by default, zpool import uses the libblkid cache on Linux and geom on FreeBSD, and only scans when -d/-s is provided. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Chris McDonough <[email protected]> Closes #7656 Closes #10771
* Import vdev ashift optimization from FreeBSDRyan Moeller2020-08-211-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1. Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2. The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Matthew Macy <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10619
* Remove hard coded "Linux" OS from manpagesRyan Moeller2020-08-2162-61/+71
| | | | | | | | | | | | | | | The recommended practice for `.Os` on FreeBSD is to not specify any arguments. The correct OS name is used automatically. Oddly enough, on the Linux distro I tested this on (CentOS 7), the man pager defaulted to displaying "BSD" as the OS rather than "Linux". To accommodate this, tack " Linux" back on in an install hook on Linux. This is much simpler than removing it for FreeBSD when vendored in the base system. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10760
* Add zstd support to zfsMichael Niewöhner2020-08-201-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Co-authored-by: Sebastian Gottschall <[email protected]> Co-authored-by: Kjeld Schouten-Lebbing <[email protected]> Co-authored-by: Michael Niewöhner <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Kjeld Schouten-Lebbing <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #6247 Closes #9024 Closes #10277 Closes #10278
* Remove GRUB restrictionsRichard Laager2020-08-171-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GRUB restrictions are based around the pool's bootfs property. Given the current situation where GRUB is not staying current with OpenZFS pool features, having either a non-ZFS /boot or a separate pool with limited features are pretty much the only long-term answers for GRUB support. Only the second case matters in this context. For the restrictions to be useful, the bootfs property would have to be set on the boot pool, because that is where we need the restrictions, as that is the pool that GRUB reads from. The documentation for bootfs describes it as pointing to the root pool. That's also how it's used in the initramfs. ZFS does not allow setting bootfs to point to a dataset in another pool. (If it did, it'd be difficult-to-impossible to enforce these restrictions cross-pool). Accordingly, bootfs is pretty much useless for GRUB scenarios moving forward. Even for users who have only one pool, the existing restrictions for GRUB are incomplete. They don't prevent you from enabling the unsupported checksums, for example. For that reason, I have ripped out all the GRUB restrictions. A little longer-term, I think extending the proposed features=portable system to define a features=grub is a much more useful approach. The user could set that on the boot pool at creation, and things would Just Work. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8627
* Add support to decode a resume tokentony-zfs2020-07-231-0/+9
| | | | | | | | | | | | Adding a new subcommand to zstream called token. This now allows users to decode a resume token to retrieve the toname field. This can be useful for tools that need this information. The syntax works as follows zstream token <resume_token>. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Signed-off-by: Tony Perkins <[email protected]> Closes #10558
* Extend zdb to print inconsistencies in livelists and metaslabsMatthew Ahrens2020-07-141-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Livelists and spacemaps are data structures that are logs of allocations and frees. Livelists entries are block pointers (blkptr_t). Spacemaps entries are ranges of numbers, most often used as to track allocated/freed regions of metaslabs/vdevs. These data structures can become self-inconsistent, for example if a block or range can be "double allocated" (two allocation records without an intervening free) or "double freed" (two free records without an intervening allocation). ZDB (as well as zfs running in the kernel) can detect these inconsistencies when loading livelists and metaslab. However, it generally halts processing when the error is detected. When analyzing an on-disk problem, we often want to know the entire set of inconsistencies, which is not possible with the current behavior. This commit adds a new flag, `zdb -y`, which analyzes the livelist and metaslab data structures and displays all of their inconsistencies. Note that this is different from the leak detection performed by `zdb -b`, which checks for inconsistencies between the spacemaps and the tree of block pointers, but assumes the spacemaps are self-consistent. The specific checks added are: Verify livelists by iterating through each sublivelists and: - report leftover FREEs - report double ALLOCs and double FREEs - record leftover ALLOCs together with their TXG [see Cross Check] Verify spacemaps by iterating over each metaslab and: - iterate over spacemap and then the metaslab's entries in the spacemap log, then report any double FREEs and double ALLOCs Verify that livelists are consistenet with spacemaps. The space referenced by livelists (after using the FREE's to cancel out corresponding ALLOCs) should be allocated, according to the spacemaps. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Sara Hartse <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-66031 Closes #10515
* Centralize variable substitutionArvind Sankar2020-07-141-13/+3
| | | | | | | | | | | | A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10559
* Add device rebuild featureBrian Behlendorf2020-07-033-13/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: John Poduska <[email protected]> Co-authored-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10349
* Fixes for make distArvind Sankar2020-06-261-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10501
* zed additional featuresJorgen Lundman2020-06-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds two features to zed, that macOS desires. The first is that when you unload the kernel module, zed would enter into a cpubusy loop calling zfs_events_next() repeatedly. We now look for ENODEV, returned by kernel, so zed can exit gracefully. Second feature is -I (idle) (alas -P persist was taken) is for the deamon to; 1; if started without ZFS kernel module, stick around waiting for it. 2; if kernel module is unloaded, go back to 1. This is due to daemons in macOS is started by launchctl, and is expected to stick around. Currently, the busy loop only exists when errno is ENODEV. This is to ensure that functionality that upstream expects is not changed. It did not care about errors before, and it still does not. (with the exception of ENODEV). However, it is probably better that all errors (ERESTART notwithstanding) exits the loop, and the issues complaining about zed taking all CPU will go away. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #10476
* man.8: Add bookmark to list of typesGrischa Zengel2020-06-101-1/+2
| | | | | | | | | | | | | | | While checking bash_completion I missed bookmark as type. ``` # zfs get type zpool2#b NAME PROPERTY VALUE SOURCE zpool2#b type bookmark - ``` Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Grischa Zengel <[email protected]> Closes #10419
* Remove unnecessary references to slaveryMatthew Ahrens2020-06-101-12/+7
| | | | | | | | | | | | | | | | | | | | | | The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10435
* Fix typosAndrea Gelmini2020-06-091-1/+1
| | | | | | | | | Correct various typos in the comments and tests. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #10423
* Trim L2ARCGeorge Amanakis2020-06-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam D. Moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #9713 Closes #9789 Closes #10224
* mount: use the mount syscall directlyfelixdoerre2020-05-201-0/+9
| | | | | | | | | | | | | | | | | | | | Allow zfs datasets to be mounted on Linux without relying on the invocation of an external processes. This is the same behavior which is implemented for FreeBSD. Use of the libmount library was originally considered because it provides functionality to properly lock and update the /etc/mtab file. However, these days /etc/mtab is typically a symlink to /proc/self/mounts so there's nothing to updated. Therefore, we call mount(2) directly and avoid any additional dependencies. If required the legacy behavior can be enabled by setting the ZFS_MOUNT_HELPER environment variable. This may be needed in environments where SELinux in enabled and the zfs binary does not have mount permission. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Felix Dörre <[email protected]> #10294
* Small program that converts a dataset id and an object id to a pathPaul Dagnelie2020-05-202-0/+51
| | | | | | | | | Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10204
* Improvements on persistent L2ARCGeorge Amanakis2020-05-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Functional changes: We implement refcounts of log blocks and their aligned size on the cache device along with two corresponding arcstats. The refcounts are reflected in the header of the device and provide valuable information as to whether log blocks are accounted for correctly. These are dynamically adjusted as log blocks are committed/evicted. zdb also uses this information in the device header and compares it to the corresponding values as reported by dump_l2arc_log_blocks() which emulates l2arc_rebuild(). If the refcounts saved in the device header report higher values, zdb exits with an error. For this feature to work correctly there should be no active writes on the device. This is also employed in the tests of persistent L2ARC. We extend the structure of the cache device header by adding the two new variables mirroring the refcounts after the existing variables to preserve backward compatibility in terms of persistent L2ARC. 1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which reflect the total aligned size of log blocks on the device. This is also reflected in the header of the cache device as "dh_lb_asize". 2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count" which reflect the total number of L2ARC log blocks present on cache devices. It is also reflected in the header of the cache device as "dh_lb_count". In l2arc_rebuild_vdev() if the amount of committed log entries in a log block is 0 and the device header is valid we update the device header. This will facilitate trimming of the whole device in this case when TRIM for L2ARC is implemented. Improve loop protection in l2arc_rebuild() by using the starting offset of the payload of each log block instead of the starting offset of the log block. If the zio in l2arc_write_buffers() fails, restore the lbps array in the header of the device to its previous state in l2arc_write_done(). If l2arc_rebuild() ends the rebuild process without restoring any L2ARC log blocks in ARC and without any other error, this means that the lbps array in the header is pointing to non-existent or invalid log blocks. Reset the device header in this case. In l2arc_rebuild() change the zfs_dbgmsg messages to spa_history_log_internal() making them user visible with zpool history command. Non-functional changes: Make the first test in persistent L2ARC use `zdb -lll` to increase coverage in `zdb.c`. Rename psize with asize when referring to log blocks, since L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also rename dh_log_blk_entries to dh_log_entries to make it clear that it is a mirror of l2ad_log_entries. Added comments for both changes. Fix inaccurate comments for example in l2arc_log_blk_restore(). Add asserts at the end in l2arc_evict() and l2arc_write_buffers(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10228
* OpenZFS 3254 - add support in zfs for aclmode=restrictedPaul B. Henson2020-04-301-1/+1
| | | | | | | | | | | | | Authored-by: Paul B. Henson <[email protected]> Reviewed by: Albert Lee <[email protected]> Reviewed by: Gordon Ross <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Paul B. Henson <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/3254 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c Closes #10266
* OpenZFS 742 - Resurrect the ZFS "aclmode" property OpenZFS 664 - Umask ↵Paul B. Henson2020-04-302-12/+14
| | | | | | | | | | | | | | | | | | | | | | | masking "deny" ACL entries OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics Porting notes: * Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter rather than 'zfsvfs_t *zfsvfs' * zfs man pages changes mixed between zfs and new zfsprops man pages Reviewed by: Aram Hvrneanu <[email protected]> Reviewed by: Gordon Ross <[email protected]> Reviewed by: Robert Gordon <[email protected]> Reviewed by: [email protected] Reviewed by: Brian Behlendorf <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported-by: Paul B. Henson <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/742 OpenZFS-issue: https://www.illumos.org/issues/664 OpenZFS-issue: https://www.illumos.org/issues/279 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110 Closes #10266
* zfs_create: round up volume size to multiple of bsalex2020-04-241-2/+1
| | | | | | | | | | | Round up the volume size requested in `zfs create -V size` to the next higher multiple of the volblocksize. Updates the man page and adds a test to verify the new behavior. Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: puffi <[email protected]> Signed-off-by: Alex John <[email protected]> Closes #8541 Closes #10196
* Remove deduplicated send/receive codeMatthew Ahrens2020-04-232-25/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Issue #7887 Issue #10117 Issue #10156 Closes #10212
* Add `zstream redup` command to convert deduplicated send streamsMatthew Ahrens2020-04-102-0/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deduplicated send and receive is deprecated. To ease migration to the new dedup-send-less world, the commit adds a `zstream redup` utility to convert deduplicated send streams to normal streams, so that they can continue to be received indefinitely. The new `zstream` command also replaces the functionality of `zstreamdump`, by way of the `zstream dump` subcommand. The `zstreamdump` command is replaced by a shell script which invokes `zstream dump`. The way that `zstream redup` works under the hood is that as we read the send stream, we build up a hash table which maps from `<GUID, object, offset> -> <file_offset>`. Whenever we see a WRITE record, we add a new entry to the hash table, which indicates where in the stream file to find the WRITE record for this block. (The key is `drr_toguid, drr_object, drr_offset`.) For entries other than WRITE_BYREF, we pass them through unchanged (except for the running checksum, which is recalculated). For WRITE_BYREF records, we change them to WRITE records. We find the referenced WRITE record by looking in the hash table (for the record with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading the record header and payload from the specified offset in the stream file. This is why the stream can not be a pipe. The found WRITE record replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`, and `drr_offset` fields changed to be the same as the WRITE_BYREF's (i.e. we are writing the same logical block, but with the data supplied by the previous WRITE record). This algorithm requires memory proportional to the number of WRITE records (same as `zfs send -D`), but the size per WRITE record is relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to "redup". Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10124 Closes #10156
* Persistent L2ARCGeorge Amanakis2020-04-103-8/+37
| | | | | | | | | | | | | | | | | | | | | This commit makes the L2ARC persistent across reboots. We implement a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot. This significantly eases the impact a reboot has on read performance on systems with large caches. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Saso Kiselkov <[email protected]> Co-authored-by: Jorgen Lundman <[email protected]> Co-authored-by: George Amanakis <[email protected]> Ported-by: Yuxuan Shui <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #925 Closes #1823 Closes #2672 Closes #3744 Closes #9582
* Add 'zfs wait' commandPaul Dagnelie2020-04-013-0/+77
| | | | | | | | | | | | | | | | | | | | | | Add a mechanism to wait for delete queue to drain. When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones. This change duplicates much of the zpool wait related logic into a zfs wait command, which can be used to wait until the internal deleteq has been drained. Additional wait activities may be added in the future. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9707
* Deprecate deduplicated send streamsMatthew Ahrens2020-03-182-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. This is a very common misconception among not only users, but developers, and makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it. Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the dataset(s) being sent is (are) not already using a dedup-strength checksum. Dedup send adds developer burden. It expands the test matrix when developing new features, causing bugs in released code, and delaying development efforts by forcing more testing to be done. As a result, we are deprecating the use of `zfs send -D` and receiving of such streams. This change adds a warning to the man page, and also prints the warning whenever dedup send or receive are used. In a future release, we plan to: 1. remove the kernel code for generating deduplicated streams 2. make `zfs send -D` generate regular, non-deduplicated streams 3. remove the kernel code for receiving deduplicated streams 4. make `zfs receive` of deduplicated streams process them in userland to "re-duplicate" them, so that they can still be received. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #7887 Closes #10117
* Add option for forcible unmounting dataset while receiving snapshot.Mariusz Zaborski2020-03-171-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | Currently when the dataset is in use we can't receive snapshots. zfs send test/1@asd | zfs recv -FM test/2 cannot unmount '/test/2': Device busy This commits add option 'M' which attempts to forcibly unmount the dataset. Thanks to this we can enforce receiving snapshots in a single step. Note that this functionality is not supported on Linux because the VFS will prevent active mounted filesystems from being unmounted, even with the force option. This is the intended VFS behavior. Test cases were added to verify the expected behavior based on the platform. Discussed-with: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> External-issue: https://reviews.freebsd.org/D22306 Closes #9904
* Change default to overlay=onRyan Moeller2020-03-061-6/+8
| | | | | | | | | | | | | | Filesystems allow overlay mounts by default on FreeBSD and Linux. Respect the native convention by switching the default to overlay=on, while retaining the option to turn the property off for compatibility with other operating systems' conventions. Update documentation and tests accordingly. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10030
* Add trim support to zpool waitBrian Behlendorf2020-03-042-5/+9
| | | | | | | | | | | | Manual trims fall into the category of long-running pool activities which people might want to wait synchronously for. This change adds support to 'zpool wait' for waiting for manual trim operations to complete. It also adds a '-w' flag to 'zpool trim' which can be used to turn 'zpool trim' into a synchronous operation. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: John Gallagher <[email protected]> Closes #10071
* Add notice that forcefully unmount is not supported on LinuxMariusz Zaborski2020-02-182-2/+4
| | | | | | | | | | | The Linux VFS will never allow a filesystem which is in use to be unmounted. This behavior differs from other platforms like FreeBSD which allow a filesystem to be force unmounted. This will result in errors being returned to applications actively using the filesystem. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Closes #10013
* Systemd mount generator: Generate noauto units; add control propertiesInsanePrawn2020-02-142-14/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit refactors the systemd mount generators and makes the following major changes: - The generator now generates units for datasets marked canmount=noauto, too. These units are NOT WantedBy local-fs.target. If there are multiple noauto datasets for a path, no noauto unit will be created. Datasets with canmount=on are prioritized. - Introduces handling of new user properties which are now included in the zfs-list.cache files: - org.openzfs.systemd:requires: List of units to require for this mount unit - org.openzfs.systemd:requires-mounts-for: List of mounts to require by this mount unit - org.openzfs.systemd:before: List of units to order after this mount unit - org.openzfs.systemd:after: List of units to order before this mount unit - org.openzfs.systemd:wanted-by: List of units to add a Wants dependency on this mount unit to - org.openzfs.systemd:required-by: List of units to add a Requires dependency on this mount unit to - org.openzfs.systemd:nofail: Toggles between a wants and a requires dependency. - org.openzfs.systemd:ignore: Do not generate a mount unit for this dataset. Consult the updated man page for detailed documentation. - Restructures and extends the zfs-mount-generator(8) man page with the above properties, information on unit ordering and a license header. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Antonio Russo <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: InsanePrawn <[email protected]> Closes #9649