openzfs/zfs.git

	Commit message	Author	Date	Files	Lines
*	Fix 'zfs userspace' for received datasets in encrypted root	loli10K	2020-11-16	3	-2/+88
For encrypted receives, where user accounting is initially disabled on creation, both 'zfs userspace' and 'zfs groupspace' fail with EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a successful dmu_objset_space_upgrade(). Reviewed-by: Brian Behlendorf Co-authored-by: Brian Behlendorf Signed-off-by: loli10K Closes #9501 Closes #9596
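To make the failure mode concrete, here is a minimal Python sketch that models the objset flags as a bitmask. `Objset`, `space_upgrade()`, `upgrade_cb()`, `userspace()` and the flag value are hypothetical stand-ins for the DMU code named above, not its real API. The point is only that accounting queries are refused until the completion flag is set, so an upgrade callback that skips setting it leaves them failing.

```python
import errno

# Hypothetical bit standing in for OBJSET_FLAG_USERACCOUNTING_COMPLETE.
USERACCOUNTING_COMPLETE = 1 << 0


class Objset:
    def __init__(self):
        # Encrypted receive: user accounting starts out disabled.
        self.flags = 0


def space_upgrade(objset):
    """Stand-in for dmu_objset_space_upgrade(); returns 0 on success."""
    return 0


def upgrade_cb(objset, fixed=True):
    """Stand-in for the quota upgrade callback."""
    err = space_upgrade(objset)
    if err == 0 and fixed:
        # The fix: record that user accounting is now complete.
        objset.flags |= USERACCOUNTING_COMPLETE
    return err


def userspace(objset):
    """Stand-in for 'zfs userspace': requires completed accounting."""
    if not objset.flags & USERACCOUNTING_COMPLETE:
        return -errno.EOPNOTSUPP
    return 0
```

With `fixed=False` the upgrade succeeds but `userspace()` still returns `-EOPNOTSUPP`, which is the reported symptom.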
*	Distributed Spare (dRAID) Feature	Brian Behlendorf	2020-11-13	81	-127/+3153
This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to a pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon-separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

- draid[parity] - Parity level (default 1)
- draid[:<data>d] - Data devices per group (default 8)
- draid[:<children>c] - Expected number of child vdevs
- draid[:<spares>s] - Distributed hot spares (default 0)

Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

        NAME                  STATE     READ WRITE CKSUM
        slag7                 ONLINE       0     0     0
          draid2:8d:68c:2s-0  ONLINE       0     0     0
            L0                ONLINE       0     0     0
            L1                ONLINE       0     0     0
            ...
            U25               ONLINE       0     0     0
            U26               ONLINE       0     0     0
            spare-53          ONLINE       0     0     0
              U27             ONLINE       0     0     0
              draid2-0-0      ONLINE       0     0     0
            U28               ONLINE       0     0     0
            U29               ONLINE       0     0     0
            ...
            U42               ONLINE       0     0     0
            U43               ONLINE       0     0     0
        special
          mirror-1            ONLINE       0     0     0
            L5                ONLINE       0     0     0
            U5                ONLINE       0     0     0
          mirror-2            ONLINE       0     0     0
            L6                ONLINE       0     0     0
            U6                ONLINE       0     0     0
        spares
          draid2-0-0          INUSE     currently in use
          draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type, the following options were added to the ztest command. These options are leveraged by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value> - dRAID data drives per group
    -S <value> - dRAID distributed hot spares
    -R <value> - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated to provide test coverage for the dRAID feature.

Co-authored-by: Isaac Huang Co-authored-by: Mark Maybee Co-authored-by: Don Brady Co-authored-by: Matthew Ahrens Co-authored-by: Brian Behlendorf Reviewed-by: Mark Maybee Reviewed-by: Matt Ahrens Reviewed-by: Tony Hutter Signed-off-by: Brian Behlendorf Closes #10102
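The colon-separated dRAID type string described above can be parsed as in the following Python sketch. It only illustrates the documented syntax and defaults (parity 1, 8 data devices per group, 0 distributed spares); defaulting the children count to the number of vdevs supplied is an assumption, `parse_draid()` is a hypothetical name, and the validation is far simpler than what `zpool create` actually performs.

```python
import re

# draid[<parity>][:<data>d][:<children>c][:<spares>s]
_DRAID_RE = re.compile(r"draid([1-3])?((?::\d+[dcs])*)")
_FIELDS = {"d": "data", "c": "children", "s": "spares"}


def parse_draid(spec, nvdevs):
    """Parse a dRAID vdev type string into its layout parameters.

    Defaults follow the commit message (parity 1, 8 data devices per
    group, 0 distributed spares). Defaulting 'children' to the number
    of vdevs supplied is an assumption of this sketch.
    """
    m = _DRAID_RE.fullmatch(spec)
    if m is None:
        raise ValueError(f"invalid dRAID spec: {spec!r}")
    cfg = {"parity": int(m.group(1) or 1), "data": 8,
           "children": nvdevs, "spares": 0}
    # Each optional field is '<number><suffix>', e.g. '8d', '68c', '2s'.
    for field in filter(None, m.group(2).split(":")):
        cfg[_FIELDS[field[-1]]] = int(field[:-1])
    if cfg["children"] != nvdevs:
        raise ValueError(f"expected {cfg['children']} children, got {nvdevs}")
    return cfg
```

For example, `parse_draid("draid2:8d:68c:2s", 68)` yields parity 2, 8 data devices, 68 children and 2 spares, matching the `draid2:8d:68c:2s-0` vdev in the status output.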
*	Linux: Fix mount/unmount when dataset name has a space	Brian Behlendorf	2020-11-11	1	-0/+6
The custom zpl_show_devname() helper should translate spaces into the octal escape sequence \040. The getmntent(3) function is aware of this convention and properly translates the escape sequence back to a space when reading the fsname. Without this change the `zfs mount` and `zfs unmount` commands fail to correctly detect whether a dataset with a name containing spaces is mounted. Reviewed-by: Ryan Moeller Signed-off-by: Brian Behlendorf Closes #11182 Closes #11187
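A small Python sketch of the escaping convention described above, assuming a whitespace-separated mount table such as /proc/mounts. Only the space-to-`\040` mapping from the commit message is modeled; `escape_devname()`, `unescape_devname()` and `parse_mount_line()` are hypothetical helpers, not ZFS or libc functions.

```python
def escape_devname(name):
    """Encode spaces as the octal escape \\040, as zpl_show_devname() now does."""
    return name.replace(" ", "\\040")


def unescape_devname(field):
    """Decode \\040 back into a space, as getmntent(3) does for the fsname."""
    return field.replace("\\040", " ")


def parse_mount_line(line):
    """Return (fsname, mountpoint) from a whitespace-separated mount entry."""
    fsname, mountpoint = line.split()[:2]
    return unescape_devname(fsname), unescape_devname(mountpoint)
```

Without the escape, the dataset name `tank/my data` is split at the space and the fsname reads back as `tank/my`, so the mounted check fails.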
*	Start snapdir_iterate traversals to begin with the value of zero.	Tony Perkins	2020-11-11	1	-0/+17
The microzap hash can sometimes be zero for single-digit snapshot names. The zap cursor can then have a serialized value of two (for . and ..), which skips the first entry in the AVL tree for the .zfs/snapshot directory listing, so not all snapshots are returned. Reviewed-by: Brian Behlendorf Reviewed-by: Cedric Berger Signed-off-by: Tony Perkins Closes #11039
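A simplified Python model of the skipped entry, assuming the cursor resumes at the first entry whose hash is greater than or equal to its serialized value. The real ZAP cursor encoding is more involved; `list_snapshots()` and its arguments are hypothetical and only show why seeding the traversal with the directory offset 2, rather than 0, can drop a snapshot whose hash is 0.

```python
from bisect import bisect_left


def list_snapshots(entries, offset, fixed=True):
    """Return snapshot names visible to a readdir starting at `offset`.

    entries: (hash, name) pairs sorted by hash.
    offset:  readdir position; 0 and 1 are taken by '.' and '..', so the
             first snapshot lookup happens at offset 2.
    Simplified model: the cursor resumes at the first hash >= its value.
    """
    cursor = offset
    if fixed and offset == 2:
        cursor = 0  # begin the traversal at zero, not at the dot-entry offset
    hashes = [h for h, _ in entries]
    return [name for _, name in entries[bisect_left(hashes, cursor):]]
```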
*	Fix memleak in cmd/mount_zfs.c	sterlingjensen	2020-11-10	5	-1/+149
Convert dynamic allocation to a static buffer and simplify the parse_dataset function return path. Add tests specific to the mount helper. Reviewed-by: Mateusz Guzik Reviewed-by: Brian Behlendorf Signed-off-by: Sterling Jensen Closes #11098
*	ZTS: Add L1 corruption test	Ryan Moeller	2020-11-05	4	-4/+98
Add a new test case which corrupts all level 1 blocks in a file, then verifies that the corruption is detected and repaired. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #11141
*	ZTS: Output all block copies in list_file_blocks	Ryan Moeller	2020-11-05	1	-10/+27
The second part of list_file_blocks transforms the object description output by zdb -ddddd $ds $objnum into a stream of lines of the form "level path offset length" for the indirect blocks in the given file. The current code only works for the first copy of L0 blocks. L1 and L2 indirect blocks have more than one copy on disk. Add one more -d to the zdb command so we get all block copies and rewrite the transformation to match more than L0 and output all DVAs. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #11141
*	ZTS: Fix list_file_blocks for mirror vdevs, level > 0	Ryan Moeller	2020-11-05	1	-21/+37
The first part of list_file_blocks transforms the pool configuration output by zdb -C $pool into shell code to set up a shell variable, VDEV_MAP, that maps from vdev id to the underlying vdev path. This variable is a simple indexed array. However, the vdev id in a DVA is only the id of the top-level vdev. When the pool is mirrored, the top-level vdev is a mirror and its children are the mirrored devices. So, what we need is to map from the top-level vdev id to a list of the underlying vdev paths. list_file_blocks does not need to work for raidz vdevs, so we can disregard that case. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #11141
*	ZTS: zdb_block_size_histogram increase variance	Brian Behlendorf	2020-11-03	1	-4/+4
The expected variance for this test case was originally set at 10% based on local testing. Additional testing via the CI has shown it can be as large as 11%. Increase the expected maximum to 12% to prevent this test from incorrectly failing. Reviewed-by: Ryan Moeller Reviewed-by: George Melikov Signed-off-by: Brian Behlendorf Closes #11148
*	ZTS: Wait on all events in events_001_pos.ksh	Brian Behlendorf	2020-11-03	1	-2/+10
The events_001_pos.ksh test case can fail because it's possible, and correct, for the config_sync event to be posted after the last "expected" event. To accommodate this the run_and_verify() function has been updated to wait for all non-history events, not just the last event. This does not increase the run time of the test as long as all the events do get generated. Reviewed-by: George Melikov Signed-off-by: Brian Behlendorf Closes #11147
*	Update references to nonexistent man pages in code	Ryan Moeller	2020-10-30	1	-1/+1
Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #11132
*	ZTS: Fix xattr_004_pos failure, don't use tmpfs	Tony Hutter	2020-10-30	1	-27/+7
Previously, xattr_004_pos would create files with xattrs on both tmpfs and ext2, and then copy them to zfs to verify that their xattrs were preserved. However tmpfs doesn't support xattrs. This was never noticed until Fedora 33. In Fedora 32 and older, /tmp was on the root partition (like ext4), whereas on Fedora 33 /tmp is actually tmpfs. That caused this test to fail on Fedora 33. This fix updates the test to only create the file on ext2, not tmpfs. Reviewed-by: Brian Behlendorf Signed-off-by: Tony Hutter Closes #11133
*	Non-l2arc pool reads shouldn't be l2arc misses	Adam D. Moss	2020-10-20	5	-1/+98
The current l2_misses accounting behavior treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured. This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools. This PR prevents ARC reads from affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc. Includes a new test, l2arc_l2miss_pos.ksh. Reviewed-by: Brian Behlendorf Reviewed-by: George Amanakis Signed-off-by: Adam Moss Closes #10921
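A minimal Python sketch of the accounting rule before and after the change. `Spa`, `ArcStats` and `account_l2_miss()` are hypothetical; the real bookkeeping lives in the ARC read path and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Spa:
    name: str
    has_l2arc: bool


@dataclass
class ArcStats:
    l2_misses: int = 0


def account_l2_miss(stats, spa, any_l2arc, fixed=True):
    """Record an L2ARC miss for a read that missed in the ARC.

    any_l2arc: True if at least one pool on the system has a cache device.
    Before the fix that alone was enough to count a miss; after it, the
    miss is counted only if the pool being read has its own L2ARC.
    """
    if not any_l2arc:
        return
    if fixed and not spa.has_l2arc:
        return
    stats.l2_misses += 1
```

Reads from a pool without a cache device no longer dilute the hit/miss ratio of the pools that have one.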
*	zed syslog entries drop important info	Don Brady	2020-10-19	3	-29/+60
ZED will log zevent summaries to the syslog; however, the log entries tend to drop event details that can be useful for diagnosis. This is especially true for ereport events, like io, checksum, and delay. Update the all-syslog.sh script to log additional event information. Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc for choosing GUIDs over names for pool and vdev. Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event events. These events tend to be frequent, convey no meaningful info, and are already logged in the zpool history. Reviewed-by: John Kennedy Reviewed-by: Pavel Zakharov Reviewed-by: Brian Behlendorf Signed-off-by: Don Brady Closes #10967
*	Fix crash caused by invalid snapshot names in redactnvl	Christian Schwarz	2020-10-14	1	-0/+2
This is a follow-up fix for commit 0fdd6106bb. The VERIFY is only true when we haven't hit an error code path. See the added test case for a reproducer. Reviewed-by: Matthew Ahrens Reviewed-by: Brian Behlendorf Signed-off-by: Christian Schwarz Closes #11048
*	Cross-platform acltype	Ryan Moeller	2020-10-13	3	-7/+8
The acltype property is currently hidden on FreeBSD and does not reflect the NFSv4 style ZFS ACLs used on the platform. This makes it difficult to observe that a pool imported from FreeBSD on Linux has a different type of ACL that is being ignored, and vice versa. Add an nfsv4 acltype and expose the property on FreeBSD. Make the default acltype nfsv4 on FreeBSD. Setting acltype to an unhandled style is treated the same as setting it to off. The ACLs will not be removed, but they will be ignored. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #10520
*	Add zpool_influxdb command	Richard Elling	2020-10-09	7	-1/+140
A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB time-series database. Examples are given on how to integrate with the telegraf statistics aggregator, a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment. Reviewed-by: Brian Behlendorf Reviewed-by: Ryan Moeller Signed-off-by: Richard Elling Closes #10786
*	Linux: Initialize zp in zfs_setattr_dir	Ryan Moeller	2020-10-09	12	-4/+154
The value of zp is used without having been initialized under some conditions. Initialize the pointer to NULL. Add a regression test case using chown in acl/posix. However, this is not enough because the setup sets xattr=sa, which means zfs_setattr_dir will not be called. Create a second group of acl tests in acl/posix-sa duplicating the acl/posix tests with symlinks, and remove xattr=sa from the original acl/posix tests. This provides more coverage for the default xattr=on code. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #10043 Closes #11025
*	Replace ZFS on Linux references with OpenZFS	Brian Behlendorf	2020-10-08	6	-6/+5
This change updates the documentation to refer to the project as OpenZFS instead of ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov Reviewed-by: Richard Laager Reviewed-by: Ryan Moeller Signed-off-by: Brian Behlendorf Closes #11007
*	ZTS: Fix path to /dev/null in nopwrite_recsize	Ryan Moeller	2020-10-08	1	-1/+1
Don't direct stdout and stderr of dd to $TEST_BASE_DIR/null, direct it to /dev/null. Reviewed-by: Kjeld Schouten Reviewed-by: Brian Behlendorf Reviewed-by: George Melikov Signed-off-by: Ryan Moeller Closes #11026
*	Make dbufstat work on FreeBSD	Ryan Moeller	2020-10-08	5	-15/+39
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstat can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #11008
*	Make L2ARC tests more robust	George Amanakis	2020-10-05	12	-66/+143
Instead of relying on arbitrary timers after pool export/import or cache device off/online, rely on arcstats. This makes the L2ARC tests more robust. Also clean up some functions related to persistent L2ARC. Reviewed-by: Brian Behlendorf Reviewed-by: Adam Moss Signed-off-by: George Amanakis Closes #10983
*	Mismatched nvlist names in zfs_keys_send_space	John Poduska	2020-10-02	2	-4/+11
This causes "zfs send -vt ..." to fail with "cannot resume send: Unknown error 1030". It turns out that some of the name/value pairs in the verification list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong name, so the ioctl got kicked out in zfs_check_input_nvpairs(). Update the names accordingly. Reviewed-by: Brian Behlendorf Reviewed-by: Ryan Moeller Signed-off-by: John Poduska Closes #10978
*	Drop references when skipping dmu_send due to EXDEV	Ryan Moeller	2020-09-30	8	-2/+205
When an invalid incremental send is requested where the "to" ds is before the "from" ds, make sure to drop the reference to the pool and the dataset before returning the error. Add an assert on FreeBSD to make sure we don't hold any locks after returning from an ioctl. Add some test coverage. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #10919
*	vdev_ashift should only be set once	George Wilson	2020-09-18	7	-1/+75
== Motivation and Context

The new vdev ashift optimization prevents the removal of devices when a zfs configuration consists of disks which have different logical and physical block sizes. This happens because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logic ignores the overridden ashift value that would be provided by '-o ashift=<val>'.

== Description

This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ashift values to be set every time the device is opened but those values are only consulted on first open.

Reviewed-by: Matthew Ahrens Reviewed-by: Brian Behlendorf Reviewed-by: Cedric Berger Signed-off-by: George Wilson External-Issue: DLPX-71831 Closes #10932
*	Rename acltype=posixacl to acltype=posix	Ryan Moeller	2020-09-16	8	-14/+14
Prefer acltype=off|posix, retaining the old names as aliases. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #10918
*	zfs label bootenv should store data as nvlist	Toomas Soome	2020-09-15	2	-2/+12
Storing the data as an nvlist allows us to support different data types and systems. The libzfsbootenv library is provided to encapsulate user data to and from the nvlist. Reviewed-by: Arvind Sankar Reviewed-by: Allan Jude Reviewed-by: Paul Dagnelie Reviewed-by: Igor Kozhukhov Signed-off-by: Toomas Soome Closes #10774
*	Add L2ARC arcstats for MFU/MRU buffers and buffer content type	George Amanakis	2020-09-14	17	-34/+218
Currently the ARC state (MFU/MRU) of cached L2ARC buffers and their content type is unknown. Knowing this information may prove beneficial in adjusting the L2ARC caching policy.

This commit adds L2ARC arcstats that display the aligned size (in bytes) of L2ARC buffers according to their content type (data/metadata) and according to their ARC state (MRU/MFU or prefetch). It also expands the existing evict_l2_eligible arcstat to differentiate between MFU and MRU buffers.

L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a buffer, its ARC state (MRU/MFU) is stored in the L2 header (b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size (in bytes) of L2ARC buffers according to their ARC state (based on b_arcs_state). We also account for the case where an L2ARC and ARC cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize reflects the aligned size (in bytes) of L2ARC buffers that were cached while they had the prefetch flag set in ARC. This is dynamically updated as the prefetch flag of L2ARC buffers changes.

When buffers are evicted from ARC, if they are determined to be L2ARC eligible then their logical size is recorded in evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon eviction.

Persistent L2ARC: When committing an L2ARC buffer to a log block (L2ARC metadata) its b_arcs_state and prefetch flag are also stored. If the buffer changes its arcstate or prefetch flag this is reflected in the above arcstats. However, the L2ARC metadata cannot currently be updated to reflect this change.

Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count this as an MRU buffer. The buffer transitions to MFU. The arcstats are updated to reflect this. Upon pool re-import or on/offlining the L2ARC device the arcstats are cleared and the buffer will now be counted as an MRU buffer, as the L2ARC metadata were not updated.

Bug fix:
- If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of an ARC buffer. However, prefetches may be issued in a way that arc_read_done() is bypassed. Instead, move the related code into l2arc_write_eligible() to account for those cases too.

Also add a test and update manpages for the l2arc_mfuonly module parameter, and update the manpages and code comments for l2arc_noprefetch. Move persist_l2arc tests to l2arc.

Reviewed-by: Ryan Moeller Reviewed-by: Richard Elling Reviewed-by: Brian Behlendorf Signed-off-by: George Amanakis Closes #10743
*	Avoid posting duplicate zpool events	Don Brady	2020-09-04	7	-2/+346
Duplicate io and checksum ereport events can make things appear worse than they are. Ideally the zpool events and the corresponding vdev stat error counts in a zpool status should be for unique errors, not the same error being counted over and over. This can be demonstrated in a simple example. With a single bad block in a datafile and just 5 reads of the file we end up with a degraded vdev, even though there is only one unique error in the pool. The proposed solution to the above issue is to eliminate duplicates when posting events and when updating vdev error stats. We now save recent error events of interest when posting events so that we can easily check for duplicates when posting an error. Reviewed by: Brad Lewis Reviewed-by: Brian Behlendorf Signed-off-by: Don Brady Closes #10861
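One way to implement such a duplicate check is a bounded, time-windowed cache of recently posted errors, sketched below in Python. The key fields (pool, vdev, error class, offset), the 60-second default window, the size limit and the `RecentErrors` class are assumptions for illustration, not the actual ereport code.

```python
from collections import OrderedDict


class RecentErrors:
    """Remember recently posted errors so repeats are not posted again."""

    def __init__(self, window_s=60.0, max_entries=256):
        self.window_s = window_s
        self.max_entries = max_entries
        self._seen = OrderedDict()  # key -> time first posted, oldest first

    def should_post(self, key, now):
        """Return True if an error with this key should be posted at time now."""
        # Expire entries older than the window (insertion order is time order).
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.window_s:
                break
            self._seen.popitem(last=False)
        if key in self._seen:
            return False  # duplicate: skip the event and the vdev stat bump
        self._seen[key] = now
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # bound memory by dropping the oldest
        return True
```

In the five-reads example, only the first read of the bad block would be posted and counted; the following four would be suppressed as duplicates.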
*	Make spa_stats.c tunables visible on FreeBSD	Ryan Moeller	2020-09-01	1	-2/+2
Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and update tunables.cfg in tests for the newly supported ones. Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #10858
*	Add 'zfs rename -u' to rename without remounting	Ryan Moeller	2020-09-01	4	-4/+98
Allow renaming file systems without remounting if it is possible. It is possible for file systems with the 'mountpoint' property set to 'legacy' or 'none', since we don't have to change the mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces a layering violation, as we need to update the 'f_mntfromname' field in the statfs structure related to the mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allows updating FreeBSD in an even cleaner way: in a ZFS-only configuration the root file system is a ZFS file system with the 'mountpoint' property set to 'legacy'. If the root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD, and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before, this was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek Reviewed-by: Allan Jude Reviewed-by: Brian Behlendorf Ported by: Matt Macy Signed-off-by: Ryan Moeller Closes #10839
*	Always track temporary fses and snapshots for accounting	Paul Dagnelie	2020-08-26	3	-1/+79
The root cause of the issue is that we only occasionally do as the comments in the code suggest and actually ignore the %recv dataset when it comes to filesystem limit tracking. Specifically, the only time we ignore it is when initializing the filesystem and snapshot limit values; when creating a new %recv dataset or deleting one, we always update the bookkeeping. This causes a problem if you init the fs count on a filesystem that already has a %recv dataset, since the bookkeeping will be decremented but not incremented. This is resolved in this patch by simply always tracking the %recv dataset as a child. Reviewed-by: Matt Ahrens Reviewed by: Jerry Jelinek Signed-off-by: Paul Dagnelie Closes #10791
*	ZTS: Improve block_device_wait on FreeBSD	Ryan Moeller	2020-08-24	1	-1/+3
FreeBSD doesn't have an equivalent to udevadm settle, so we have been resorting to a three second sleep to wait for device changes to take effect. This is far from ideal. We are mainly waiting for volmode=geom zvols to appear in /dev, so as a hack, reading the geom config will have the desired effect of quiescing the geom state. Reviewed-by: Alexander Motin Reviewed-by: Brian Behlendorf Signed-off-by: Ryan Moeller Closes #10768
*	ZFS performance tests should clean up NFS mount	Tony Nguyen	2020-08-23	1	-1/+2
This change unmounts the client's NFS mount after each test so we can avoid two sporadic issues: 1) a stale client NFS mount and 2) zpool export and zpool destroy failing because the dataset is busy. Reviewed-by: Ryan Moeller Reviewed-by: Brian Behlendorf Signed-off-by: Tony Nguyen Closes #10767
*	'zfs share -a' should clean noauto exports	Don Brady	2020-08-20	3	-1/+88
This is a follow-on to PR #10688 where `zfs share -a` allows the sharing of canmount=noauto datasets if they are mounted. However, when a dataset with canmount=noauto is not mounted, the command should also purge any existing entries from the exports file. Otherwise, after a reboot, the NFS server attempts to export the underlying mountpath, not the dataset. This can lead to a hard hang for existing client mounts. Instead of just skipping the export when the dataset is not mounted and canmount=noauto, also remove any existing export of the dataset so that, after a reboot, we don't export an unmounted dataset. Reviewed-by: Brian Behlendorf Reviewed-by: George Wilson Signed-off-by: Don Brady Closes #10747
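The revised `zfs share -a` decision can be sketched in Python as follows, modeling the exports file as a set of dataset names and assuming every dataset has sharing enabled. `share_all()` and its tuple format are hypothetical; only the canmount=noauto handling comes from the commit message.

```python
def share_all(datasets, exports):
    """Apply a simplified 'zfs share -a' to an exports table.

    datasets: iterable of (name, canmount, mounted) tuples.
    exports:  set of dataset names currently in the exports file.
    """
    for name, canmount, mounted in datasets:
        if mounted:
            exports.add(name)
        elif canmount == "noauto":
            # Previously this case was merely skipped, leaving a stale entry
            # that would export the bare mountpoint directory after a reboot.
            exports.discard(name)
    return exports
```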
*	Add zstd support to zfs	Michael Niewöhner	2020-08-20	15	-10/+556
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm. Available compression levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm. zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast:
  - zstd-fast-1 through zstd-fast-10
  - zstd-fast-20 through zstd-fast-100 (in increments of 10)
  - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function.

The compress= property (a 64-bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32 bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc.), the same parameters will be used to result in the matching checksum.

All of the internal ZFS code (`arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, which sets both objset members (os_compress and os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed.

ZSTD is included with all current tests and new tests are added as needed.

Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed.

Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude Co-authored-by: Brian Behlendorf Co-authored-by: Sebastian Gottschall Co-authored-by: Kjeld Schouten-Lebbing Co-authored-by: Michael Niewöhner Signed-off-by: Allan Jude Signed-off-by: Brian Behlendorf Signed-off-by: Sebastian Gottschall Signed-off-by: Kjeld Schouten-Lebbing Signed-off-by: Michael Niewöhner Closes #6247 Closes #9024 Closes #10277 Closes #10278
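The property encoding described above (algorithm in the lower 7 bits, level in the upper bits) can be sketched in Python. The constant and function names are hypothetical, and the algorithm id 16 used in the example is an arbitrary placeholder, not the value ZFS assigns to zstd.

```python
COMPRESS_BITS = 7                         # algorithm bits, as in a block pointer
COMPRESS_MASK = (1 << COMPRESS_BITS) - 1  # 0x7f


def pack_compress(algorithm, level):
    """Combine an algorithm id and level into one 64-bit property value."""
    if not 0 <= algorithm <= COMPRESS_MASK:
        raise ValueError("algorithm id must fit in 7 bits")
    if level < 0 or level >= 1 << (64 - COMPRESS_BITS):
        raise ValueError("level must fit in the upper 57 bits")
    return (level << COMPRESS_BITS) | algorithm


def unpack_compress(value):
    """Split a combined value into (algorithm, level)."""
    return value & COMPRESS_MASK, value >> COMPRESS_BITS
```

For example, `pack_compress(16, 19)` returns 2448 (19 << 7 | 16), and `unpack_compress(2448)` returns `(16, 19)`. Keeping the algorithm in the low bits means a value with level 0 is numerically identical to the bare algorithm id.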
*	ZED: Do not offline a missing device if no spare is available	Brian Behlendorf	2020-08-18	1	-21/+37
Due to commit d48091d a removed device is now explicitly offlined by the ZED if no spare is available, rather than letting ZFS detect it as UNAVAIL. This broke auto-replacing of whole-disk devices, as described in issue #10577. In short, when a new device is reinserted in the same slot, the ZED will try to ONLINE it without letting ZFS recreate the necessary partition table. This change simply avoids setting the device OFFLINE when removed if no spare is available (or if spare_on_remove is false). This change has been left minimal to allow it to be backported to the 0.8.x release. The auto_offline_001_pos ZTS test has been updated accordingly. Some follow-up work is planned to update the ZED so it transitions the vdev to a REMOVED state. This is a state which has always existed but there is no current interface the ZED can use to accomplish this. Therefore it's being left to a follow-up PR. Reviewed-by: Gionatan Danti Co-authored-by: Gionatan Danti Signed-off-by: Brian Behlendorf Closes #10577 Closes #10730
*	ZTS: ztest may cause mmp tests failures	Brian Behlendorf	2020-08-17	1	-0/+2
The mmp_exported_import and mmp_inactive_import tests depend on ztest simulating an active pool. If ztest unexpectedly terminates due to an unrelated issue, the test case will fail. Since ztest is not yet 100% reliable, I've added these tests to the maybe exception list. They can be removed when the issues with ztest are resolved or if the test cases are updated to handle these unexpected failures.

Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10726
*	Fix L2ARC reads when compressed ARC disabled	Allan Jude	2020-08-13	4	-2/+200
When reading compressed blocks from the L2ARC, with compressed ARC disabled, arc_hdr_size() returns LSIZE rather than PSIZE, but the actual read is PSIZE. This causes l2arc_read_done() to compare the checksum against the wrong size, resulting in a checksum failure.

This manifests as an increase in the kstat l2_cksum_bad and the read being retried from the main pool, making the L2ARC ineffective.

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- it is NOT decompressed
- it is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- it is decompressed
- it is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it

This test writes out a test file to a pool consisting of one disk and one cache device, then randomly reads from it. Since the arc_max in the tests is low, this will feed the L2ARC, and result in reads from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after to determine if any blocks failed to survive the trip through the L2ARC.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Closes #10693
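The size mismatch can be modelled as a choice of which length to verify. This is a hypothetical sketch: `struct sketch_hdr` and both functions are illustrative stand-ins, not `arc_hdr_size()` or `l2arc_read_done()`. It assumes, as the message states, that the L2ARC read returns PSIZE bytes for a block that is compressed on disk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified header: not the real arc_buf_hdr_t. */
struct sketch_hdr {
	uint64_t lsize;			/* logical (uncompressed) size */
	uint64_t psize;			/* physical size on disk / in the L2ARC */
	bool compressed_on_disk;
	bool compressed_in_arc;		/* false when compressed ARC is disabled */
};

/* Buggy choice: follows how the block is stored in memory. */
static uint64_t
sketch_verify_size_buggy(const struct sketch_hdr *h)
{
	return (h->compressed_in_arc ? h->psize : h->lsize);
}

/*
 * Fixed choice: the L2ARC read returns PSIZE bytes, so verification must use
 * PSIZE whenever the block is compressed on disk, whatever the ARC setting.
 */
static uint64_t
sketch_verify_size_fixed(const struct sketch_hdr *h)
{
	return (h->compressed_on_disk ? h->psize : h->lsize);
}
```

With compressed ARC disabled, the buggy variant picks LSIZE for a compressed block, so the checksum is computed over the wrong number of bytes; the fixed variant keys off the on-disk compression instead.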
*	'zfs share -a' should handle 'canmount=noauto'	George Wilson	2020-08-11	1	-1/+12
'zfs share -a' currently skips any filesystems which have 'canmount=noauto' set. This behavior is unexpected, since one would expect 'zfs share -a' to share any mounted filesystem that has the 'sharenfs' property already set.

This changes the behavior of 'zfs share -a' to allow the sharing of 'canmount=noauto' datasets if they are mounted.

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Signed-off-by: George Wilson <[email protected]>
External-issue: DLPX-71313
Closes #10688
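The behavior change amounts to dropping one filter from the share predicate. A minimal sketch with a hypothetical `struct sketch_fs`; the field names and both functions are illustrative rather than taken from libzfs, and the model only covers the `sharenfs` case the message describes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical dataset view; not a libzfs type. */
enum sketch_canmount {
	SKETCH_CANMOUNT_OFF,
	SKETCH_CANMOUNT_ON,
	SKETCH_CANMOUNT_NOAUTO
};

struct sketch_fs {
	enum sketch_canmount canmount;
	bool mounted;
	const char *sharenfs;	/* "off" or share options */
};

/* Old behavior: canmount=noauto datasets are skipped outright. */
static bool
sketch_share_all_old(const struct sketch_fs *fs)
{
	if (fs->canmount == SKETCH_CANMOUNT_NOAUTO)
		return (false);
	return (fs->mounted && strcmp(fs->sharenfs, "off") != 0);
}

/* New behavior: any mounted filesystem with sharenfs set is shared. */
static bool
sketch_share_all_new(const struct sketch_fs *fs)
{
	return (fs->mounted && strcmp(fs->sharenfs, "off") != 0);
}
```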
*	ZTS: Remove bashisms from zfs-tests.sh	Ryan Moeller	2020-08-07	1	-16/+18
Bring zfs-tests.sh into compliance with the other scripts by converting it to /bin/sh to avoid a dependency on bash.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10640
*	FreeBSD: Fix `zfs jail` and add a test	Ryan Moeller	2020-08-01	8	-1/+192
zfs_jail was not using zfs_ioctl, so it failed to map the IOC number correctly. Use zfs_ioctl to perform the jail ioctl and add a test case for FreeBSD.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10658
*	ZTS: FreeBSD does have a l2arc.trim_ahead tunable	Ryan Moeller	2020-07-31	1	-1/+1
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10633
*	ZTS: zvol_misc_volmode is flaky on FreeBSD	Ryan Moeller	2020-07-31	1	-0/+1
Mark this as a known issue.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10655
*	ZTS: Use POSIX-compatible space character class	Ryan Moeller	2020-07-31	1	-1/+1
FreeBSD recently integrated a change which causes \s in a regex to throw an error instead of silently being misinterpreted as an s. Change the regex in zpool_colors.ksh to use [[:space:]].

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10651
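The portable class can be exercised from C with the standard POSIX `regcomp()`/`regexec()` API. This is a sketch only: the actual test is a ksh script, and the `pool ... ONLINE` patterns used to check it are made up for illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if `text` matches the extended regular expression `pattern`.
 * [[:space:]] is a POSIX bracket class, so it behaves the same on every
 * conforming implementation, unlike the non-standard \s.
 */
static bool
sketch_matches(const char *pattern, const char *text)
{
	regex_t re;
	bool matched;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (false);	/* treat an invalid pattern as no match */
	matched = (regexec(&re, text, 0, NULL, 0) == 0);
	regfree(&re);
	return (matched);
}
```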
*	zfs promote does not delete livelist of origin	Matthew Ahrens	2020-07-31	1	-1/+25
When a clone is promoted, its livelist is no longer accurate, so it is discarded. If the clone's origin is also a clone (i.e. we are promoting a clone of a clone), then the origin's livelist is also no longer accurate, so it should be discarded, but the code doesn't actually do that.

Consider a pool with:
- Filesystem A
- Clone B, a clone of A
- Clone C, a clone of B

If we promote C, it discards C's livelist. It should discard B's livelist, but that is not happening. The impact is that when B is destroyed, we use the livelist to find the blocks to free, but the livelist is no longer correct, so we end up freeing blocks that are still in use by C. The incorrectly-freed blocks can be reallocated, causing checksum errors. And when C is destroyed, it can double-free the incorrectly-freed blocks.

The problem is that we remove the livelist of `origin_ds->ds_dir`, but the origin snapshot has already been moved to the promoted dsl_dir. So this is actually trying to remove the livelist of the promoted dsl_dir, which was already removed. As explained in a comment at the beginning of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the origin's dsl_dir.

Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Sara Hartse <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10652
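The ordering bug can be reproduced in a toy model where promotion moves the origin snapshot to the promoted directory before the old directory's livelist is discarded. `struct sketch_dir`, `struct sketch_snap` and `sketch_promote()` are hypothetical simplifications of the DSL structures; only the idea of using a pointer saved before the move (`odd`) comes from the message.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins; not the real DSL structures. */
struct sketch_dir {
	const char *name;
	bool has_livelist;
};

struct sketch_snap {
	struct sketch_dir *dir;	/* directory that currently owns the snapshot */
};

/*
 * Promote `promoted`, a clone of `origin_snap`. When `fixed` is false, the
 * origin's directory is read after the snapshot has moved (the bug); when
 * true, the pointer saved before the move is used (the fix).
 */
static void
sketch_promote(struct sketch_dir *promoted, struct sketch_snap *origin_snap,
    bool fixed)
{
	struct sketch_dir *odd = origin_snap->dir;	/* saved before the move */

	promoted->has_livelist = false;	/* promoted clone's livelist is stale */
	origin_snap->dir = promoted;	/* origin snapshot moves to promoted dir */

	if (fixed)
		odd->has_livelist = false;
	else
		origin_snap->dir->has_livelist = false;	/* promoted dir again */
}
```

With `fixed` false, the call clears C's livelist twice and leaves B's stale livelist in place, which is the situation that later lets destroying B free blocks still referenced by C.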
*	ZTS: minor improvements to alloc_class_009_pos functional test	Don Brady	2020-07-30	1	-4/+15
- Fixed a typo that caused one of the variations to be a no-op
- Added additional coverage for adding a special vdev after pool creation
- Added additional coverage for using 4K sector size

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #10641
*	Fix lua stack overflow on recursive call to gsub()	Matthew Ahrens	2020-07-27	7	-1/+79
The `zfs program` subcommand invokes a LUA interpreter to run ZFS "channel programs". This interpreter runs in a constrained environment, with defined memory limits. The LUA stack (used for LUA functions that call each other) is allocated in the kernel's heap, and is limited by the `-m MEMORY-LIMIT` flag and the `zfs_lua_max_memlimit` module parameter. The C stack is used by certain LUA features that are implemented in C. The C stack is limited by `LUAI_MAXCCALLS=20`, which limits call depth.

Some LUA C calls use more stack space than others, and `gsub()` uses an unusually large amount. With a programming trick, it can be invoked recursively using the C stack (rather than the LUA stack). This overflows the 16KB Linux kernel stack after about 11 iterations, less than the limit of 20.

One solution would be to decrease `LUAI_MAXCCALLS`. This could be made to work, but it has a few drawbacks:

1. The existing test suite does not pass with `LUAI_MAXCCALLS=10`.
2. There may be other LUA functions that use a lot of stack space, and the stack space may change depending on compiler version and options.

This commit addresses the problem by adding a new limit on the amount of free space (in bytes) remaining on the C stack while running the LUA interpreter: `LUAI_MINCSTACK=4096`. If there is less than this amount of stack space remaining, a LUA runtime error is generated.

Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10611
Closes #10613
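The combined depth and free-space check can be illustrated in userspace. This is a sketch under stated assumptions: `LUAI_MAXCCALLS=20` and `LUAI_MINCSTACK=4096` come from the message, the 16KB budget stands in for the kernel stack, and stack usage is estimated by comparing addresses of local variables (an implementation-defined trick; real code would query the actual stack bounds). All `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Limits from the commit message; the budget is a hypothetical stand-in. */
#define	SKETCH_MAXCCALLS	20
#define	SKETCH_MINCSTACK	4096
#define	SKETCH_STACK_BUDGET	(16 * 1024)

static uintptr_t sketch_stack_base;

/* Approximate stack bytes used since sketch_run(), in either growth direction. */
static size_t
sketch_stack_used(void)
{
	volatile char marker = 0;
	uintptr_t here = (uintptr_t)&marker;

	return (here > sketch_stack_base ?
	    here - sketch_stack_base : sketch_stack_base - here);
}

/* Returns 0 on success, or -1 where the interpreter would raise a runtime error. */
static int
sketch_recurse(int depth, int ccalls)
{
	volatile char frame[1024];	/* stands in for a stack-hungry C call */
	int rc;

	frame[0] = (char)depth;
	if (ccalls >= SKETCH_MAXCCALLS)
		return (-1);	/* call-depth limit reached */
	if (sketch_stack_used() + SKETCH_MINCSTACK > SKETCH_STACK_BUDGET)
		return (-1);	/* fewer than SKETCH_MINCSTACK bytes remain */
	if (depth == 0)
		return (0);
	rc = sketch_recurse(depth - 1, ccalls + 1);
	frame[1] = (char)rc;	/* work after the call keeps each frame live */
	return (rc);
}

static int
sketch_run(int depth)
{
	volatile char base = 0;

	sketch_stack_base = (uintptr_t)&base;
	return (sketch_recurse(depth, 0));
}
```

`sketch_run(100)` fails once either limit is reached instead of overflowing, while a shallow `sketch_run(2)` succeeds. Checking remaining bytes rather than only depth is what makes the guard independent of how much stack each C call happens to use.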
*	Add support to decode a resume token	tony-zfs	2020-07-23	2	-0/+32
Add a new subcommand to zstream called token. This allows users to decode a resume token to retrieve the toname field, which can be useful for tools that need this information. The syntax is as follows:

zstream token <resume_token>

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Zuchowski <[email protected]>
Signed-off-by: Tony Perkins <[email protected]>
Closes #10558
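Decoding starts by splitting the token into its header fields. A hedged sketch, assuming the token layout `<version>-<checksum hex>-<packed length hex>-<hex payload>`; that layout, `struct sketch_token_hdr` and `sketch_parse_token()` are assumptions for illustration and are not stated in this commit. A complete decoder would still have to hex-decode, verify, decompress and unpack the payload before `toname` is available.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical header view of a resume token (assumed layout, see above). */
struct sketch_token_hdr {
	unsigned int version;
	unsigned long long checksum;
	unsigned long long packed_len;
	const char *payload;	/* points into the original token string */
};

static bool
sketch_parse_token(const char *token, struct sketch_token_hdr *hdr)
{
	const char *p = token;
	int dashes = 0;

	if (sscanf(token, "%u-%llx-%llx-", &hdr->version, &hdr->checksum,
	    &hdr->packed_len) != 3)
		return (false);

	/* The hex payload starts right after the third '-'. */
	while (*p != '\0' && dashes < 3) {
		if (*p == '-')
			dashes++;
		p++;
	}
	if (dashes != 3)
		return (false);
	hdr->payload = p;

	/* Hex encoding uses two characters per byte. */
	return (strlen(p) % 2 == 0);
}
```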
*	ZTS: Fix devname2devid build on FreeBSD with libudev	Ryan Moeller	2020-07-22	1	-1/+2
When libudev is installed on FreeBSD, configure finds it and sets WANT_DEVNAME2DEVID, but it isn't found by the linker because we didn't specify where it is.

Use LIBUDEV_LIBS so the location of the library gets added to the linker flags for devname2devid. Also use LIBUDEV_CFLAGS here in case some other platform needs it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Arvind Sankar <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10590