aboutsummaryrefslogtreecommitdiffstats
path: root/cmd
Commit message (Collapse)AuthorAgeFilesLines
* Memory leak in ztest_dmu_objset_own()Matthew Ahrens2021-01-051-1/+5
| | | | | | | | Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11396
* Memory leak in ztest_vdev_attach_detach()Matthew Ahrens2021-01-051-2/+1
| | | | | | | | Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11396
* nvlist leaked in zpool_find_config()Matthew Ahrens2021-01-053-0/+12
| | | | | | | | | | | | | | | | | In `zpool_find_config()`, the `pools` nvlist is leaked. Part of it (a sub-nvlist) is returned in `*configp`, but the callers also leak that. Additionally, in `zdb.c:main()`, the `searchdirs` is leaked. The leaks were detected by ASAN (`configure --enable-asan`). This commit resolves the leaks. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11396
* dbufstat: Fix warnings with Python 3.8Ryan Moeller2020-12-231-2/+2
| | | | | | | Replace "is" with "==" and "is not" with "!=". Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11394
* Use the correct return type for getoptsterlingjensen2020-12-234-4/+4
| | | | | | | | Use the correct return type for getopt otherwise clang complains about tautological-constant-out-of-range-compare. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sterling Jensen <[email protected]> Closes #11359
* arc_summary3: Handle overflowing value widthRyan Moeller2020-12-231-2/+6
| | | | | | | | | | | | | | Some tunables shown by arc_summary3 have string values that may exceed the normal line length, leaving a negative offset between the name and value fields. The negative space is of course not valid and Python rightly barfs up an exception traceback. Handle an overflowing value field width by ignoring the line length and separating the name from the value by a single space instead. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11270
* FreeBSD: Update usage of py-sysctlRyan Moeller2020-12-233-28/+26
| | | | | | | | | | | | py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned by sysctl.filter() on FreeBSD head. It also provides descriptions now. Eliminate the subprocess call to get descriptions, and filter out the nodes so we only deal with values. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11318
* mount_zfs: print strerror instead of errno for error reportingÉrico Nogueira Rolim2020-12-231-6/+6
| | | | | | | | Tracking down an error message with the errno value can be difficult, using strerror makes the error message clearer. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Érico Rolim <[email protected]> Closes #11303
* Drop path prefix workaroundsterlingjensen2020-12-231-41/+18
| | | | | | | Canonicalization, the source of the trouble, was disabled in 9000a9f. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sterling Jensen <[email protected]> Closes #11295
* FreeBSD: notify userspace when a vdev is removedRyan Moeller2020-12-231-0/+2
| | | | | | | | This is needed for zfsd to autoreplace vdevs. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11260
* Make zpool status "remove:" label print in boldAndrew Sun2020-12-231-1/+1
| | | | | | | | | | When ZFS_COLOR is set, zpool status shows row headings in bold, except for the "remove:" heading. This is a quick fix that makes it print in bold too. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrew Sun <[email protected]> Closes #11255
* ZED/zfs-list-cacher.sh: don't exit on ignored event typePrawn2020-12-191-1/+1
| | | | | | | | | | | | | | Check for the history_event type instead. The zfs-list-cacher.sh script currently respects the event types excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE. This makes little sense in this single-purpose script and silently breaks when history_events are excluded from syslog, which is the default since 13d65987a9d9958de77422f5d9d25b47e486537d. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: InsanePrawn <[email protected]> Closes #11164 Closes #11347
* zpool: correctly align columns with -pнаб2020-11-173-24/+35
| | | | | | | | | | zpool_expand_proplist() now ignores pl_fixed if its new literal argument is true. The rest is a consequence of needing to pass that down. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiao?=~Dska <[email protected]> Closes #11202
* zgenhostid: accept hostid arguments equal to zero.Érico Rolim2020-11-171-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A common usage pattern for zgenhostid, including in the ZFS dracut module, is running it as: zgenhostid $(hostid) However, zgenhostid only accepted hostid arguments greater than 0, which meant that, when the output of hostid(1) was "00000000", zgenhostid would error out, even though 0 is a possible return value for the gethostid(3) function used by hostid(1): - On current musl libc, gethostid(3) is a stub that always returns 0. - On glibc, gethostid(3) will return 0 if /etc/hostid exists but is smaller than 4 bytes. In these cases, it makes more sense for zgenhostid to treat a value of 0 as other parts of the zfs codebase do, meaning that a hostid value couldn't be determined; therefore, it should attempt to generate a random value to write into /etc/hostid. The manpage and usage output have been updated to reflect this. Whitespace has also been fixed in the usage output. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Georgy Yakovlev <[email protected]> Reviewed-by: Andrew J. Hesford <[email protected]> Signed-off-by: Érico Rolim <[email protected]> Closes #11174 Closes #11189
* Assertion failure when logging large output of channel programMatthew Ahrens2020-11-141-0/+6
| | | | | | | | | | | | | | | | | | | The output of ZFS channel programs is logged on-disk in the zpool history, and printed by `zpool history -i`. Channel programs can use 10MB of memory by default, and up to 100MB by using the `zfs program -m` flag. Therefore their output can be up to some fraction of 100MB. In addition to being somewhat wasteful of the limited space reserved for the pool history (which for large pools is 1GB), in extreme cases this can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in `dmu_buf_hold_array_by_dnode()`. This commit limits the output size that will be logged to 1MB. Larger outputs will not be logged, instead a entry will be logged indicating the size of the omitted output. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11194
* Fix memleak in cmd/mount_zfs.csterlingjensen2020-11-111-42/+28
| | | | | | | | | | Convert dynamic allocation to static buffer, simplify parse_dataset function return path. Add tests specific to the mount helper. Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sterling Jensen <[email protected]> Closes #11098
* Update references to nonexistent man pages in codeRyan Moeller2020-10-301-7/+7
| | | | | | | | | Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11132
* Restore identification of VDEVs using non-native block sizeCy Schubert2020-10-301-0/+7
| | | | | | | | | | | | | | NAME STATE READ WRITE CKSUM dsk02 ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada1s4a ONLINE 0 0 0 ada2s4a ONLINE 0 0 0 block size: 512B configured, 4096B native Reviewed-by: Matt Macy <[email protected]> Reviewed-by: Toomas Soome <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed off by: Cy Schubert <[email protected]> Closes #11088
* arcstat: Add -a and -p options from FreeNASRyan Moeller2020-10-301-6/+32
| | | | | | | | | | | | Added -a option to automatically print all valid statistics. Added -p option to suppress scaling of printed data. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Authored by: Nick Principe <[email protected]> Ported-by: Ryan Moeller <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11090
* zed syslog entries drop important infoDon Brady2020-10-192-5/+46
| | | | | | | | | | | | | | | | | | | | ZED will log zevents summaries to the syslog, however the log entries tend to drop event details that can be useful for diagnosis. This is especially true for ereport events, like io, checksum, and delay. Update the all-syslog.sh script to log additional event information. Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc for choosing GUIDs over names for pool and vdev. Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event events. These events tend to be frequent, convey no meaningful info, and are already logged in the zpool history. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #10967
* zil_parse: make callback parameters constChristian Schwarz2020-10-161-20/+21
| | | | | | | | | | Code cleanup, a follow up commit to 4d55ea81. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #11020
* Replace ZFS on Linux references with OpenZFSBrian Behlendorf2020-10-1620-41/+39
| | | | | | | | | | | | | | This change updates the documentation to refer to the project as OpenZFS instead ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11007
* Make dbufstat work on FreeBSDRyan Moeller2020-10-161-1/+16
| | | | | | | | | | | | | | With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstats can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11008
* zdb should not output binary data on terminalToomas Soome2020-10-161-1/+15
| | | | | | | | | | | | | The zdb is interpreting byte array as textual string in dump_zap, but there are also binary arrays and we should not output binary data on terminal. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Toomas Soome <[email protected]> External-issue: https://www.illumos.org/issues/12012 External-issue: https://www.illumos.org/issues/11713 Closes #11006
* zfs userspace: use zfs_path_to_zhandle so argument can be a pathAllan Jude2020-10-011-7/+9
| | | | | | | | | | Change zfs userspace subcommand to use zfs_path_to_zhandle() so that the provided dataset can be a path (/usr) or a dataset (rpool/usr). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #8915
* vdev_ashift should only be set onceGeorge Wilson2020-09-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This is caused because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logical ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ahsift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Cedric Berger <[email protected]> Signed-off-by: George Wilson <[email protected]> External-Issue: DLPX-71831 Closes #10932
* zdb leak detection fails with in-progress device removalMatthew Ahrens2020-09-181-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a device removal is in progress, there are 2 locations for the data that's already been moved: the original location, on the device that's being removed; and the new location, which is pointed to by the indirect mapping. When doing leak detection, zdb needs to know about both locations. To determine what's already been copied, we load the spacemaps of the removing vdev, omit the blocks that are yet to be copied, and then use the vdev's remap op to find the new location. The problem is with an optimization to the spacemap-loading code in zdb. When processing the log spacemaps, we ignore entries that are not relevant because they are past the point that's been copied. However, entries which span the point that's been copied (i.e. they are partly relevant and partly irrelevant) are processed normally. This can lead to an illegal spacemap operation, for example if offsets up to 100KB have been copied, and the spacemap log has the following entries: ALLOC 50KB-150KB (partly relevant) FREE 50KB-100KB (entirely relevant) FREE 100KB-150KB (entirely irrlevant - ignored) ALLOC 50KB-150KB (partly relevant) Because the entirely irrelevant entry was ignored, its space remains in the spacemap. When the last entry is processed, we attempt to add it to the spacemap, but it partially overlaps with the 100-150KB entry that was left over. This problem was discovered by ztest/zloop. One solution would be to also ignore the irrelevant parts of partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to only add 50-100 to the spacemap). However, this commit implements a simpler solution, which is to remove this optimization entirely. I.e. to process the entire spacemap log, without regard for the point that's been copied. After reconstructing the entire allocatable range tree, there's already code to remove the parts that have not yet been copied. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-71820 Closes #10920
* cmd/zgenhostid: replace with simple c implementationGeorgy Yakovlev2020-09-184-62/+158
| | | | | | | | | | | | | | | | | | | It was discovered that dracut scripts and zgenhostid always generate little-endian /etc/hostid. This commit provides simple endianess-aware binary and updates the scripts to use it. New features include: -f flag to force overwrite. -o flag to write to different file (for dracut) accepting both 0x01234567 and 01234567 values as input Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Georgy Yakovlev <[email protected]> Closes #10887 Closes #10925
* Force the use of '.' as decimal separator.xdch472020-09-093-0/+3
| | | | | | | | | | This solves issues occurring with a different decimal operator and keeps the command line interface consistent for all locales . E.g. `zfs set quota=0.5T` Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Felix Neumärker <[email protected]> Closes #10878
* Add 'zfs rename -u' to rename without remountingRyan Moeller2020-09-031-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10839
* Typo CorrectionSpencer Kinny2020-08-301-1/+1
| | | | | | | | | | Corrected the typo in zfs/cmd/zfs/zfs_main.c line number 404 pbkfd2iters to pbkdf2iters Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Spencer Kinny <[email protected]> Closes #10850
* zpool: Change base URL for ZFS messages to openzfs-docsRyan Moeller2020-08-271-3/+6
| | | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10820
* Import vdev ashift optimization from FreeBSDRyan Moeller2020-08-211-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1. Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2. The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Matthew Macy <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10619
* Silence 'make checkbashisms'Brian Behlendorf2020-08-201-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d2bce6d03 added the 'make checkbashisms' target but did not resolve all of the bashisms in the scripts. This commit doesn't resolve them all either but it does fix up a few, and it excludes the others so 'make checkstyle' no longer prints warnings. It's a small step in the right direction. * Dracut is Linux specific and itself depends on bash. Therefore all dracut support scripts can be bash specific, update their shebang accordingly. * zed-functions.sh, zfs-import, zfs-mount, zfs-zed, smart paxcheck.sh, make_gitrev.sh - these scripts were excuded from the check until they can be updated and properly tested. * zfsunlock - only whole values for sleep are allowed. * vdev_id - removed unneeded locals; use && instead of -a. * dkms.mkconf, dkms.postbuil - use || instead of -o. Reviewed-by: InsanePrawn <[email protected]> Reviewed-by: Gabriel A. Devenyi <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10755
* 'zfs share -a' should clean noauto exportsDon Brady2020-08-201-1/+4
| | | | | | | | | | | | | | | | | | This is a follow on to PR #10688 where `zfs share -a` allows the sharing of canmount=noauto datasets if they are mounted. However, when a dataset with canmount=noauto is not mounted, the command should also purge any existing entries from the exports file. Otherwise, after a reboot, the nfs server attempts to export the underlying mountpath, not the dataset. This can lead to a hard hang for existing client mounts. Instead of just skipping the adding of an export if not mounted and canmount=noauto, have it also remove an existing export of the dataset so that, after a reboot, we don't export an unmounted dataset. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #10747
* Add zstd support to zfsMichael Niewöhner2020-08-202-11/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Co-authored-by: Sebastian Gottschall <[email protected]> Co-authored-by: Kjeld Schouten-Lebbing <[email protected]> Co-authored-by: Michael Niewöhner <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Kjeld Schouten-Lebbing <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #6247 Closes #9024 Closes #10277 Closes #10278
* ZED: Do not offline a missing device if no spare is availableBrian Behlendorf2020-08-181-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | Due to commit d48091d a removed device is now explicitly offlined by the ZED if no spare is available, rather than the letting ZFS detect it as UNAVAIL. This broke auto-replacing of whole-disk devices, as described in issue #10577. In short, when a new device is reinserted in the same slot, the ZED will try to ONLINE it without letting ZFS recreate the necessary partition table. This change simply avoids setting the device OFFLINE when removed if no spare is available (or if spare_on_remove is false). This change has been left minimal to allow it to be backported to 0.8.x release. The auto_offline_001_pos ZTS test has been updated accordingly. Some follow up work is planned to update the ZED so it transitions the vdev to a REMOVED state. This is a state which has always existed but there is no current interface the ZED can use to accomplish this. Therefore it's being left to a follow up PR. Reviewed-by: Gionatan Danti <[email protected]> Co-authored-by: Gionatan Danti <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10577 Closes #10730
* Include scatter_chunk_waste in arc_sizeMatthew Ahrens2020-08-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARC caches data in scatter ABD's, which are collections of pages, which are typically 4K. Therefore, the space used to cache each block is rounded up to a multiple of 4K. The ABD subsystem tracks this wasted memory in the `scatter_chunk_waste` kstat. However, the ARC's `size` is not aware of the memory used by this round-up, it only accounts for the size that it requested from the ABD subsystem. Therefore, the ARC is effectively using more memory than it is aware of, due to the `scatter_chunk_waste`. This impacts observability, e.g. `arcstat` will show that the ARC is using less memory than it effectively is. It also impacts how the ARC responds to memory pressure. As the amount of `scatter_chunk_waste` changes, it appears to the ARC as memory pressure, so it needs to resize `arc_c`. If the sector size (`1<<ashift`) is the same as the page size (or larger), there won't be any waste. If the (compressed) block size is relatively large compared to the page size, the amount of `scatter_chunk_waste` will be small, so the problematic effects are minimal. However, if using 512B sectors (`ashift=9`), and the (compressed) block size is small (e.g. `compression=on` with the default `volblocksize=8k` or a decreased `recordsize`), the amount of `scatter_chunk_waste` can be very large. On a production system, with `arc_size` at a constant 50% of memory, `scatter_chunk_waste` has been been observed to be 10-30% of memory. This commit adds `scatter_chunk_waste` to `arc_size`, and adds a new `waste` field to `arcstat`. As a result, the ARC's memory usage is more observable, and `arc_c` does not need to be adjusted as frequently. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10701
* Fix reporting of L2ARC writes in arc_summary3George Amanakis2020-08-171-3/+3
| | | | | | | | | | arc_summary3 reports L2ARC writes in bytes. However, the related arc_stat is reported as hits. arc_summary2 report this correctly. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10717
* 'zfs share -a' should handle 'canmount=noauto'George Wilson2020-08-111-1/+9
| | | | | | | | | | | | | | | | The 'zfs share -a' currently skips any filesystems which have 'canmount=noauto' set. This behavior is unexpected since the one would expect 'zfs share -a' to share any mounted filesystem that has the 'sharenfs' property already set. This changes the behavior of 'zfs share -a' to allow the sharing of 'canmount=noauto' datasets if they are mounted. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Don Brady <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: George Wilson <[email protected]> External-issue: DLPX-71313 Closes #10688
* Changes to make openzfs build within FreeBSD buildworldMatthew Macy2020-07-311-1/+1
| | | | | | | | | A collection of header changes to enable FreeBSD to build with vendored OpenZFS. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10635
* Rename refcount.h to zfs_refcount.hMatthew Macy2020-07-291-1/+1
| | | | | | | | | Renamed to avoid conflicting with refcount.h when a different implementation is already provided by the platform. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10620
* Add support to decode a resume tokentony-zfs2020-07-234-1/+86
| | | | | | | | | | | | Adding a new subcommand to zstream called token. This now allows users to decode a resume token to retrieve the toname field. This can be useful for tools that need this information. The syntax works as follows zstream token <resume_token>. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Signed-off-by: Tony Perkins <[email protected]> Closes #10558
* FreeBSD: Add legacy arc_min and arc_maxRyan Moeller2020-07-191-1/+1
| | | | | | | | | These tunables were renamed from vfs.zfs.arc_min and vfs.zfs.arc_max to vfs.zfs.arc.min and vfs.zfs.arc.max. Add legacy compat tunables for the old names. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10579
* Extend zdb to print inconsistencies in livelists and metaslabsMatthew Ahrens2020-07-142-57/+599
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Livelists and spacemaps are data structures that are logs of allocations and frees. Livelists entries are block pointers (blkptr_t). Spacemaps entries are ranges of numbers, most often used as to track allocated/freed regions of metaslabs/vdevs. These data structures can become self-inconsistent, for example if a block or range can be "double allocated" (two allocation records without an intervening free) or "double freed" (two free records without an intervening allocation). ZDB (as well as zfs running in the kernel) can detect these inconsistencies when loading livelists and metaslab. However, it generally halts processing when the error is detected. When analyzing an on-disk problem, we often want to know the entire set of inconsistencies, which is not possible with the current behavior. This commit adds a new flag, `zdb -y`, which analyzes the livelist and metaslab data structures and displays all of their inconsistencies. Note that this is different from the leak detection performed by `zdb -b`, which checks for inconsistencies between the spacemaps and the tree of block pointers, but assumes the spacemaps are self-consistent. The specific checks added are: Verify livelists by iterating through each sublivelists and: - report leftover FREEs - report double ALLOCs and double FREEs - record leftover ALLOCs together with their TXG [see Cross Check] Verify spacemaps by iterating over each metaslab and: - iterate over spacemap and then the metaslab's entries in the spacemap log, then report any double FREEs and double ALLOCs Verify that livelists are consistenet with spacemaps. The space referenced by livelists (after using the FREE's to cancel out corresponding ALLOCs) should be allocated, according to the spacemaps. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Sara Hartse <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-66031 Closes #10515
* Centralize variable substitutionArvind Sankar2020-07-149-42/+22
| | | | | | | | | | | | A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10559
* Remove dependency on sharetab file and refactor sharing logicGeorge Wilson2020-07-132-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | == Motivation and Context The current implementation of 'sharenfs' and 'sharesmb' relies on the use of the sharetab file. The use of this file is os-specific and not required by linux or freebsd. Currently the code must maintain updates to this file which adds complexity and presents a significant performance impact when sharing many datasets. In addition, concurrently running 'zfs sharenfs' command results in missing entries in the sharetab file leading to unexpected failures. == Description This change removes the sharetab logic from the linux and freebsd implementation of 'sharenfs' and 'sharesmb'. It still preserves an os-specific library which contains the logic required for sharing NFS or SMB. The following entry points exist in the vastly simplified libshare library: - sa_enable_share -- shares a dataset but may not commit the change - sa_disable_share -- unshares a dataset but may not commit the change - sa_is_shared -- determine if a dataset is shared - sa_commit_share -- notify NFS/SMB subsystem to commit the shares - sa_validate_shareopts -- determine if sharing options are valid The sa_commit_share entry point is provided as a performance enhancement and is not required. The sa_enable_share/sa_disable_share may commit the share as part of the implementation. Libshare provides a framework for both NFS and SMB but some operating systems may not fully support these protocols or all features of the protocol. NFS Operation: For linux, libshare updates /etc/exports.d/zfs.exports to add and remove shares and then commits the changes by invoking 'exportfs -r'. This file, is automatically read by the kernel NFS implementation which makes for better integration with the NFS systemd service. For FreeBSD, libshare updates /etc/zfs/exports to add and remove shares and then commits the changes by sending a SIGHUP to mountd. SMB Operation: For linux, libshare adds and removes files in /var/lib/samba/usershares by calling the 'net' command directly. There is no need to commit the changes. FreeBSD does not support SMB. == Performance Results To test sharing performance we created a pool with an increasing number of datasets and invoked various zfs actions that would enable and disable sharing. The performance testing was limited to NFS sharing. The following tests were performed on an 8 vCPU system with 128GB and a pool comprised of 4 50GB SSDs: Scale testing: - Share all filesystems in parallel -- zfs sharenfs=on <dataset> & - Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> & Functional testing: - share each filesystem serially -- zfs share -a - unshare each filesystem serially -- zfs unshare -a - reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool> For 'zfs sharenfs=on' scale testing we saw an average reduction in time of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time of 83.36%. Functional testing also shows a huge improvement: - zfs share -- 97.97% reduction in time - zfs unshare -- 96.47% reduction in time - zfs inhert -r sharenfs -- 99.01% reduction in time Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Bryant G. Ly <[email protected]> Signed-off-by: George Wilson <[email protected]> External-Issue: DLPX-68690 Closes #1603 Closes #7692 Closes #7943 Closes #10300
* Unconditionally enable debugging for libzpoolSerapheim Dimitropoulos2020-07-104-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | We already enable -DDEBUG unconditionally (meaning regardless of this is a debug build or a performance build) for zdb and ztest as they are mostly used for development and debugging. This patch enables -DDEBUG for libzpool extending the debugging checks for zdb, ztest, and a couple of other test utilities. In addition to passing -DDEBUG we also enable -DZFS_DEBUG so all assertion checks work s expected. We do so not only in libzpool but in every utility that links to it, even if the utility doesn't directly use any functionality wrapped in ZFS_DEBUG macro definitions. The reason is that these utilities may still include headers that contain structs that have more fields when ZFS_DEBUG is defined. This can be a problem as enabling that flag for libzpool but not for zdb can lead into random problems (e.g. segmentation faults) as zdb may be have an incorrect view of a struct passed to it by libzpool. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #10549
* Use abs_top_builddir when referencing librariesArvind Sankar2020-07-1011-33/+33
| | | | | | | | | | | | | | | | | | | | | libtool stores absolute paths in the dependency_libs component of the .la files. If the Makefile for a dependent library refers to the libraries by relative path, some libraries end up duplicated on the link command line. As an example, libzfs specifies libzfs_core, libnvpair and libuutil as dependencies to be linked in. The .la file for libzfs_core also specifies libnvpair, but using an absolute path, with the result that libnvpair is present twice in the linker command line for producing libzfs. While the only thing this causes is to slightly slow down the linking, we can avoid it by using absolute paths everywhere, including for convenience libraries just for consistency. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538
* Add config.rpath for AM_GNU_GETTEXTArvind Sankar2020-07-103-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | Commit e8864b1b28c2 ("config: libintl/libiconv for gettext() detection") added an empty config.rpath with a comment that the real one doesn't work with libtool. However, an empty config.rpath doesn't really work: eg. on FreeBSD, where libintl is in /usr/local/lib, configure thinks that gettext doesn't exist and NLS should be disabled, which currently isn't supported in the source, and hence requires manual workaround to directly link -lintl without relying on configure. config.rpath is essential to let it be detected either in --prefix or using --with-libintl-prefix. I also don't see the mentioned issue with libtool flags applied to compilation, it seems to work fine to pass LTLIBINTL to libtool. It's unnecessary to include LTLIBICONV as the configure test will automatically append that to LTLIBINTL if it is necessary to link with libiconv. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Closes #10538