aboutsummaryrefslogtreecommitdiffstats
path: root/lib
Commit message (Collapse)AuthorAgeFilesLines
* zpool: speed up importing large pools (#11469)Alan Somers2021-01-211-1/+1
| | | | | | | | | | | | | | | | | The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a preallocated buffer in userland. Userland must guess how large the buffer should be. If it undersizes it, it must reallocate and try again. That can cost a lot of time for large pools. OpenZFS commit 28b40c8a6e3 set the guess at "zc.zc_nvlist_conf_size * 4" without explanation. On my system, that is too small. From experiment, x 32 is a better multiplier. But I don't know how to calculate it theoretically. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #11469
* libzutil: optimize zpool_read_label with AIOAlan Somers2021-01-212-11/+45
| | | | | | | | | | | | | | | Read all labels in parallel instead of sequentially. Originally committed as https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8 Obtained from: FreeBSD Sponsored by: Spectra Logic, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #11467
* libzutil: don't read extraneous data in zpool_read_labelAlan Somers2021-01-211-5/+6
| | | | | | | | | | | | | | | | | | zpool_read_label doesn't need the full labels including uberblocks. It only needs the vdev_phys_t. This reduces by half the amount of data read to check for a label, speeding up "zpool import", "zpool labelclear", etc. Originally committed as https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc Obtained from: FreeBSD Sponsored by: Spectra Logic Corp, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #11467
* Extending FreeBSD UIO StructBrian Atkinson2021-01-201-22/+22
| | | | | | | | | | | | | | In FreeBSD the struct uio was just a typedef to uio_t. In order to extend this struct, outside of the definition for the struct uio, the struct uio has been embedded inside of a uio_t struct. Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear this is a ZFS interface. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Closes #11438
* libzfs_sendrecv: Use fnv* to verify nvlist/nvpair*Ryan Moeller2021-01-141-110/+95
| | | | | | | | Use verified variants of nvlist/nvpair functions where applicable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11460
* nvlist leaked in zpool_find_config()Matthew Ahrens2020-12-281-3/+4
| | | | | | | | | | | | | | | | | In `zpool_find_config()`, the `pools` nvlist is leaked. Part of it (a sub-nvlist) is returned in `*configp`, but the callers also leak that. Additionally, in `zdb.c:main()`, the `searchdirs` is leaked. The leaks were detected by ASAN (`configure --enable-asan`). This commit resolves the leaks. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11396
* Linux 5.10 compat: use iov_iter in uio structureBrian Behlendorf2020-12-183-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of the 5.10 kernel the generic splice compatibility code has been removed. All filesystems are now responsible for registering a ->splice_read and ->splice_write callback to support this operation. The good news is the VFS provided generic_file_splice_read() and iter_file_splice_write() callbacks can be used provided the ->iter_read and ->iter_write callback support pipes. However, this is currently not the case and only iovecs and bvecs (not pipes) are ever attached to the uio structure. This commit changes that by allowing full iov_iter structures to be attached to uios. Ever since the 4.9 kernel the iov_iter structure has supported iovecs, kvecs, bvevs, and pipes so it's desirable to pass the entire thing when possible. In conjunction with this the uio helper functions (i.e uiomove(), uiocopy(), etc) have been updated to understand the new UIO_ITER type. Note that using the kernel provided uio_iter interfaces allowed the existing Linux specific uio handling code to be simplified. When there's no longer a need to support kernel's older than 4.9, then it will be possible to remove the iovec and bvec members from the uio structure and always use a uio_iter. Until then we need to maintain all of the existing types for older kernels. Some additional refactoring and cleanup was included in this change: - Added checks to configure to detect available iov_iter interfaces. Some are available all the way back to the 3.10 kernel and are used when available. In particular, uio_prefaultpages() now always uses iov_iter_fault_in_readable() which is available for all supported kernels. - The unused UIO_USERISPACE type has been removed. It is no longer needed now that the uio_seg enum is platform specific. - Moved zfs_uio.c from the zcommon.ko module to the Linux specific platform code for the zfs.ko module. This gets it out of libzfs where it was never needed and keeps this Linux specific code out of the common sources. - Removed unnecessary O_APPEND handling from zfs_iter_write(), this is redundant and O_APPEND is already handled in zfs_write(); Reviewed-by: Colin Ian King <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11351
* FreeBSD libzfs: gcc requires __thread after staticRyan Libby2020-12-141-1/+1
| | | | | | | | | | | | Building libzfs with gcc on FreeBSD failed because gcc is picky about the order of keywords in declarations with __thread, whereas clang is more relaxed. https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ryan Libby <[email protected]> Closes #11331
* zpool: Dryrun fails to list some devicesAttila Fülöp2020-12-041-1/+23
| | | | | | | | | | | | `zpool create -n` fails to list cache and spare vdevs. `zpool add -n` fails to list spare devices. `zpool split -n` fails to list `special` and `dedup` labels. `zpool add -n` and `zpool split -n` shouldn't list hole devices. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #11122 Closes #11167
* libzfsbootenv: do not depend on libnvpairAntonio Russo2020-11-221-1/+1
| | | | | | | | | We do not build libnvpair.pc. Moreover, it is automatically pulled in by libzfs.pc, so no additional specific dependency is required. Reviewed by: Toomas Soome <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #11227
* Include the ABI with dist tarballBrian Behlendorf2020-11-215-1/+16
| | | | | | | The ABI should be included when generating the `make dist` tarball since it's required by the `make checkabi` target. Signed-off-by: Brian Behlendorf <[email protected]> Closes #11225
* Add ABI snapshotBrian Behlendorf2020-11-175-0/+12324
| | | | | | | | | | | | Add a snapshot of the current ABI using libabigail-1.7-2. The included ABI passes `make checkabi` for CentOS 7, Fedora 33, Debian 10, and Ubuntu 20.04. This covers a fairly wide range of glibc, gcc, and libabigail versions plus other changes which are platform specific. Reviewed-by: Antonio Russo <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11144
* Library ABI tracking with abigailAntonio Russo2020-11-1711-1/+57
| | | | | | | | | | | | | | | Provide two make targets: checkabi and storeabi. storeabi uses libabigail to generate a reference copy of the ABI for the public libraries. checkabi compares such a reference to the compiled version, failing if they are not compatible. No ABI is generated for libzpool.so, it is only used by ztest and zdb and not external consumers. Co-authored-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #11144
* zpool: correctly align columns with -pнаб2020-11-161-4/+4
| | | | | | | | | | zpool_expand_proplist() now ignores pl_fixed if its new literal argument is true. The rest is a consequence of needing to pass that down. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiao?=~Dska <[email protected]> Closes #11202
* Distributed Spare (dRAID) FeatureBrian Behlendorf2020-11-134-28/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <[email protected]> Co-authored-by: Mark Maybee <[email protected]> Co-authored-by: Don Brady <[email protected]> Co-authored-by: Matthew Ahrens <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10102
* Fix pointer-is-uint64_t-sized assumption in the ioctl pathAdrian Chadd2020-11-101-2/+2
| | | | | | | | | | | This shows up when compiling freebsd-head on amd64 using gcc-6.4. The lib32 compat build ends up tripping over this assumption. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: adrian chadd <[email protected]> Closes #11068 Closes #11069
* Synchronize library ABI levelsAntonio Russo2020-11-035-17/+6
| | | | | | | | | | | Bump library SOVERSION under Linux to match FreeBSD's. Additionally, this bump properly accounts for the ABI changes relative to ZoL 0.8.5 for the Linux build. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Issue #11144
* Remove UIO_ZEROCOPY functions structuresMatthew Macy2020-10-301-39/+0
| | | | | | | | | | The original xuio zero copy functionality has always been unused on Linux and FreeBSD. Remove this disabled code to avoid any confusion and improve readability. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11124
* Update references to nonexistent man pages in codeRyan Moeller2020-10-301-5/+5
| | | | | | | | Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11132
* Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSDMatthew Macy2020-10-211-1/+0
| | | | | | | | | | The zfs_fsync, zfs_read, and zfs_write function are almost identical between Linux and FreeBSD. With a little refactoring they can be moved to the common code which is what is done by this commit. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11078
* Cross-platform acltypeRyan Moeller2020-10-131-25/+41
| | | | | | | | | | | | | | | | | The acltype property is currently hidden on FreeBSD and does not reflect the NFSv4 style ZFS ACLs used on the platform. This makes it difficult to observe that a pool imported from FreeBSD on Linux has a different type of ACL that is being ignored, and vice versa. Add an nfsv4 acltype and expose the property on FreeBSD. Make the default acltype nfsv4 on FreeBSD. Setting acltype to an unhanded style is treated the same as setting it to off. The ACLs will not be removed, but they will be ignored. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10520
* FreeBSD: make adjustments for the standalone environmentWarner Losh2020-10-131-4/+0
| | | | | | | | | | | In FreeBSD, there are three compile environments that are supported: user land, the kernel and the bootloader / standalone. Adjust the headers to compile in the standalone environment. Limit kernel-only items from view when _STANDALONE is defined. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Warner Losh <[email protected]> Closes #10998
* FreeBSD: Improve libzfs_error_init messagesRyan Moeller2020-10-133-11/+17
| | | | | | | | | | | | | | | | | | It is a common mistake to have failed to autoload the module due to permission issues when running a ZFS command as a user. "Operation not permitted" is an unhelpfully vague error message. Use a thread-local message buffer to format a nicer error message. We can infer that loading the kernel module failed if the module is not loaded. This can be extended with heuristics for other errors in the future. While looking at this stuff, remove an unused thread-local message buffer found in libspl and remove some inaccurate verbiage from the comment on libzfs_load_module. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11033
* Replace ZFS on Linux references with OpenZFSBrian Behlendorf2020-10-082-4/+4
| | | | | | | | | | | | | This change updates the documentation to refer to the project as OpenZFS instead ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11007
* libzfs_sendrecv: zfs_send: remove unused pipefd and tid variablesChristian Schwarz2020-10-081-18/+1
| | | | | | | | | | | fixup of 196bee4 On gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), the code removed caused `-Wmaybe-uninitialized` errors. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #11021
* Fix EIO after resuming receive of new dataset over an existing oneAlan Somers2020-10-021-2/+4
| | | | | | | | | | | | | | | | | | | When resuming an interrupted ZFS send stream that creates a new dataset with the same name as an existing dataset, if the existing dataset is accessed after the failed receive, then after the subsequent successful receive it will return EIO. This happens because nothing mounts the new dataset, leaving the old, no longer valid dataset still mounted. This commit fixes zfs receive to always unmount and remount the destination, regardless of whether the stream is a new stream or a resumed stream. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alan Somers <[email protected]> External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249579 Closes #10995 Closes #10999
* zpool command complains about /etc/exports.dGeorge Wilson2020-09-252-43/+64
| | | | | | | | | | | | | | | If the /etc/exports.d directory does not exist, then we should only create it when we're performing an action which already requires root privileges. This commit moves the directory creation to the enable/disable code path which ensures that we have the appropriate privileges. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #10785 Closes #10934
* FreeBSD: Add support for procfs_listMatthew Macy2020-09-231-0/+1
| | | | | | | | | | The procfs_list interface is required by several kstats. Implement this functionality for FreeBSD to provide access to these kstats. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10890
* libzfs: Don't leak buf if nvlist is too largeAllan Jude2020-09-182-0/+4
| | | | | | | | | | | | | Resolves FreeBSD Coverity defect: CID 1432398: Resource leaks (RESOURCE_LEAK) libzfs: don't leak hdl if there is an error reading env var Resolves FreeBSD Coverity defect: CID 1432395: Resource leaks (RESOURCE_LEAK) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #10882
* pool may become suspended during device expansionGeorge Wilson2020-09-172-33/+129
| | | | | | | | | | | | | | | | | | | | When expanding a device zfs needs to rescan the partition table to get the correct size. This can only happen when we're in the kernel and requires the device to be closed. As part of the rescan, udev is notified and the device links are removed and recreated. This leave a window where the vdev code may try to reopen the device before udev has recreated the link. If that happens, then the pool may end up in a suspended state. To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which allows the partition information to be modified even while it's in use. This ioctl also does not remove the device link associated with the zfs data partition so it eliminates the race condition that can occur in the kernel. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #10897
* libzfsbootenv: lzbe_nvlist_set needs to store bootenv version VB_NVLISTToomas Soome2020-09-171-0/+18
| | | | | | | | A small bug did slip into initial libzfsbootenv; while storing nvlist in nvlist, we should make sure the bootenv is using VB_NVLIST format. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Toomas Soome <[email protected]> Closes #10937
* zfs label bootenv should store data as nvlistToomas Soome2020-09-1510-22/+598
| | | | | | | | | | | | | nvlist does allow us to support different data types and systems. To encapsulate user data to/from nvlist, the libzfsbootenv library is provided. Reviewed-by: Arvind Sankar <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Toomas Soome <[email protected]> Closes #10774
* libzutil depends on libnvpairMatthew Ahrens2020-09-122-2/+7
| | | | | | | | | | | | libzutil depends on libnvpair, but this dependency is undeclared in the build system. Therefore it isn't possible to make a new command that depends on libzutil, but does not (directly) depend on libnvpair. This commit makes this dependency explicit. Reviewed-by: Brian Behlendorf <[email protected]> Reivewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10915
* Display pbkdf2iters property as plain numberFabio Buso2020-09-081-2/+3
| | | | | | | | | The pbkdf2iters property is an iteration counter and should be displayed as plain number rather than in binary unit. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Fabio Buso <[email protected]> Closes #10871
* libshare: Add missing headers for nfs.calaviss2020-09-041-0/+1
| | | | | | | | | | On musl libc, zfs failed to compile due to the missing <fcntl.h> include, which is required for `open()` per POSIX. This commit add the missing <fcntl.h> include. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Hiếu Lê <[email protected]> Closes #10880
* Spruce up pkg-config files for libzfs/libzfs_coreRyan Moeller2020-09-042-4/+4
| | | | | | | | | | | | Several of the listed library dependencies are not relevant on FreeBSD. Have ./configure save libraries that are found via pkg-config as ${LIB}_PC and use the configured automake variables instead of hard coded names so we only get what was actually needed. While here, update the URL to point at the OpenZFS Github repo. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10869
* Fixes for running FreeBSD buildworld on Linux/macOS hostsAlexander Richardson2020-09-031-0/+3
| | | | | | | | | | | Adding an #ifdef __FreeBSD__ to a FreeBSD-specific header may seem odd, but these headers are used on non-FreeBSD systems during the bootstrap tools phase. Originally submitted downstream as https://reviews.freebsd.org/D26193 Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alex Richardson <[email protected]> Closes #10863
* Fix -Werror,-Wmacro-redefined in limits.hAlexander Richardson2020-09-011-0/+5
| | | | | | | | | | | Those macros are also defined by the compiler-provided float.h which will be included later on (at least in the FreeBSD buildworld case) and triggers these -Werror warnings. Including <float.h> first and only defining the macros when DBL_DIG/FLT_DIG is missing fixes this problem. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alex Richardson <[email protected]> Closes #10864
* Add 'zfs rename -u' to rename without remountingRyan Moeller2020-09-013-11/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10839
* libspl: Provide platform-specific zone implementationsRyan Moeller2020-08-314-43/+50
| | | | | | | | | | | | | | | | | FreeBSD has the concept of jails, a precursor to Solaris's zones, which can be mapped to the required zones interface with relative ease. The previous ZFS implementation in FreeBSD did so, and we should continue to provide an appropriate implementation in OpenZFS as well. Move lib/libspl/zone.c into platform code and adopt the correct implementation for FreeBSD. While here, prune unused code. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10851
* Fix definition of BLKGETSIZE64 on FreeBSDAlexander Richardson2020-08-271-5/+1
| | | | | | | | | The matching ioctl is DIOCGMEDIASIZE. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alex Richardson <[email protected]> Closes #10818
* zpool: Change base URL for ZFS messages to openzfs-docsRyan Moeller2020-08-261-1/+2
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10820
* Don't assert on nvlists larger than SPA_MAXBLOCKSIZEAllan Jude2020-08-252-2/+43
| | | | | | | | | | | | | | | | | | | | | | | | | Originally we asserted that all reads are less than SPA_MAXBLOCKSIZE However, nvlists are not ZFS records, and are not limited to SPA_MAXBLOCKSIZE. Add a new environment variable, ZFS_SENDRECV_MAX_NVLIST, to allow the user to specify the maximum size of the nvlist that can be sent or received. Default value: 4 * SPA_MAXBLOCKSIZE (64 MB) Modify libzfs send routines to return a useful error if the send stream will generate an nvlist that is beyond the maximum size. Modify libzfs recv routines to add an explicit error message if the nvlist is too large, rather than abort()ing. Move the change the assert() to only trigger on data records Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #9616
* Avoid symbol collision with in-kernel zstdlibSebastian Gottschall2020-08-241-0/+6
| | | | | | | | | | | For Linux, when zfs is compiled as an in kernel static variant and the in kernel zstd library is compiled statically into the kernel a symbol collision will occur. This wrapper header renames all of the relevant zstd functions to avoid this problem. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Closes #10775
* Appease GCC sprintf warnings found on Fedora 32/GCC 10.0.1Chris McDonough2020-08-242-2/+2
| | | | | | | | | Increase the size of DDT_NAMELEN and MNT_LINE_MAX to appease GCC snprintf truncation warnings. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris McDonough <[email protected]> Closes #10712 Closes #10766
* libzstd: Don't warn about stack frame size in userspaceRyan Moeller2020-08-231-11/+3
| | | | | | | | | | | | | | With the current way CFLAGS are modified in libzstd, CFLAGS passed on the make command line will cause the CFLAGS in the Makefile for zstd.c to be discarded, but not AM_CFLAGS. This causes a smaller frame size limit to be used, and the build fails. We don't need to worry about stack frame sizes in userspace. Drop the extra flags. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10773
* Import vdev ashift optimization from FreeBSDRyan Moeller2020-08-211-35/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1. Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2. The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Matthew Macy <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10619
* Add zstd support to zfsMichael Niewöhner2020-08-204-3/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Co-authored-by: Sebastian Gottschall <[email protected]> Co-authored-by: Kjeld Schouten-Lebbing <[email protected]> Co-authored-by: Michael Niewöhner <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Kjeld Schouten-Lebbing <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #6247 Closes #9024 Closes #10277 Closes #10278
* libzfs_core: Initialize fail_ioc_cmd to ZFS_IOC_LASTRyan Moeller2020-08-181-2/+2
| | | | | | | | | FreeBSD numbers `ZFS_IOC_*` starting at 0, so pick a different sentinel value to avoid unintentionally messing with `ZFS_IOC_POOL_CREATE` ioctls. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10729
* Remove unused `zpool_is_bootable`George Melikov2020-08-181-11/+0
| | | | | | | | | | | | | Otherwise compiler errors with: ``` libzfs_pool.c:449:1: error: 'zpool_is_bootable' defined but not used [-Werror=unused-function] ``` Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #10734