aboutsummaryrefslogtreecommitdiffstats
path: root/lib
Commit message (Collapse)AuthorAgeFilesLines
* Fix handling of DNS names with '-' in them for sharenfsrmacklem2024-09-171-1/+3
| | | | | | | | | | | | | | An old FreeBSD bugzilla report PR#168158 notes that DNS names with '-'s in them cannot be used for the sharenfs property. This patch fixes the parsing of these DNS names. The only negative affect this patch might have is that, if a user has incorrectly separated options with a '-' the sharenfs setting will no longer work once this patch is applied. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rick Macklem <[email protected]> Closes #16529
* Adding Direct IO SupportBrian Atkinson2024-09-144-283/+436
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding O_DIRECT support to ZFS to bypass the ARC for writes/reads. O_DIRECT support in ZFS will always ensure there is coherency between buffered and O_DIRECT IO requests. This ensures that all IO requests, whether buffered or direct, will see the same file contents at all times. Just as in other FS's , O_DIRECT does not imply O_SYNC. While data is written directly to VDEV disks, metadata will not be synced until the associated TXG is synced. For both O_DIRECT read and write request the offset and request sizes, at a minimum, must be PAGE_SIZE aligned. In the event they are not, then EINVAL is returned unless the direct property is set to always (see below). For O_DIRECT writes: The request also must be block aligned (recordsize) or the write request will take the normal (buffered) write path. In the event that request is block aligned and a cached copy of the buffer in the ARC, then it will be discarded from the ARC forcing all further reads to retrieve the data from disk. For O_DIRECT reads: The only alignment restrictions are PAGE_SIZE alignment. In the event that the requested data is in buffered (in the ARC) it will just be copied from the ARC into the user buffer. For both O_DIRECT writes and reads the O_DIRECT flag will be ignored in the event that file contents are mmap'ed. In this case, all requests that are at least PAGE_SIZE aligned will just fall back to the buffered paths. If the request however is not PAGE_SIZE aligned, EINVAL will be returned as always regardless if the file's contents are mmap'ed. Since O_DIRECT writes go through the normal ZIO pipeline, the following operations are supported just as with normal buffered writes: Checksum Compression Encryption Erasure Coding There is one caveat for the data integrity of O_DIRECT writes that is distinct for each of the OS's supported by ZFS. FreeBSD - FreeBSD is able to place user pages under write protection so any data in the user buffers and written directly down to the VDEV disks is guaranteed to not change. There is no concern with data integrity and O_DIRECT writes. Linux - Linux is not able to place anonymous user pages under write protection. Because of this, if the user decides to manipulate the page contents while the write operation is occurring, data integrity can not be guaranteed. However, there is a module parameter `zfs_vdev_direct_write_verify` that controls the if a O_DIRECT writes that can occur to a top-level VDEV before a checksum verify is run before the contents of the I/O buffer are committed to disk. In the event of a checksum verification failure the write will return EIO. The number of O_DIRECT write checksum verification errors can be observed by doing `zpool status -d`, which will list all verification errors that have occurred on a top-level VDEV. Along with `zpool status`, a ZED event will be issues as `dio_verify` when a checksum verification error occurs. ZVOLs and dedup is not currently supported with Direct I/O. A new dataset property `direct` has been added with the following 3 allowable values: disabled - Accepts O_DIRECT flag, but silently ignores it and treats the request as a buffered IO request. standard - Follows the alignment restrictions outlined above for write/read IO requests when the O_DIRECT flag is used. always - Treats every write/read IO request as though it passed O_DIRECT and will do O_DIRECT if the alignment restrictions are met otherwise will redirect through the ARC. This property will not allow a request to fail. There is also a module parameter zfs_dio_enabled that can be used to force all reads and writes through the ARC. By setting this module parameter to 0, it mimics as if the direct dataset property is set to disabled. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Co-authored-by: Mark Maybee <[email protected]> Co-authored-by: Matt Macy <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Closes #10018
* sys/types32.h: Remove struct timeval32 from libspl's header (#16491)Jessica Clarke2024-09-091-5/+0
| | | | | | | | | | | | | | macOS Sequoia's sys/sockio.h, as included by various bootstrap tools whilst building FreeBSD, has started to include net/if.h, which then includes sys/_types/_timeval32.h and provide a conflicting definition for struct timeval32. Since this type is entirely unused within OpenZFS, simply delete the type rather than adding in some kind of OS detection. This fixes building FreeBSD on macOS Sequoia (Beta). Signed-off-by: Jessica Clarke <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* libzstd: also build with LIBZPOOL_CPPFLAGSRob Norris2024-09-091-0/+2
| | | | | | | | | | | | | | | | | | libzstd now also allocates its own abd_t, and so has the same issue as zstream did, so this applies the same workaround: compile it with ZFS_DEBUG. See 92fca1c2d. This looks weird, because libzstd doesn't appear to look related to the ZFS kernel, but there is already a cross-dependency there: zstd needs zfs_lz4_compress, and zfs needs zfs_zstd_compress (and others), so the two can never really be separated without more work. Another job for another time. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16489
* value strings: pretty printers for flags and enumsRob Norris2024-09-053-0/+53
| | | | | | | | | | | This adds zfs_valstr, a collection of pretty printers for bitfields and enums. These are useful in debugging, logging and other display contexts where raw values are difficult for the untrained (or even trained!) eye to decipher. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
* Add DDT prune commandDon Brady2024-09-044-12/+120
| | | | | | | | | | | | | | | | Requires the new 'flat' physical data which has the start time for a class entry. The amount to prune can be based on a target percentage of the unique entries or based on the age (i.e., every entry older than N days). Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #16277
* build: rename FORCEDEBUG_CPPFLAGS to LIBZPOOL_CPPFLAGSRob Norris2024-08-271-1/+1
| | | | | | | | | | | | | | | This is just a very small attempt to make it more obvious that these flags aren't optional for libzpool-using programs, by not making it seem like there's an option to say "well, I don't _want_ to force debugging". Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Issue #16476 Closes #16477
* zpool: Provide GUID to zpool-reguid(8) with -g (#16239)Mateusz Piotrowski2024-08-262-4/+43
| | | | | | | | | | | | | | This commit extends the zpool-reguid(8) command with a -g flag, which allows the user to specify the GUID to set. This change also adds some general tests for zpool-reguid(8). Sponsored-by: Wasabi Technology, Inc. Sponsored-by: Klara, Inc. Signed-off-by: Mateusz Piotrowski <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* libzpool/abd_os: iovec-based scatter abdRob Norris2024-08-212-300/+173
| | | | | | | | | | | | | This is intended to be a simple userspace scatter abd based on struct iovec. It's not very sophisticated as-is, but sets a base for something much more interesting. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
* abd_os: break out platform-specific header partsRob Norris2024-08-214-0/+94
| | | | | | | | | | | | Removing the platform #ifdefs from shared headers in favour of per-platform headers. Makes abd_t much leaner, among other things. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
* abd_os: split userspace and Linux kernel codeRob Norris2024-08-212-1/+493
| | | | | | | | | | | | | | The Linux abd_os.c serves double-duty as the userspace scatter abd implementation, by carrying an emulation of kernel scatterlists. This commit lifts common and userspace-specific parts out into a separate abd_os.c for libzpool. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
* ddt: dedup logRob Norris2024-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a log/journal to dedup. At the end of txg, instead of writing the entry directly to the ZAP, instead its adding to an in-memory tree and appended to an on-disk object. The on-disk object is only read at import, to reload the in-memory tree. Lookups first go the the log tree before going to the ZAP, so recently-used entries will remain close by in memory. This vastly reduces overhead from dedup IO, as it will not have to do so many read/update/write cycles on ZAP leaf nodes. A flushing facility is added at end of txg, to push logged entries out to the ZAP. There's actually two separate "logs" (in-memory tree and on-disk object), one active (recieving updated entries) and one flushing (writing out to disk). These are swapped (ie flushing begins) based on memory used by the in-memory log trees and time since we last flushed something. The flushing facility monitors the amount of entries coming in and being flushed out, and calibrates itself to try to flush enough each txg to keep up with the ingest rate without competing too much with other IO. Multiple tuneables are provided to control the flushing facility. All the histograms and stats are update to accomodate the log as a separate entry store. zdb gains knowledge of how to count them and dump them. Documentation included! Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15895
* ddt: add FDT feature and support for legacy and new on-disk formatsRob Norris2024-08-161-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the supporting infrastructure for the upcoming dedup features. Traditionally, dedup objects live directly in the MOS root. While their details vary (checksum, type and class), they are all the same "kind" of thing - a store of dedup entries. The new features are more varied than that, and are better thought of as a set of related stores for the overall state of a dedup table. This adds a new feature flag, SPA_FEATURE_FAST_DEDUP. Enabling this will cause new DDTs to be created as a ZAP in the MOS root, named DDT-<checksum>. The is used as the root object for the normal type/class store objects, but will also be a place for any storage required by new features. This commit adds two new fields to ddt_t, for version and flags. These are intended to describe the structure and features of the overall dedup table, and are stored as-is in the DDT root. In this commit, flags are always zero, but the intent is that they can be used to hang optional logic or state onto for new dedup features. Version is always 1. For a "legacy" dedup table, where no DDT root directory exists, the version will be 0. ddt_configure() is expected to determine the version and flags features currently in operation based on whether or not the fast_dedup feature is enabled, and from what's available on disk. In this way, its possible to support both old and new tables. This also provides a migration path. A legacy setup can be upgraded to FDT by creating the DDT root ZAP, moving the existing objects into it, and setting version and flags appropriately. There's no support for that here, but it would be straightforward to add later and allows the possibility that newer features could be applied to existing dedup tables. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15892
* Fix incorrect error report on vdev attach/replaceAmeer Hamza2024-08-151-0/+7
| | | | | | | | | Report the correct error message in libzfs when attaching/replacing a vdev with a higher ashift. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #16449
* Fix memory corruption during parallel zpool import with -o cachefile (#16419)Alan Somers2024-08-071-9/+16
| | | | | | | | | | | | | | | When importing multiple pools, the nvlist of properties given with "-o" is shared amongst the several threads. So no thread should modify it. Previously, in the course of validating the cachefile property, the zpool_valid_proplist function would temporarily modify the value, and then change it back. Now it will operate on a clone of the value. Sponsored by: Axcient Fixes #16405 Signed-off-by: Alan Somers <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Alexander Motin <[email protected]>
* JSON: Fix class values for mirrored special vdevsTony Hutter2024-08-062-13/+57
| | | | | | | | | | | | | | | | | | | | | This fixes things so mirrored special vdevs report themselves as "class=special" rather than "class=normal". This happens due to the way the vdev nvlists are constructed: mirrored special devices - The 'mirror' vdev has allocation bias as "special" and it's leaf vdevs are "normal" single or RAID0 special devices - Leaf vdevs have allocation bias as "special". This commit adds in code to check if a leaf's parent is a "special" vdev to see if it should also report "special". Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Umer Saleem <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #16217
* JSON output support for zpool statusUmer Saleem2024-08-066-33/+144
| | | | | | | | | | | | | This commit adds support for zpool status command to displpay status of ZFS pools in JSON format using '-j' option. Status information is collected in nvlist which is later dumped on stdout in JSON format. Existing options for zpool status work with '-j' flag. man page for zpool status is updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
* JSON output support for zpool listUmer Saleem2024-08-062-0/+24
| | | | | | | | | | | | | | | This commit adds support for zpool list command to output the list of ZFS pools in JSON format using '-j' option.. Information about available pools is collected in nvlist which is later printed to stdout in JSON format. Existing options for zfs list command work with '-j' flag. man page for zpool list is updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
* JSON output support for zpool getUmer Saleem2024-08-061-0/+20
| | | | | | | | | | | This commit adds support for zpool get command to output the list of properties for ZFS Pools and VDEVS in JSON format using '-j' option. Man page for zpool get is updated to include '-j' option. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
* JSON output support for zfs version and zfs getUmer Saleem2024-08-061-0/+134
| | | | | | | | | | | | | | | | This commit adds support for JSON output for zfs version and zfs get commands. '-j' flag can be used to get output in JSON format. Information is collected in nvlist objects which is later printed in JSON format. Existing options that work for zfs get and zfs version also work with '-j' flag. man pages for zfs get and zfs version are updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
* ddt: add support for prefetching tables into the ARCAllan Jude2024-07-265-8/+132
| | | | | | | | | | | | | | | | | | | | | | This change adds a new `zpool prefetch -t ddt $pool` command which causes a pool's DDT to be loaded into the ARC. The primary goal is to remove the need to "warm" a pool's cache before deduplication stops slowing write performance. It may also provide a way to reload portions of a DDT if they have been flushed due to inactivity. Sponsored-by: iXsystems, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Will Andrews <[email protected]> Signed-off-by: Fred Weigel <[email protected]> Signed-off-by: Rob Norris <[email protected]> Signed-off-by: Don Brady <[email protected]> Co-authored-by: Will Andrews <[email protected]> Co-authored-by: Don Brady <[email protected]> Closes #15890
* ddt: dedup table quota enforcementAllan Jude2024-07-253-1/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds two new pool properties: - dedup_table_size, the total size of all DDTs on the pool; and - dedup_table_quota, the maximum possible size of all DDTs in the pool When set, quota will be enforced by checking when a new entry is about to be created. If the pool is over its dedup quota, the entry won't be created, and the corresponding write will be converted to a regular non-dedup write. Note that existing entries can be updated (ie their refcounts changed), as that reuses the space rather than requiring more. dedup_table_quota can be set to 'auto', which will set it based on the size of the devices backing the "dedup" allocation device. This makes it possible to limit the DDTs to the size of a dedup vdev only, such that when the device fills, no new blocks are deduplicated. Sponsored-by: iXsystems, Inc. Sponsored-By: Klara Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Signed-off-by: Don Brady <[email protected]> Co-authored-by: Don Brady <[email protected]> Co-authored-by: Rob Wing <[email protected]> Co-authored-by: Sean Eric Fagan <[email protected]> Closes #15889
* Add support for multiple lines to the sharenfs property for FreeBSD (#16338)rmacklem2024-07-231-20/+51
| | | | | | | | | | | | There has been a bugzilla PR#147881 requesting this for a long time (14 years!). It extends the syntax of the ZFS shanenfs property (for FreeBSD only) to allow multiple sets of options for different hosts/nets, separated by ';'s. Signed-off-by: Rick Macklem <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Add some missing vdev properties (#16346)Don Brady2024-07-232-1/+11
| | | | | | | | | Sponsored-by: Klara, Inc. Sponsored-By: Wasabi Technology, Inc. Signed-off-by: Don Brady <[email protected]> Co-authored-by: Don Brady <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Fix printf typo for `zfs receive -cv` (#16295)glibg10b2024-07-171-1/+1
| | | | | | | | | | | Current output: > receiving correctivefull stream of a into b New output: > receiving corrective full stream of a into b Signed-off-by: glibg10b <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* zdb: detect cachefile automatically otherwise force importAmeer Hamza2024-06-032-0/+12
| | | | | | | | | | | | If a pool is created with the cache file located in a non-default path /etc/default/zpool.cache, removed, or the cachefile property is set to none, zdb fails to show the pool unless we specify the cache file or use the -e option. This PR automates this process. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Akash B <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #16071
* icp: remove skein moduleRob Norris2024-05-311-1/+0
| | | | | | | | | | Nothing calls it through the KCF interface, so this is all unused. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16209
* icp: brutally remove unused AES modesRob Norris2024-05-311-3/+0
| | | | | | | | | | Still retaining the struture, for now. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16209
* spl: fix compilation without HAVE_BACKTRACEMartin Matuška2024-05-291-0/+2
| | | | | | | The __maybe_unused macro is defined in spl/sys/debug.h Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Martin Matuska <[email protected]> Closes #16229
* Use memset to zero stack allocations containing unionsRob N2024-05-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C99 6.7.8.17 says that when an undesignated initialiser is used, only the first element of a union is initialised. If the first element is not the largest within the union, how the remaining space is initialised is up to the compiler. GCC extends the initialiser to the entire union, while Clang treats the remainder as padding, and so initialises according to whatever automatic/implicit initialisation rules are currently active. When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN, -ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag sets the policy for automatic/implicit initialisation of variables on the stack. Taken together, this means that when compiling under CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only zero the first element in a union, and the rest will be filled with a pattern. This is significant for aes_ctx_t, which in aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero, but then used as a gcm_ctx_t, which is the fifth element in the union, and thus gets pattern initialisation. Later, it's assumed to be zero, resulting in a hang. As confusing and undiscoverable as it is, by the spec, we are at fault when we initialise a structure containing a union with the zero initializer. As such, this commit replaces these uses with an explicit memset(0). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16135 Closes #16206
* backtrace: rework for signal safetyRob Norris2024-05-143-28/+65
| | | | | | | | | | Mostly, try a lot harder to not allocate anything. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16181
* libspl: lift backtrace into a separate fileRob Norris2024-05-1410-77/+422
| | | | | | | | | | | If it's going to be used directly by zdb/ztest, then it sort of doesn't make sense to carry it with the assert code. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16181
* zdb/ztest: use libspl backtrace for crashesRob Norris2024-05-142-3/+8
| | | | | | | | | | We can show much nicer backtraces these days, lets use them. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16181
* Better control the thread pool size when mounting datasetsAlan Somers2024-05-142-13/+15
| | | | | | | | | | | | | | | | | | | | Ever since a10d50f999, ZFS has mounted file systems in parallel when importing a pool. It uses a fixed size of 512 for the thread pool. But since c183d164aa1, it has also imported pools in parallel. So the total number of threads at one time is 513 * npools + 1. That can easily exceed the system's limit on the number of threads per process, which will cause one or more pools to be unable to allocate any worker threads, forcing them to fallback to slow serial mounting . To forestall that, manage the threadpool size in /sbin/zpool, not libzfs. Use the same size (512), but divided by the number of pools. This is a backwards-incompatible change to the libzfs abi. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #16178
* libzfs: Fix mounting datasets under thread limit pressureAlan Somers2024-05-141-1/+4
| | | | | | | | | | | | | | | During parallel zpool import, /sbin/zpool will create a separate thread pool for each pool, used to mount that pool's datasets. If the total thread count exceed's the system's limit on threads per process, then tpool_dispatch may fail. If it does, directly execute the mount operation instead. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #16178 Fixes #16172
* tpool_dispatch: fail if it cannot start at least 1 worker.Alan Somers2024-05-141-8/+20
| | | | | | | | Sponsored by: Axcient Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #16178
* Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.chenqiuhao19972024-05-103-4/+6
| | | | | | | | | | | | In P2ALIGN, the result would be incorrect when align is unsigned integer and x is larger than max value of the type of align. In that case, -(align) would be a positive integer, which means high bits would be zero and finally stay zero after '&' when align is converted to a larger integer type. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Qiuhao Chen <[email protected]> Closes #15940
* libspl_assert: always link -lpthread on FreeBSDRob N2024-05-091-0/+4
| | | | | | | | | | | The pthread_* functions are in -lpthread on FreeBSD. Some of them are implicitly linked through libc, but on FreeBSD 13 at least pthread_getname_np() is not. Just be explicit, since -lpthread is the documented interface anyway. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16168
* Unbreak FreeBSD cross-build on MacOS broken in 051460b8bMartin Matuška2024-05-091-1/+20
| | | | | | | | | | | | | | MacOS used FreeBSD-compatible getprogname() and pthread_getname_np(). But pthread_getthreadid_np() does not exist on MacOS. This implements libspl_gettid() using pthread_threadid_np() to get the thread id of the current thread. Tested with FreeBSD GitHub actions freebsd-src/.github/workflows/cross-bootstrap-tools.yml Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Martin Matuska <[email protected]> Closes #16167
* libspl/assert: use libunwind for backtrace when availableRob Norris2024-05-012-3/+34
| | | | | | | | | | | libunwind seems to do a better job of resolving a symbols than backtrace(), and is also useful on platforms that don't have backtrace() (eg musl). If it's available, use it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libspl/assert: dump backtrace in assertRob Norris2024-05-012-0/+22
| | | | | | | | | | Adds a check for the backtrace() function. If available, uses it to show a stack backtrace in the assertion output. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libspl/assert: add lock around assertion outputRob Norris2024-05-011-0/+6
| | | | | | | | | | | | | | | | | | | | | If multiple threads trip an assertion at the same moment (quite common), they can be printing at the same time, and their output gets messy. This adds a simple lock around the whole thing, to prevent a second task printing assert output before the first has finished. Additionally, if libspl_assert_ok is not set, abort() is called without dropping the lock, so that any other asserting tasks will be killed before starting any output, rather than only getting part-way through. This is a tradeoff; it's assumed that multiple threads asserting at the same moment are likely the same fault in different instances of a thread, and so there won't be any more useful information from the other tasks anyway. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libspl/assert: show process/task details in assert outputRob Norris2024-05-011-2/+34
| | | | | | | | | | | | | | Makes it much easier to see what thing complained. Getting thread id, program name and thread name vary wildly between Linux and FreeBSD, so those are set up in macros. pthread_getname_np() did not appear in musl until very recently, but the same info has always been available via prctl(PR_GET_NAME), so we use that instead. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libzpool: set thread namesRob Norris2024-05-012-3/+6
| | | | | | | | | | | | | Arrange for the thread/task name to be set when new threads are created. This makes them visible in the process table etc. pthread_setname_np() is generally available in glibc, musl and FreeBSD, so no test is required. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* Overflowing refreservation is badRich Ercolani2024-04-291-1/+14
| | | | | | | | | | | Someone came to me and pointed out that you could pretty readily cause the refreservation calculation to exceed 2**64, given the 2**17 multiplier in it, and produce refreservations wildly less than the actual volsize in cases where it should have failed. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #15996
* GCC: Fixes for gcc 14 on Fedora 40Tony Hutter2024-04-291-4/+10
| | | | | | | | | | | - Workaround dangling pointer in uu_list.c (#16124) - Fix calloc() transposed arguments in zpool_vdev_os.c - Make some temp variables unsigned to prevent triggering a '-Werror=alloc-size-larger-than' error. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #16124 Closes #16125
* return NULL at end of send_progress_threadJason Lee2024-04-101-0/+1
| | | | | | Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jason Lee <[email protected]> Closes #16074
* Add custom debug printing for your assertsRich Ercolani2024-04-101-0/+97
| | | | | | | | | Being able to print custom debug information on assert trip seems useful. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #15792
* Fix locale-specific timeMaxim Filimonov2024-04-081-2/+4
| | | | | | | | | | | | In `zpool status -t`, scrub date/time is reported using the C locale, while trim time is reported using the current one. This is inconsistent. This patch fixes that. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Maxim Filimonov <[email protected]> Closes #15878 Closes #15879
* Add ashift validation when adding devices to a poolGeorge Wilson2024-03-293-17/+72
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, zpool add allows users to add top-level vdevs that have different ashifts but doing so prevents users from being able to perform a top-level vdev removal. Often times consumers may not realize that they have mismatched ashifts until the top-level removal fails. This feature adds ashift validation to the zpool add command and will fail the operation if the sector size of the specified vdev does not match the existing pool. This behavior can be disabled by using the -f flag. In addition, new flags have been added to provide fine-grained control to disable specific checks. These flags are: --allow-in-use --allow-ashift-mismatch --allow-replicaton-mismatch The force flag will disable all of these checks. Reviewed by: Brian Behlendorf <[email protected]> Reviewed by: Alexander Motin <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #15509