aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* Add support for zpool user propertiesAllan Jude2023-04-211-0/+2
| | | | | | | | | | | | | | | | Usage: zpool set org.freebsd:comment="this is my pool" poolname Tests are based on zfs_set's user property tests. Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Mateusz Piotrowski <[email protected]> Sponsored-by: Beckhoff Automation GmbH & Co. KG. Sponsored-by: Klara Inc. Closes #11680
* Linux: Suppress -Wordered-compare-function-pointers in tracepoint codeRichard Yao2023-04-201-0/+4
| | | | | | | | | | | | Clang points out that there is a comparison against -1, but we cannot fix it because that is from the kernel headers, which we must support. We can workaround this by using a pragma. Sponsored-By: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14738
* Create zap for root vdevrob-wing2023-04-203-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And add it to the AVZ, this is not backwards compatible with older pools due to an assertion in spa_sync() that verifies the number of ZAPs of all vdevs matches the number of ZAPs in the AVZ. Granted, the assertion only applies to #DEBUG builds - still, a feature flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2 Notably, this allows to get/set properties on the root vdev: % zpool set user:prop=value <pool> root-0 Before this commit, it was already possible to get/set properties on top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0): % zpool set user:prop=value <pool> mirror-0 This syntax also applies to the root vdev as it is is of type 'root' with a vdev_id of 0, root-0. The keyword 'root' as an alias for 'root-0'. The following tests have been added: - zpool get all properties from root vdev - zpool set a property on root vdev - verify root vdev ZAP is created Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Wing <[email protected]> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14405
* Allow MMP to bypass waiting for other threadsHerb Wartens2023-04-191-0/+2
| | | | | | | | | | | | | | | At our site we have seen cases when multi-modifier protection is enabled (multihost=on) on our pool and the pool gets suspended due to a single disk that is failing and responding very slowly. Our pools have 90 disks in them and we expect disks to fail. The current version of MMP requires that we wait for other writers before moving on. When a disk is responding very slowly, we observed that waiting here was bad enough to cause the pool to suspend. This change allows the MMP thread to bypass waiting for other threads and reduces the chances the pool gets suspended. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Herb Wartens <[email protected]> Closes #14659
* Fix "Detach spare vdev in case if resilvering does not happen"Ameer Hamza2023-04-191-0/+1
| | | | | | | | | | | | | | | Spare vdev should detach from the pool when a disk is reinserted. However, spare detachment depends on the completion of resilvering, and if resilver does not schedule, the spare vdev keeps attached to the pool until the next resilvering. When a zfs pool contains several disks (25+ mirror), resilvering does not always happen when a disk is reinserted. In this patch, spare vdev is manually detached from the pool when resilvering does not occur and it has been tested on both Linux and FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14722
* Silence clang warning of flexible array not at endyouzhongyang2023-04-181-0/+7
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14764
* Linux 6.3 compat: Fix memcpy "detected field-spanning write" erroryouzhongyang2023-04-131-1/+9
| | | | | | | | Add a new union member of flexible array to dnode_phys_t and use it in the macro so we can silence the memcpy() fortify error. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14737
* Linux 6.3 compat: idmapped mount API changesyouzhongyang2023-04-1010-42/+97
| | | | | | | | | Linux kernel 6.3 changed a bunch of APIs to use the dedicated idmap type for mounts (struct mnt_idmap), we need to detect these changes and make zfs work with the new APIs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14682
* module: freebsd: fix aarch64 fpu handlingKyle Evans2023-04-101-2/+12
| | | | | | | | | | | | Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Kyle Evans <[email protected]> Closes #14715
* libzfs: add v2 iterator interfacesRob N2023-04-101-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | f6a0dac84 modified the zfs_iter_* functions to take a new "flags" parameter, and introduced a variety of flags to ask the kernel to limit the results in various ways, reducing the amount of work the caller needed to do to filter out things they didn't need. Unfortunately this change broke the ABI for existing clients (read: older versions of the `zfs` program), and was reverted 399b98198. dc95911d2 reintroduced the original patch, with the understanding that a backwards-compatible fix would be made before the 2.2 release branch was tagged. This commit is that fix. This introduces zfs_iter_*_v2 functions that have the new flags argument, and reverts the existing functions to not have the flags parameter, as they were before. The old functions are now reimplemented in terms of the new, with flags set to 0. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Original-patch-by: George Wilson <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Closes #14597
* Miscellaneous FreBSD compilation bugfixesMartin Matuška2023-04-064-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | Add missing machine/md_var.h to spl/sys/simd_aarch64.h and spl/sys/simd_arm.h In spl/sys/simd_x86.h, PCB_FPUNOSAVE exists only on amd64, use PCB_NPXNOSAVE on i386 In FreeBSD sys/elf_common.h redefines AT_UID and AT_GID on FreeBSD, we need a hack in vnode.h similar to Linux. sys/simd.h needs to be included early. In zfs_freebsd_copy_file_range() we pass a (size_t *)lenp to zfs_clone_range() that expects a (uint64_t *) Allow compiling armv6 world by limiting ARM macros in sha256_impl.c and sha512_impl.c to __ARM_ARCH > 6 Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Signed-off-by: WHR <[email protected]> Signed-off-by: Martin Matuska <[email protected]> Closes #14674
* Fix "Add colored output to zfs list"Tino Reichardt2023-04-051-0/+1
| | | | | | | | | | | Running `zfs list -o avail rpool` resulted in a core dump. This commit will fix this. Run the needed overhead only, when `use_color()` is true. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #14712
* linux 6.3 compat: needs REQ_PREFLUSH | REQ_OP_WRITEyouzhongyang2023-03-311-1/+1
| | | | | | | | | Modify bio_set_flush() so if kernel version is >= 4.10, flags REQ_PREFLUSH and REQ_OP_WRITE are set together. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14695
* Fixes in persistent error logGeorge Amanakis2023-03-283-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Address the following bugs in persistent error log: 1) Check nested clones, eg "fs->snap->clone->snap2->clone2". 2) When deleting files containing error blocks in those clones (from "clone" the example above), do not break the check chain. 3) When deleting files in the originating fs before syncing the errlog to disk, do not break the check chain. This happens because at the time of introducing the error block in the error list, we do not have its birth txg and the head filesystem. If the original file is deleted before the error list is synced to the error log (which is when we actually lookup the birth txg and the head filesystem), then we do not have access to this info anymore and break the check chain. The most prominent change is related to achieving (3). We expand the spa_error_entry_t structure to accommodate the newly introduced zbookmark_err_phys_t structure (containing the birth txg of the error block).Due to compatibility reasons we cannot remove the zbookmark_phys_t structure and we also need to place the new structure after se_avl, so it is not accounted for in avl_find(). Then we modify spa_log_error() to also provide the birth txg of the error block. With these changes in place we simplify the previously introduced function get_head_and_birth_txg() (now named get_head_ds()). We chose not to follow the same approach for the head filesystem (thus completely removing get_head_ds()) to avoid introducing new lock contentions. The stack sizes of nested functions (as measured by checkstack.pl in the linux kernel) are: check_filesystem [zfs]: 272 (was 912) check_clones [zfs]: 64 We also introduced two new tests covering the above changes. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14633
* Fix short-lived txg caused by autotrimKevin Jin2023-03-282-0/+2
| | | | | | | | | | | | | | | | | | | | | Current autotrim causes short-lived txg through: 1. calling txg_wait_synced() in metaslab_enable() 2. calling txg_wait_open() with should_quiesce = true This patch addresses all the issues mentioned above. A new cv, vdev_autotrim_kick_cv is added to kick autotrim activity. It will be signaled once a txg is synced so that it does not change the original autotrim pace. Also because it is a cv, the wait is interruptible which speeds up the vdev_autotrim_stop_wait() call. Finally, combining big zfs_txg_timeout, txg_wait_open() also causes delay when exporting a pool. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: jxdking <[email protected]> Issue #8993 Closes #12194
* linux 6.3 compat: add another bdev_io_acct caseRich Ercolani2023-03-271-2/+8
| | | | | | | | | Linux 6.3+, and backports from it (6.2.8+), changed the signatures on bdev_io_{start,end}_acct. Add a case for it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #14658 Closes #14668
* Drop lying to the compiler in the fletcher4 code Rich Ercolani2023-03-241-20/+4
| | | | | | | | | | | | | | | | | | | This is probably the uncontroversial part of #13631, which fixes a real problem people are having. There's still things to improve in our code after this is merged, but it should stop the breakage that people have reported, where we lie about a type always being aligned and then pass in stack objects with no alignment requirement and hope for the best. Of course, our SIMD code was written with unaligned accesses, so it doesn't care if we drop this...but some auto-vectorized code that gcc emits sure does, since we told it it can assume they're aligned. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #14649
* panic loop when removing slog deviceGeorge Wilson2023-03-241-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a window in the slog removal code where a panic loop could ensue if the system crashes during that operation. The original design of slog removal did not persisted any state because the removal happened synchronously. This was changed by a later commit which persisted the vdev_removing flag and exposed this bug. If a slog removal is in progress and happens to crash after persisting the vdev_removing flag to the label but before the vdev is removed from the spa config, then the pool will continue to panic on import. Here's a sample of the panic: [ 134.387411] VERIFY0(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 22) [ 134.393865] PANIC at dmu.c:1135:dmu_write() [ 134.396035] Kernel panic - not syncing: VERIFY0(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 22) [ 134.397857] CPU: 2 PID: 5914 Comm: txg_sync Kdump: loaded Tainted: P OE 5.4.0-1100-dx2023020205-b3751f8c2-azure #106 [ 134.407938] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 134.407938] Call Trace: [ 134.407938] dump_stack+0x57/0x6d [ 134.407938] panic+0xfb/0x2d7 [ 134.407938] spl_panic+0xcf/0x102 [spl] [ 134.407938] ? traverse_impl+0x1ca/0x420 [zfs] [ 134.407938] ? dmu_object_alloc_impl+0x3b4/0x3c0 [zfs] [ 134.407938] ? dnode_hold+0x1b/0x20 [zfs] [ 134.407938] dmu_write+0xc3/0xd0 [zfs] [ 134.407938] ? space_map_alloc+0x55/0x80 [zfs] [ 134.407938] metaslab_sync+0x61a/0x830 [zfs] [ 134.407938] ? queued_spin_unlock+0x9/0x10 [zfs] [ 134.407938] vdev_sync+0x72/0x190 [zfs] [ 134.407938] spa_sync_iterate_to_convergence+0x160/0x250 [zfs] [ 134.407938] spa_sync+0x2f7/0x670 [zfs] [ 134.407938] txg_sync_thread+0x22d/0x2d0 [zfs] [ 134.407938] ? txg_dispatch_callbacks+0xf0/0xf0 [zfs] [ 134.407938] thread_generic_wrapper+0x83/0xa0 [spl] [ 134.407938] kthread+0x104/0x140 [ 134.407938] ? kasan_check_write.constprop.0+0x10/0x10 [spl] [ 134.407938] ? kthread_park+0x90/0x90 [ 134.457802] ret_from_fork+0x1f/0x40 This change no longer persists the vdev_removing flag when removing slog devices and also cleans up some code that was added which is not used. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #14652
* Add more ANSI colors to libzfsTino Reichardt2023-03-241-0/+6
| | | | | | | | Reviewed-by: WHR <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ethan Coe-Renner <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #14621
* Fix cloning into already dirty dbufs.Pawel Jakub Dawidek2023-03-241-0/+1
| | | | | | | | | | | | Undirty the dbuf and destroy its buffer when cloning into it. Coverity ID: CID-1535375 Reported-by: Richard Yao Reported-by: Benjamin Coddington Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14655
* Remove unused constant EdonR256_BLOCK_BITSIZETino Reichardt2023-03-221-2/+0
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #14650
* spl: cmn_err_once() should be usable in brace-less if else statementsAttila Fülöp2023-03-152-12/+12
| | | | | | | | | | | Commit 11913870 (#14567) added cmn_err_once() by #define'ing a compound statement but failed to consider usage in a single statement brace-less if else. Fix the problem by using the common "do {} while (0)" construct. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14629
* Refactor CONFIG_SPE check on Linux/powerpcWHR2023-03-151-18/+10
| | | | | | | | | | | | Commit 5401472 adds a check to call enable_kernel_spe and disable_kernel_spe only if CONFIG_SPE is defined. Refactor this check in a way similar to what CONFIG_ALTIVEC and CONFIG_VSX are checked, in order to remove redundant kfpu_begin() and kfpu_end() implementations. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14623
* Fix missing semicolons in commit 1f196e3WHR2023-03-151-4/+4
| | | | | | | Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14623
* Remove unused Edon-R variantsTino Reichardt2023-03-141-29/+10
| | | | | | | | This commit removes the edonr_byteorder.h file and all unused variants of Edon-R. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13618
* nvpair: Use flexible array member for nvpair name stringsRichard Yao2023-03-141-2/+2
| | | | | | | | | | | | | | | | Coverity reported possible out-of-bounds reads from doing `((char *)(nvp) + sizeof (nvpair_t))` to get the nvpair name string. These were initially marked as false positives, but since we are now using C99 flexible array members elsewhere, we could use them here too as cleanup to make the code easier to understand. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Reported-by: Coverity (CID-977165) Reported-by: Coverity (CID-1524109) Reported-by: Coverity (CID-1524642) Closes #14612
* nvpair: Constify string functionsRichard Yao2023-03-144-15/+19
| | | | | | | | | | | | | | After addressing coverity complaints involving `nvpair_name()`, the compiler started complaining about dropping const. This lead to a rabbit hole where not only `nvpair_name()` needed to be constified, but also `nvpair_value_string()`, `fnvpair_value_string()` and a few other static functions, plus variable pointers throughout the code. The result became a fairly big change, so it has been split out into its own patch. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14612
* Silence clang static analyzer warnings about stored stack addressesRichard Yao2023-03-141-3/+0
| | | | | | | | | | | | | | Clang's static analyzer complains that nvs_xdr() and nvs_native() functions return pointers to stack memory. That is technically true, but the pointers are stored in stack memory from the caller's stack frame, are not read by the caller and are deallocated when the caller returns, so this is harmless. We set the pointers to NULL to silence the warnings. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14612
* Replace dead opensolaris.org license linksTino Reichardt2023-03-142-2/+2
| | | | | | | | | | | The commit replaces all findings of the link: http://www.opensolaris.org/os/licensing with this one: https://opensource.org/licenses/CDDL-1.0 Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: WHR <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #14625
* zcommon: Refactor FPU state handling in fletcher4Attila Fülöp2023-03-141-1/+2
| | | | | | | | | | | | | | | | | | Currently calls to kfpu_begin() and kfpu_end() are split between the init() and fini() functions of the particular SIMD implementation. This was done in #14247 as an optimization measure for the ABD adapter. Unfortunately the split complicates FPU handling on platforms that use a local FPU state buffer, like Windows and macOS. To ease porting, we introduce a boolean struct member in fletcher_4_ops_t, indicating use of the FPU, and move the FPU state handling from the SIMD implementations to the call sites. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14600
* Implementation of block cloning for ZFSPawel Jakub Dawidek2023-03-1019-25/+235
| | | | | | | | | | | | | | | Block Cloning allows to manually clone a file (or a subset of its blocks) into another (or the same) file by just creating additional references to the data blocks without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Christian Schwarz <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #13392
* Workaround for Linux PowerPC GPL-only cpu_has_feature()Low-power2023-03-102-0/+26
| | | | | | | | | | | | | | | | | | | Linux since 4.7 makes interface 'cpu_has_feature' to use jump labels on powerpc if CONFIG_JUMP_LABEL_FEATURE_CHECKS is enabled, in this case however the inline function references GPL-only symbol 'cpu_feature_keys'. ZFS currently uses 'cpu_has_feature' either directly or indirectly from several places; while it is unknown how this issue didn't break ZFS on 64-bit little-endian powerpc, it is known to break ZFS with many Linux versions on both 32-bit and 64-bit big-endian powerpc. Until this issue is fixed in Linux, we have to workaround it by overriding affected inline functions without depending on 'cpu_feature_keys'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14590
* Suppress Clang Static Analyzer warning about SNPRINTF_BLKPTR()Richard Yao2023-03-081-0/+1
| | | | | | | | | | | Clang's static analyzer pointed out that if we can pass a -1 array index to copyname[copies] if there are no valid DVAs. This is an absurd situation, but it suggests that we are missing an assertion, so we add it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* More adaptive ARC evictionAlexander Motin2023-03-082-14/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally ARC adaptation was limited to MRU/MFU distribution. But for years people with metadata-centric workload demanded mechanisms to also manage data/metadata distribution, that in original ZFS was just a FIFO. As result ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits etc, but it all required manual tuning, was not adaptive and in its heart remained a bad FIFO. This change removes most of existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and meta- data, same as the distribution between data and metadata themselves. Since most of required states separation was already done, it only required to make arcs_size state field specific per data/metadata. The adaptation logic is still based on previous concept of ghost hits, just now it balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes instead of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd and arc_pm, representing ARC balance between metadata and data, MRU and MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed point fractions. Since we care about the math result only when need to evict, this moves all the logic from arc_adapt() to arc_evict(), that reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows to remove ugly ARC_HDR_DO_ADAPT flag from many places. This change also removes number of metadata specific tunables, part of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduced single opaque knob zfs_arc_meta_balance, tuning ARC's reaction on ghost hits, allowing administrator give more or less preference to metadata without setting strict limits. Some of old code parts like arc_evict_meta() are just removed, because since introduction of ABD ARC they really make no sense: only headers referenced by small number of buffers are not evictable, and they are really not evictable no matter what this code do. Instead just call arc_prune_async() if too much metadata appear not evictable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14359
* Fix build for Linux/powerpc without CONFIG_ALTIVEC or CONFIG_VSXLow-power2023-03-071-8/+22
| | | | | | | | | This fixes building ZFS for Linux 4.7+ powerpc* architecture, where Linux was configured without CONFIG_ALTIVEC or CONFIG_VSX. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14591
* Better handling for future crypto parametersRob N2023-03-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent is that this is like ENOTSUP, but specifically for when something can't be done because we have no support for the requested crypto parameters; eg unlocking a dataset or receiving a stream encrypted with a suite we don't support. Its not intended to be recoverable without upgrading ZFS itself. If the request could be made to work by enabling a feature or modifying some other configuration item, then some other code should be used. load-key: In the future we might have more crypto suites (ie new values for the `encryption` property. Right now trying to load a key on such a future crypto suite will look up suite parameters off the end of the crypto table, resulting in misbehaviour and/or crashes (or, with debug enabled, trip the assertion in `zio_crypt_key_unwrap`). Instead, lets check the value we got from the dataset, and if we can't handle it, abort early. recv: When receiving a raw stream encrypted with an unknown crypto suite, `zfs recv` would report a generic `invalid backup stream` (EINVAL). While technically correct, its not super helpful, so lets ship a more specific error code and message. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #14577
* spl: Add cmn_err_once() to log a message only on the first callAttila Fülöp2023-03-072-0/+50
| | | | | | | Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14567
* Update BLAKE3 for using the new impl handlingTino Reichardt2023-03-021-22/+4
| | | | | | | | | | | This commit changes the BLAKE3 implementation handling and also the calls to it from the ztest command. Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Add generic implementation handling and SHA2 implTino Reichardt2023-03-024-3/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The skeleton file module/icp/include/generic_impl.c can be used for iterating over different implementations of algorithms. It is used by SHA256, SHA512 and BLAKE3 currently. The Solaris SHA2 implementation got replaced with a version which is based on public domain code of cppcrypto v0.10. These assembly files are taken from current openssl master: - sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64) - sha512-x86_64.S: x64, AVX, AVX2 (x86_64) - sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm) - sha512-armv7.S: ARMv7, NEON (arm) - sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64) - sha512-armv8.S: ARMv7, ARMv8-CE (aarch64) - sha256-ppc.S: Generic PPC64 LE/BE (ppc64) - sha512-ppc.S: Generic PPC64 LE/BE (ppc64) - sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64) - sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64) Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Add SHA2 SIMD feature tests for LinuxTino Reichardt2023-03-026-13/+169
| | | | | | | | | | | | | | | These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Co-Authored-By: Sebastian Gottschall <[email protected]> Closes #13741
* Add SHA2 SIMD feature tests for FreeBSDTino Reichardt2023-03-027-28/+205
| | | | | | | | | | | | | | | | | These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Changes: - simd_powerpc.h: change license from CDDL to BSD Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Remove old or redundant SHA2 filesTino Reichardt2023-03-024-347/+0
| | | | | | | | | | | | | | | | | We had three sha2.h headers in different places. The FreeBSD version, the Linux version and the generic solaris version. The only assembly used for acceleration was some old x86-64 openssl implementation for sha256 within the icp module. For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS, these files got removed also. Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* System-wide speculative prefetch limit.Alexander Motin2023-03-011-0/+1
| | | | | | | | | | | | | | | With some pathological access patterns it is possible to make ZFS accumulate almost unlimited amount of speculative prefetch ZIOs. Combined with linear ABD allocations in RAIDZ code, it appears to be possible to exhaust system KVA, triggering kernel panic. Address this by introducing a system-wide counter of active prefetch requests and blocking prefetch distance doubling per stream hits if the number of active requests is higher that ~6% of ARC size. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14516
* Fix data race between zil_commit() and zil_suspend()Richard Yao2023-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | openzfsonwindows/openzfs#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fallback to `txg_wait_synced()` after `zil_suspend()` has begun. On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (#11097), so we can safely disregard that issue for now. Reported-by: Arun KV <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14514
* Linux: Assert mutex is held in mutex_exit()Richard Yao2023-02-281-0/+1
| | | | | | | | | | | | | | A spurious mutex_exit() in a development branch caused weird issues until I identified it. An assertion prior to mutex_exit() would have caught it. Rather than adding assertions before invocations of mutex_exit() in the code, let us simply add an assertion to mutex_exit(). It is cheap and will likely improve developer productivity. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Richard Yao <[email protected]> Sponsored-By: Wasabi Technology, Inc. Closes #14541
* Revert zfeature_active() to staticGeorge Amanakis2023-02-281-1/+0
| | | | | | | | | Commit 34ce4c4 made zfeature_active() non-static. This is not required. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14546
* Skip memory allocation when compressing holesRichard Yao2023-02-271-1/+1
| | | | | | | | | | | Hole detection in the zio compression code allows us to opportunistically skip compression on holes. We can go a step further by not doing memory allocations on holes either. Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Richard Yao <[email protected]> Sponsored-by: Wasabi Technology, Inc. Closes #14500
* Use .section .rodata instead of .rodata on FreeBSDDimitry Andric2023-02-241-1/+1
| | | | | | | | | | | | | | | In commit 0a5b942d4 the FreeBSD SECTION_STATIC macro was set to ".rodata". This assembler directive is supported by LLVM (as a convenience alias for ".section .rodata") by not by GNU as. This caused the FreeBSD builds that are done with gcc to fail. Therefore, use ".section .rodata" instead, similar to the other asm_linkage.h headers. Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Dimitry Andric <[email protected]> Closes #14526
* Fix buffered/direct/mmap I/O raceBrian Behlendorf2023-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a page is faulted in for memory mapped I/O the page lock may be dropped before it has been read and marked up to date. If a buffered read encounters such a page in mappedread() it must wait until the page has been updated. Failure to do so will result in a panic on debug builds and incorrect data on production builds. The critical part of this change is in mappedread() where pages which are not up to date are now handled. Additionally, it includes the following simplifications. - zfs_getpage() and zfs_fillpage() could be passed an array of pages. This could be more efficient if it was used but in practice only a single page was ever provided. These interfaces were simplified to acknowledge that. - update_pages() was modified to correctly set the PG_error bit on a page when it cannot be read by dmu_read(). - Setting PG_error and PG_uptodate was moved to zfs_fillpage() from zpl_readpage_common(). This is consistent with the handling in update_pages() and mappedread(). - Minor additional refactoring to comments and variable declarations to improve readability. - Add a test case to exercise concurrent buffered, direct, and mmap IO to the same file. - Reduce the mmap_sync test case default run time. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #13608 Closes #14498
* zdb: zero-pad checksum output follow upBrian Behlendorf2023-02-151-1/+1
| | | | | | | | | | | Apply zero padding for checksums consistently. The SNPRINTF_BLKPTR macro was not updated in commit ac7648179c8 which results in the `cli_root/zdb/zdb_checksum.ksh` test case reliably failing. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Akash B <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14497