aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Implementation of block cloning for ZFSPawel Jakub Dawidek2023-03-1051-120/+3480
| | | | | | | | | | | | | | | Block Cloning allows to manually clone a file (or a subset of its blocks) into another (or the same) file by just creating additional references to the data blocks without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Christian Schwarz <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #13392
* Fix incremental receive silently failing for recursive sends Paul Dagnelie2023-03-105-23/+105
| | | | | | | | | | | | | | The problem occurs because dmu_recv_begin pulls in the payload and next header from the input stream in order to use the contents of the begin record's nvlist. However, the change to do that before the other checks in dmu_recv_begin occur caused a regression where an empty send stream in a recursive send could have its END record consumed by this, which broke the logic of recv_skip. A test is also included to protect against this case in the future. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #12661 Closes #14568
* Workaround for Linux PowerPC GPL-only cpu_has_feature()Low-power2023-03-105-0/+93
| | | | | | | | | | | | | | | | | | | Linux since 4.7 makes interface 'cpu_has_feature' to use jump labels on powerpc if CONFIG_JUMP_LABEL_FEATURE_CHECKS is enabled, in this case however the inline function references GPL-only symbol 'cpu_feature_keys'. ZFS currently uses 'cpu_has_feature' either directly or indirectly from several places; while it is unknown how this issue didn't break ZFS on 64-bit little-endian powerpc, it is known to break ZFS with many Linux versions on both 32-bit and 64-bit big-endian powerpc. Until this issue is fixed in Linux, we have to workaround it by overriding affected inline functions without depending on 'cpu_feature_keys'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14590
* txg_sync should handle write errors in ZILRichard Yao2023-03-101-1/+0
| | | | | | | | | | | | | The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state when ZIL is writing them out. Then it waits until the state changes, but has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the data write failed with an error, ZIL will put it into the DR_NOT_OVERRIDDEN state. It looks like the code will handle that state without an issue, so we can just delete the assertion. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Sponsored-By: Wasabi Technology, Inc. Closes #14283
* Suppress clang static analyzer warning in vdev_stat_update()Richard Yao2023-03-081-0/+6
| | | | | | | | | | | | | | 63652e154643cfe596fe077c13de0e7be34dd863 added unnecessary branches in `vdev_stat_update()` to suppress an ASAN false positive the breaks ztest. This had the downside of causing false positive reports in both Coverity and Clang's static analyzer. vd is never NULL, so we add a preprocessor check to only apply the workaround when compiling with ASAN support. Reported-by: Coverity (CID-1524583) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Refactor loop in dump_histogram()Richard Yao2023-03-081-2/+4
| | | | | | | | | | | | | The current loop triggers a complaint that we are using an array offset prior to a range check from cpp/offset-use-before-range-check when we are actually calculating maximum and minimum values. I was about to file a false positive report with CodeQL, but after looking at how the code is structured, I really cannot blame CodeQL for mistaking this for a range check. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress static analyzer warning in dmu_objset_create_impl_dnstats()Richard Yao2023-03-081-0/+1
| | | | | | | | | | | ae7e7006500ca37c471dd625cd5cbfc590b49885 added an assertion to suppress a complaint from Clang's static analyzer. Unfortunately, it missed another way for Clang to complain about this function. This adds another assertion to handle that. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Linux: Fix octal detection in define_ddi_strtox()Richard Yao2023-03-081-1/+1
| | | | | | | | | | | | | Clang Tidy reported this as a misc-redundant-expression because writing `8` instead of `'8'` meant that the condition could never be true. The only place where we have a chance of this being a bug would be in nvlist_lookup_nvpair_ei_sep(). I am not sure if we ever pass an octal to that, but if we ever do, it should work properly now instead of failing. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Linux: Suppress clang static analyzer warning in zfs_remove()Richard Yao2023-03-081-0/+1
| | | | | | | | | | | | | | | | | Clang's static analyzer points out that if we fail to find an extended attribute directory, but somehow find it when calculating delete_now and delete_now is true, we will have a NULL pointer dereference when we try to unlink the extended attribute directory. I am not sure if this is possible, but if it is, I do not see a sane way of handling this other than rolling back the transaction and retrying. For now, let us do an VERIFY_IMPLY(). If this trips, it will stop the transaction from committing, which will prevent an attribute directory leak. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Linux: Silence static analyzer warning in crypto_create_ctx_template()Richard Yao2023-03-081-1/+1
| | | | | | | | | | | | | | | | | | A CodeChecker report from Clang's CTU analysis indicated that we were assigning uninitialized values in crypto_create_ctx_template() when we call it from zio_crypt_key_init(). This occurs because the ->cm_param and ->cm_param_len fields are uninitialized. Thankfully, the uninitialized values are only used in the skein via KCF_PROV_CREATE_CTX_TEMPLATE() -> skein_create_ctx_template() -> skein_mac_ctx_build() -> skein_get_digest_bitlen(), but that should not be called from here. We fix this to avoid a possible trap should this code change in the future. The FreeBSD version of zio_crypt_key_init() is unaffected. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer warning in bpobj_enqueue()Richard Yao2023-03-081-0/+1
| | | | | | | | | | scan-build does not do cross translation unit analysis to realize that `dmu_buf_hold()` will always set `bpo->bpo_cached_dbuf` to a non-NULL pointer, so we add an assertion to make it realize this. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer warning in dsl_dir_rename_sync()Richard Yao2023-03-081-0/+2
| | | | | | | | | | | Clang's static analyzer reports that if we try to rename a root dataset in `dsl_dir_rename_sync()`, we will have a NULL pointer passed to strlcpy(). This is impossible because `dsl_dir_rename_check()` will prevent us from doing this. We add an assertion to silence this warning. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Cleanup: Remove constant comparisons reported by CodeQLRichard Yao2023-03-084-15/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CodeQL's cpp/constant-comparison query from its security-and-extended query set reported 4 instances where we have comparions that always evaluate the same way. In `draid_config_by_type()`, we have an early `if (nparity == 0)` check that returns `EINVAL`, making a later `if (nparity == 0 || nparity > VDEV_DRAID_MAXPARITY)` partially redundant. The later check prints an error message when parity is 0, but the early check does not. This is not useful feedback, so we move the later check to the place where the early check runs to replace the early check. In `perform_thread_merge()`, we return when `num_threads == 0`. After that block, we do `if (num_threads > 0) {`, which will always be true. We remove the `if` statement. In `sa_modify_attrs()`, we have a loop condition that is `k != 2`, but at the end of the loop, we have `if (k == 0 && hdl->sa_spill)` followed by an else that does a break. The result is that k != 2 will never be evaluated when it is false. We drop the comparison. In `zap_leaf_array_read()`, we have a for loop condition that is `i < ZAP_LEAF_ARRAY_BYTES && len > 0`. However, that loop itself is in a loop that is `while (len > 0)` and while the value of len is decremented inside the loop, when `len == 0`, it will return, such that `len > 0` inside the loop condition will always be true. We drop that part of the condition. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Linux cleanup: zvol_discard() should only call blk_queue_io_stat() onceRichard Yao2023-03-081-2/+1
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer warning in dbuf_dnode_findbp()Richard Yao2023-03-081-0/+1
| | | | | | | | | | | | | | | | Clang's static analyzer reports that if a `blkid == DMU_SPILL_BLKID` is passed, then we can have a NULL pointer dereference when either ->dn_have_spill or `DNODE_FLAG_SPILL_BLKPTR` is not set. This should not happen. We add an `ASSERT()` to suppress reports about NULL pointer dereferences. Originally, I wanted to use one or two IMPLY statements on pre-conditions before the call to `dbuf_findbp()`, but Clang's static analyzer did not understand it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer warning in vdev_split()Richard Yao2023-03-081-0/+4
| | | | | | | | | | | | | | | | | | | Clang's static analyzer pointed out that we can have a NULL pointer dereference if we ever attempt to split a vdev that has only 1 child. If that happens, we are left with zero children, but then try to access a non-existent child. Calling vdev_split() on a vdev with only 1 child should be impossible due to how the code is structured. If this ever happens, it would be best to stop execution immediately even in a production environment to allow for the best possible chance of recovery by an expert, so we use `VERIFY3U()` instead of `ASSERT3U()`. Unfortunately, while that defensive assertion will prevent execution from ever reaching the NULL pointer dereference, Clang's static analyzer does not realize that, so we add an `ASSERT()` to inform it of this. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer warning about SNPRINTF_BLKPTR()Richard Yao2023-03-081-0/+1
| | | | | | | | | | | Clang's static analyzer pointed out that if we can pass a -1 array index to copyname[copies] if there are no valid DVAs. This is an absurd situation, but it suggests that we are missing an assertion, so we add it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer false positive in the AVL tree code.Richard Yao2023-03-081-1/+5
| | | | | | | | | | This has been filed as llvm/llvm-project#60694. Switching from a copy through a C pointer dereference to an explicit memcpy() is a workaround that prevents a false positive. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Suppress Clang Static Analyzer defect report in abd_get_size()Richard Yao2023-03-081-0/+2
| | | | | | | | | | | Clang's static analyzer reports a possible NULL pointer dereference in abd_get_size() when called from vdev_draid_map_alloc_write() called from vdev_draid_map_alloc_row() and vdc->vdc_nparity == 0. This should be impossible, so we add an assertion to silence the defect report. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* Fix TOCTOU race in zpool_do_labelclear()Richard Yao2023-03-081-14/+20
| | | | | | | | | | | | | | | | | | Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not believed to be a real security issue, but fixing it reduces the number of syscalls we do and will prevent other static analyzers from complaining about this. The code is expected to be equivalent. However, under rare circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and EOVERFLOW, we will display the error message that we currently display for the `open()` syscall rather than the one that we currently display for the `stat()` syscall. This is considered to be an improvement. Reported-by: Coverity (CID-1524188) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
* More adaptive ARC evictionAlexander Motin2023-03-0810-792/+485
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally ARC adaptation was limited to MRU/MFU distribution. But for years people with metadata-centric workload demanded mechanisms to also manage data/metadata distribution, that in original ZFS was just a FIFO. As result ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits etc, but it all required manual tuning, was not adaptive and in its heart remained a bad FIFO. This change removes most of existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and meta- data, same as the distribution between data and metadata themselves. Since most of required states separation was already done, it only required to make arcs_size state field specific per data/metadata. The adaptation logic is still based on previous concept of ghost hits, just now it balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes instead of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd and arc_pm, representing ARC balance between metadata and data, MRU and MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed point fractions. Since we care about the math result only when need to evict, this moves all the logic from arc_adapt() to arc_evict(), that reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows to remove ugly ARC_HDR_DO_ADAPT flag from many places. This change also removes number of metadata specific tunables, part of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduced single opaque knob zfs_arc_meta_balance, tuning ARC's reaction on ghost hits, allowing administrator give more or less preference to metadata without setting strict limits. Some of old code parts like arc_evict_meta() are just removed, because since introduction of ABD ARC they really make no sense: only headers referenced by small number of buffers are not evictable, and they are really not evictable no matter what this code do. Instead just call arc_prune_async() if too much metadata appear not evictable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14359
* ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()Attila Fülöp2023-03-081-104/+103
| | | | | | | | | | | | | | | | | gmac_init_ctx() duplicates most of the code in gcm_int_ctx() while it just needs to set its own IV length and AAD tag length. Introduce gcm_init_ctx_impl() which handles the GCM and GMAC differences while reusing the duplicated code. While here, fix a flaw where the AVX implementation would accept a context using a byte swapped key schedule which it could not handle. Also constify the IV and AAD pointers passed to gcm_init{,_avx}(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14529
* Do not hold spa_config in ZIL while blocked on IORichard Yao2023-03-071-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise, we can get a deadlock that looks like this: 1. fsync() grabs spa_config_enter(zilog->zl_spa, SCL_STATE, lwb, RW_READER) as part of zil_lwb_write_issue() . It then blocks on the txg_sync when a flush fails from a drive power cycling. 2. The txg_sync then blocks on the pool suspending due to the loss of too many disks. 3. zpool clear then blocks on spa_config_enter(spa, SCL_STATE | SCL_L2ARC | SCL_ZIO, spa, RW_WRITER) because it is a writer. The disks cannot be brought online due to fsync() holding that lock and the user gets upset since fsync() is uninterruptibly blocked inside the kernel. We need to grab the lock for vdev_lookup_top(), but we do not need to hold it while there is outstanding IO. This fixes a regression introduced by 1ce23dcaff6c3d777cb0d9a4a2cf02b43f777d78. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Sponsored-By: Wasabi Technology, Inc. Closes #14519
* Fix build for Linux/powerpc without CONFIG_ALTIVEC or CONFIG_VSXLow-power2023-03-071-8/+22
| | | | | | | | | This fixes building ZFS for Linux 4.7+ powerpc* architecture, where Linux was configured without CONFIG_ALTIVEC or CONFIG_VSX. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14591
* Better handling for future crypto parametersRob N2023-03-076-3/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent is that this is like ENOTSUP, but specifically for when something can't be done because we have no support for the requested crypto parameters; eg unlocking a dataset or receiving a stream encrypted with a suite we don't support. Its not intended to be recoverable without upgrading ZFS itself. If the request could be made to work by enabling a feature or modifying some other configuration item, then some other code should be used. load-key: In the future we might have more crypto suites (ie new values for the `encryption` property. Right now trying to load a key on such a future crypto suite will look up suite parameters off the end of the crypto table, resulting in misbehaviour and/or crashes (or, with debug enabled, trip the assertion in `zio_crypt_key_unwrap`). Instead, lets check the value we got from the dataset, and if we can't handle it, abort early. recv: When receiving a raw stream encrypted with an unknown crypto suite, `zfs recv` would report a generic `invalid backup stream` (EINVAL). While technically correct, its not super helpful, so lets ship a more specific error code and message. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #14577
* Fix a typo in ac2038aGeorge Amanakis2023-03-071-1/+1
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14585 Closes #14592
* [FreeBSD] fix false assert in cache_vop_rmdir when replaying ZILAndriy Gapon2023-03-071-1/+2
| | | | | | | | | | | | | | The assert is enabled when DEBUG_VFS_LOCKS kernel option is set. The exact panic is: panic: condition seqc_in_modify(_vp->v_seqc) not met It happens because seqc protocol is not followed for ZIL replay. But we actually do not need to make any namecache calls at that stage, because the namecache use is not enabled until after the replay is completed. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #14566
* spl: Add cmn_err_once() to log a message only on the first callAttila Fülöp2023-03-073-0/+84
| | | | | | | Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14567
* initramfs: fix zpool get argument orderq662023-03-061-3/+3
| | | | | | | | | | | | | | | When using the zfs initramfs scripts on my system, I get various errors at initramfs stage, such as: cannot open '-o': name must begin with a letter My zfs binaries are compiled with musl libc, which may be why this happens. In any case, fix the argument order to make the zpool binary happy, and to match its --help output. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Daniel Kolesa <[email protected]> Closes #14572
* Fix detection of IBM Power8 machines (ISA 2.07)Tino Reichardt2023-03-062-6/+6
| | | | | | | | | | | | | | | | An IBM POWER7 system with Power ISA 2.06 tried to execute zfs_sha256_power8() - which should only be run on ISA 2.07 machines. The detection is implemented via the zfs_isa207_available() call, but this check was not used. This pull request will fix this. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Signed-off-by: Low-power <[email protected]> Closes #14576
* [FreeBSD] zfs_znode_alloc: lock the vnode earlierAndriy Gapon2023-03-061-6/+7
| | | | | | | | | | | | | | | | This is needed because of a possible error path where zfs_vnode_forget() is called. That function calls vgone() and vput(), the former requires the vnode to be exclusively locked and the latter expects it to be locked. It should be safe to lock the vnode as early as possible because it is not yet visible, so there is no interaction with other locks. While here, remove a tautological assignment to 'vp'. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #14565
* Optimize the is_l2cacheable functionsGeorge Amanakis2023-03-062-44/+44
| | | | | | | | | | by placing the most common use case (no special vdevs) first and avoid allocating new variables. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14494 Closes #14563
* Fix memory leak in ztestRichard Yao2023-03-061-1/+4
| | | | | | | | | This is tripping LeakSanitizer, which causes zloop test failures on pull requests. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14583
* Add missing increment to dsl_deadlist_move_bpobj()Richard Yao2023-03-061-1/+1
| | | | | | | | | | | | | | | | | | dc5c8006f684b1df3f2d4b6b8c121447d2db0017 was recently merged to prefetch up to 128 deadlists. Unfortunately, a loop was missing an increment, such that it will prefetch all deadlists. The performance properties of that patch probably should be re-evaluated. This was caught by CodeQL's cpp/constant-comparison check in an experimental branch where I am testing the security-and-extended queries. It complained about the `i < 128` part of the loop condition always evaluating to the same thing. The standard CodeQL configuration we use missed this because it does not include that check. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14573
* SHA2Init() should use signed assertions when checking an enumRichard Yao2023-03-061-2/+2
| | | | | | | | | | | | | | | The recent 4c5fec01a48acc184614ab8735e6954961990235 commit caused Coverity to report that ASSERT3U(algotype, >=, SHA256_MECH_INFO_TYPE); is always true. That is because the signed algotype and signed SHA256_MECH_INFO_TYPE values were cast to unsigned types. To fix this, we switch the assertions to use ASSERT3S(), which retains the signedness of the original values for the comparison. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Richard Yao <[email protected]> Reported-by: Coverity (CID-1535300) Closes #14573
* Restore ASMABI and other Unify workJorgen Lundman2023-03-065-37/+56
| | | | | | | | | | | Make sure all SHA2 transform function has wrappers For ASMABI to work, it is required the calling convention is consistent. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Joergen Lundman <[email protected]> Closes #14569
* Use SECTION_STATIC macro for sha2 x86_64 assemblyTino Reichardt2023-03-022-2/+2
| | | | | | | | | | - instead of ".section .rodata" we should use SECTION_STATIC Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Update BLAKE3 for using the new impl handlingTino Reichardt2023-03-028-498/+261
| | | | | | | | | | | This commit changes the BLAKE3 implementation handling and also the calls to it from the ztest command. Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Add generic implementation handling and SHA2 implTino Reichardt2023-03-0230-97/+27987
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The skeleton file module/icp/include/generic_impl.c can be used for iterating over different implementations of algorithms. It is used by SHA256, SHA512 and BLAKE3 currently. The Solaris SHA2 implementation got replaced with a version which is based on public domain code of cppcrypto v0.10. These assembly files are taken from current openssl master: - sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64) - sha512-x86_64.S: x64, AVX, AVX2 (x86_64) - sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm) - sha512-armv7.S: ARMv7, NEON (arm) - sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64) - sha512-armv8.S: ARMv7, ARMv8-CE (aarch64) - sha256-ppc.S: Generic PPC64 LE/BE (ppc64) - sha512-ppc.S: Generic PPC64 LE/BE (ppc64) - sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64) - sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64) Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Add SHA2 SIMD feature tests for libsplTino Reichardt2023-03-022-24/+99
| | | | | | | | | | | | | | | | These are added via HWCAP interface: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 This one via cpuid() call: - zfs_shani_available() for x86_64 Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Add SHA2 SIMD feature tests for LinuxTino Reichardt2023-03-026-13/+169
| | | | | | | | | | | | | | | These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Co-Authored-By: Sebastian Gottschall <[email protected]> Closes #13741
* Add SHA2 SIMD feature tests for FreeBSDTino Reichardt2023-03-027-28/+205
| | | | | | | | | | | | | | | | | These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Changes: - simd_powerpc.h: change license from CDDL to BSD Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Add ARM architecture to OpenZFS buildsystemTino Reichardt2023-03-021-1/+5
| | | | | | | | Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Remove old or redundant SHA2 filesTino Reichardt2023-03-0218-7312/+3
| | | | | | | | | | | | | | | | | We had three sha2.h headers in different places. The FreeBSD version, the Linux version and the generic solaris version. The only assembly used for acceleration was some old x86-64 openssl implementation for sha256 within the icp module. For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS, these files got removed also. Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* zdb: add decryption supportRob N2023-03-027-15/+262
| | | | | | | | | | | | | | The approach is straightforward: for dataset ops, if a key was offered, find the encryption root and the various encryption parameters, derive a wrapping key if necessary, and then unlock the encryption root. After that all the regular dataset ops will return unencrypted data, and that's kinda the whole thing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #11551 Closes #12707 Closes #14503
* System-wide speculative prefetch limit.Alexander Motin2023-03-012-5/+25
| | | | | | | | | | | | | | | With some pathological access patterns it is possible to make ZFS accumulate almost unlimited amount of speculative prefetch ZIOs. Combined with linear ABD allocations in RAIDZ code, it appears to be possible to exhaust system KVA, triggering kernel panic. Address this by introducing a system-wide counter of active prefetch requests and blocking prefetch distance doubling per stream hits if the number of active requests is higher that ~6% of ARC size. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14516
* Accommodate debug-kernel stack frame sizeBrian Behlendorf2023-03-011-2/+2
| | | | | | | | | | | | | | The blk_queue_discard() and blk_queue_sector_erase() functions slightly exceed the allowed 4096 maximum stack frame size when building with the RedHat debug kernel which causes their configure checks to fail. Add an exception for these two tests so the interfaces are correctly detected. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14540
* Fix data race between zil_commit() and zil_suspend()Richard Yao2023-03-012-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | openzfsonwindows/openzfs#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fallback to `txg_wait_synced()` after `zil_suspend()` has begun. On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (#11097), so we can safely disregard that issue for now. Reported-by: Arun KV <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14514
* Remove bad kmem_free() oversight from previous zfsdev_state_list patchRichard Yao2023-03-011-1/+0
| | | | | | | | | | | | I forgot to remove the corresponding kmem_free() from zfs_kmod_fini() in 9a14ce43c3d6a9939804215bbbe66de5115ace42. Clang's static analyzer did not complain, but the Coverity scan that was run after the patch was merged did. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Richard Yao <[email protected]> Reported-by: Coverity (CID-1535275) Closes #14556
* Linux: zfs_fillpage() should handle partial pages from end of fileRichard Yao2023-03-011-1/+5
| | | | | | | | | | | | | | | | | | | | | | | After 89cd2197b94986d315b9b1be707b645baf59af4f was merged, Clang's static analyzer began complaining about a dead assignment in `zfs_fillpage()`. Upon inspection, I noticed that the dead assignment was because we are not using the calculated io_len that we should use to avoid asking the DMU to read past the end of a file. This should result in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`. This issue predates 89cd2197b94986d315b9b1be707b645baf59af4f, but its simplification of zfs_fillpage() eliminated the only use of the assignment to io_len, which made Clang's static analyzer complain about the issue. Also, as a precaution, we add an assertion that io_offset < i_size. If this ever fails, bad things will happen. Otherwise, we are blindly trusting the kernel not to give us invalid offsets. We continue to blindly trust it on non-debug kernels. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14534