aboutsummaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Fix GCC 12 compilation errorsszubersk2022-11-303-2/+20
| | | | | | | | | Squelch false positives reported by GCC 12 with UBSan. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: szubersk <[email protected]> Closes #14150
* zstd: Refactor prefetching for the decoding loopYann Collet2022-11-291-22/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Following facebook/zstd#2545, I noticed that one field in `seq_t` is optional, and only used in combination with prefetching. (This may have contributed to static analyzer failure to detect correct initialization). I then wondered if it would be possible to rewrite the code so that this optional part is handled directly by the prefetching code rather than delegated as an option into `ZSTD_decodeSequence()`. This resulted into this refactoring exercise where the prefetching responsibility is better isolated into its own function and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations. Incidently, due to better code locality, it reduces the need to send information around, leading to simplified interface, and smaller state structures. Port of facebook/zstd@f5434663ea2a79b287ab4cd299179342f64a23a7 Reported-by: Coverity (CID 1462271) Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Ported-by: Richard Yao <[email protected]> Closes #14212
* zstd: [superblock] Add defensive assert and bounds checkNick Terrell2022-11-291-6/+10
| | | | | | | | | | | | | | | | The bound check condition should always be met because we selected `set_basic` as our encoding type. But that code is very far away, so assert it is true so if it is ever false we can catch it, and add a bounds check. Port of facebook/zstd@1047097dadd5f8dfd47e25af0738eeb4bd82f6ec Reported-by: Coverity (CID 1524446) Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Ported-by: Richard Yao <[email protected]> Closes #14212
* Fix NULL pointer dereference in dbuf_prefetch_indirect_done()Richard Yao2022-11-291-2/+2
| | | | | | | | | | | | | | When ZFS is built with assertions, a prefetch is done on a redacted blkptr and `dpa->dpa_dnode` is NULL, we will have a NULL pointer dereference in `dbuf_prefetch_indirect_done()`. Both Coverity and Clang's Static Analyzer caught this. Reported-by: Coverity (CID 1524671) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14210
* Cleanup: Delete dead code from send_merge_thread()Richard Yao2022-11-291-2/+0
| | | | | | | | | | | | | | | | | | range is always deferenced before it reaches this check, such that the kmem_zalloc() call is never executed. A previously version of this had erronously also pruned the `range->eos_marker = B_TRUE` line, but it must be set whenever we encounter an error or are cancelled early. Coverity incorrectly complained about a potential NULL pointer dereference because of this. Reported-by: Coverity (CID 1524550) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14210
* Fix the last two CFI callback prototype mismatchesAlexander2022-11-294-5/+5
| | | | | | | | | | | | | | | | | There was the series from me a year ago which fixed most of the callback vs implementation prototype mismatches. It was based on running the CFI-enabled kernel (in permissive mode -- warning instead of panic) and performing a full ZTS cycle, and then fixing all of the problems caught by CFI. Now, Clang 16-dev has new warning flag, -Wcast-function-type-strict, which detect such mismatches at compile-time. It allows to find the remaining issues missed by the first series. There are only two of them left: one for the secpolicy_vnode_setattr() callback and one for taskq_dispatch(). The fix is easy, since they are not used anywhere else. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Closes #14207
* Lua: Fix bad bitshift in lua_strx2number()Richard Yao2022-11-291-1/+1
| | | | | | | | | | | | | | | | The port of lua to OpenZFS modified lua to use int64_t for numbers instead of double. As part of this, a function for calculating exponentiation was replaced with a bit shift. Unfortunately, it did not handle negative values. Also, it only supported exponents numbers with 7 digits before before overflow. This supports exponents up to 15 digits before overflow. Clang's static analyzer reported this as "Result of operation is garbage or undefined" because the exponent was negative. Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14204
* Remove few pointer dereferences in dbuf_read()Alexander Motin2022-11-292-13/+4
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #14199
* FreeBSD: stop using buffer cache-only routines on syncMateusz Guzik2022-11-292-5/+0
| | | | | | | | | Both vop_fsync and vfs_stdsync are effectively just costly no-ops as they only act on ->v_bufobj.bo_dirty et al, which are unused by zfs. Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #14157
* Switch dnode stats to wmsumsAlexander Motin2022-11-291-0/+127
| | | | | | | | | | I've noticed that some of those counters are used in hot paths like dnode_hold_impl(), and results of this change is visible in profiler. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #14198
* Micro-optimize zrl_remove()Alexander Motin2022-11-291-4/+4
| | | | | | | | | atomic_dec_32() should be a bit lighter than atomic_dec_32_nv(). Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #14200
* zed: unclean disk attachment faults the vdevAmeer Hamza2022-11-291-2/+2
| | | | | | | | | | | | | | | If the attached disk already contains a vdev GUID, it means the disk is not clean. In such a scenario, the physical path would be a match that makes the disk faulted when trying to online it. So, we would only want to proceed if either GUID matches with the last attached disk or the disk is in a clean state. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14181
* Convert some sprintf() calls to kmem_scnprintf()Richard Yao2022-11-285-11/+17
| | | | | | | | | | | | | | | | | | | These `sprintf()` calls are used repeatedly to write to a buffer. There is no protection against overflow other than reviewers explicitly checking to see if the buffers are big enough. However, such issues are easily missed during review and when they are missed, we would rather stop printing rather than have a buffer overflow, so we convert these functions to use `kmem_scnprintf()`. The Linux kernel provides an entire page for module parameters, so we are safe to write up to PAGE_SIZE. Removing `sprintf()` from these functions removes the last instances of `sprintf()` usage in our platform-independent kernel code. This improves XNU kernel compatibility because the XNU kernel does not support (removed support for?) `sprintf()`. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14209
* Avoid a null pointer dereference in zfs_mount() on FreeBSDAllan Jude2022-11-281-1/+2
| | | | | | | | | | | | | | When mounting the root filesystem, vfs_t->mnt_vnodecovered is null This will cause zfsctl_is_node() to dereference a null pointer when mounting, or updating the mount flags, on the root filesystem, both of which happen during the boot process. Reported-by: Martin Matuska <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #14218
* Remove atomics from zh_refcountAlexander Motin2022-11-281-8/+6
| | | | | | | | | | It is protected by z_hold_locks, so we do not need more serialization, simple integer math should be fine. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #14196
* zed: post a udev change event from spa_vdev_attach()Ameer Hamza2022-11-181-1/+1
| | | | | | | | | | | | | | | In order for zed to process the removal event correctly, udev change event needs to be posted to sync the blkid information. spa_create() and spa_config_update() posts the event already through spa_write_cachefile(). Doing the same for spa_vdev_attach() that handles the case for vdev attachment and replacement. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14172
* Fix setting the large_block feature after receiving a snapshotGeorge Amanakis2022-11-181-0/+15
| | | | | | | | | | | | | We are not allowed to dirty a filesystem when done receiving a snapshot. In this case the flag SPA_FEATURE_LARGE_BLOCKS will not be set on that filesystem since the filesystem is not on dp_dirty_datasets, and a subsequent encrypted raw send will fail. Fix this by checking in dsl_dataset_snapshot_sync_impl() if the feature needs to be activated and do so if appropriate. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #13699 Closes #13782
* Handle and detect #13709's unlock regression (#14161)Rich Ercolani2022-11-151-2/+16
| | | | | | | | | | | | | | | | In #13709, as in #11294 before it, it turns out that 63a26454 still had the same failure mode as when it was first landed as d1d47691, and fails to unlock certain datasets that formerly worked. Rather than reverting it again, let's add handling to just throw out the accounting metadata that failed to unlock when that happens, as well as a test with a pre-broken pool image to ensure that we never get bitten by this again. Fixes: #13709 Signed-off-by: Rich Ercolani <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Fix arc_p aggressive increaseshodanshok2022-11-111-2/+3
| | | | | | | | | | | | | | | | | | | The original ARC paper called for an initial 50/50 MRU/MFU split and this is accounted in various places where arc_p = arc_c >> 1, with further adjustment based on ghost lists size/hit. However, in current code both arc_adapt() and arc_get_data_impl() aggressively grow arc_p until arc_c is reached, causing unneeded pressure on MFU and greatly reducing its scan-resistance until ghost list adjustments kick in. This patch restores the original behavior of initially having arc_p as 1/2 of total ARC, without preventing MRU to use up to 100% total ARC when MFU is empty. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gionatan Danti <[email protected]> Closes #14137 Closes #14120
* Cleanup: Suppress Coverity dereference before/after NULL check reportsRichard Yao2022-11-101-1/+1
| | | | | | | | | | | | | | | | f224eddf922a33ca4b86d83148e9e6fa155fc290 began dereferencing a NULL checked pointer in zpl_vap_init(), which made Coverity complain because either the dereference is unsafe or the NULL check is unnecessary. Upon inspection, this pointer is guaranteed to never be NULL because it is from the Linux kernel VFS. The calls into ZFS simply would not make sense if this pointer were NULL, so the NULL check is unnecessary. Reported-by: Coverity (CID 1527260) Reported-by: Coverity (CID 1527262) Reviewed-by: Mariusz Zaborski <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14170
* Fix potential NULL pointer dereference regressionRichard Yao2022-11-101-1/+1
| | | | | | | | | | | | | | | | 945b407486a0072ec7dd117a0bde2f72d52eb445 neglected to `NULL` check `tx->tx_objset`, which is already done in the function. This upset Coverity, which complained about a "dereference after null check". Upon inspection, it was found that whenever `dmu_tx_create_dd()` is called followed by `dmu_tx_assign()`, such as in `dsl_sync_task_common()`, `tx->tx_objset` will be `NULL`. Reported-by: Coverity (CID 1527261) Reviewed-by: Mariusz Zaborski <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14170
* Allow to control failfastMariusz Zaborski2022-11-103-2/+63
| | | | | | | | | | | | | | | | | | | | | Linux defaults to setting "failfast" on BIOs, so that the OS will not retry IOs that fail, and instead report the error to ZFS. In some cases, such as errors reported by the HBA driver, not the device itself, we would wish to retry rather than generating vdev errors in ZFS. This new property allows that. This introduces a per vdev option to disable the failfast option. This also introduces a global module parameter to define the failfast mask value. Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Sponsored-by: Seagate Technology LLC Submitted-by: Klara, Inc. Closes #14056
* quota: disable quota check for ZVOLMariusz Zaborski2022-11-081-4/+18
| | | | | | | | | | | | | | | | | | | The quota for ZVOLs is set to the size of the volume. When the quota reaches the maximum, there isn't an excellent way to check if the new writers are overwriting the data or if they are inserting a new one. Because of that, when we reach the maximum quota, we wait till txg is flushed. This is causing a significant fluctuation in bandwidth. In the case of ZVOL, the quota is enforced by the volsize, so we can omit it. This commit adds a sysctl thats allow to control if the quota mechanism should be enforced or not. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Sponsored-by: Zededa Inc. Sponsored-by: Klara Inc. Closes #13838
* Optionally skip zil_close during zvol_create_minor_implAlan Somers2022-11-083-11/+20
| | | | | | | | | | | | | | | If there were no zil entries to replay, skip zil_close. zil_close waits for a transaction to sync. That can take several seconds, for example during pool import of a resilvering pool. Skipping zil_close can cut the time for "zpool import" from 2 hours to 45 seconds on a resilvering pool with a thousand zvols. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Sponsored-by: Axcient Closes #13999 Closes #14015
* Support idmapped mount in user namespaceyouzhongyang2022-11-0811-56/+146
| | | | | | | | | | | | | | | | | | Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping". Update the OpenZFS accordingly to improve the idmapped mount support. This pull request contains the following changes: - xattr setter functions are fixed to take mnt_ns argument. Without this, cp -p would fail for an idmapped mount in a user namespace. - idmap_util is enhanced/fixed for its use in a user ns context. - One test case added to test idmapped mount in a user ns. Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14097
* dsl_prop_known_index(): check for invalid propDamian Szuberski2022-11-082-1/+28
| | | | | | | | Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: szubersk <[email protected]> Closes #14142 Closes #14147
* freebsd: add ifdefs around legacy ioctl supportBrooks Davis2022-11-072-2/+16
| | | | | | | | | | | | | | | | Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to be built. For now, define it in zfs_ioctl_compat.h so support is always built. This will allow systems that need never support pre-openzfs tools a mechanism to remove support at build time. This code should be removed once the need for tool compatability is gone. No functional change at this time. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127
* freebsd: remove no-op vn_renamepath()Brooks Davis2022-11-071-5/+0
| | | | | | | | | | | | vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD port. Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it entierly. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127
* zed: Prevent special vdev to be replaced by hot spareAmeer Hamza2022-11-041-2/+4
| | | | | | | | | | | | | Special vdevs should not be replaced by a hot spare. Log vdevs already support this, extending the functionality for special vdevs. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14129
* icp: fix all !ENDBR objtool warnings in x86 Asm codeAlexander Lobakin2022-11-0410-50/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware. At least, under certain conditions it includes some header file and uses some custom macro from there. Linux has its own NOENDBR since several releases ago. It's defined in the same <asm/linkage.h>, so currently <sys/asm_linkage.h> already is provided with it. Let's unify those two into one %ENDBR macro. At first, check if it's present already. If so -- use Linux kernel version. Otherwise, try to go that second way and use %_CET_ENDBR from <cet.h> if available. If no, fall back to just empty definition. This fixes a couple more 'relocations to !ENDBR' across the module. And now that we always have the latest/actual ENDBR definition, use it at the entrance of the few corresponding functions that objtool still complains about. This matches the way how it's used in the upstream x86 core Asm code. Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Closes #14035
* icp: fix rodata being marked as text in x86 Asm codeAlexander Lobakin2022-11-043-5/+6
| | | | | | | | | | | | | | | | | | | | objtool properly complains that it can't decode some of the instructions from ICP x86 Asm code. As mentioned in the Makefile, where those object files were excluded from objtool check (but they can still be visible under IBT and LTO), those are just constants, not code. In that case, they must be placed in .rodata, so they won't be marked as "allocatable, executable" (ax) in EFL headers and this effectively prevents objtool from trying to decode this data. That reveals a whole bunch of other issues in ICP Asm code, as previously objtool was bailing out after that warning message. Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Closes #14035
* icp: properly fix all RETs in x86_64 Asm codeAlexander Lobakin2022-11-044-18/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 43569ee37420 ("Fix objtool: missing int3 after ret warning") addressed replacing all `ret`s in x86 asm code to a macro in the Linux kernel in order to enable SLS. That was done by copying the upstream macro definitions and fixed objtool complaints. Since then, several more mitigations were introduced, including Rethunk. It requires to have a jump to one of the thunks in order to work, so the RET macro was changed again. And, as ZFS code didn't use the mainline defition, but copied it, this is currently missing. Objtool reminds about it time to time (Clang 16, CONFIG_RETHUNK=y): fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return found in RETHUNK build fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return found in RETHUNK build Do it the following way: * if we're building under Linux, unconditionally include <linux/linkage.h> in the related files. It is available in x86 sources since even pre-2.6 times, so doesn't need any conftests; * then, if RET macro is available, it will be used directly, so that we will always have the version actual to the kernel we build; * if there's no such macro, we define it as a simple `ret`, as it was on pre-SLS times. This ensures we always have the up-to-date definition with no need to update it manually, and at the same time is safe for the whole variety of kernels ZFS module supports. Then, there's a couple more "naked" rets left in the code, they're just defined as: .byte 0xf3,0xc3 In fact, this is just: rep ret `rep ret` instead of just `ret` seems to mitigate performance issues on some old AMD processors and most likely makes no sense as of today. Anyways, address those rets, so that they will be protected with Rethunk and SLS. Include <sys/asm_linkage.h> here which now always has RET definition and replace those constructs with just RET. This wipes the last couple of places with unpatched rets objtool's been complaining about. Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Closes #14035
* FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()Richard Yao2022-11-041-1/+1
| | | | | | | | | | | | | | | | There is an off by 1 error in the check. Fortunately, this function does not appear to be used in kernel space, despite being compiled as part of the kernel module. However, it is used in userspace. Callers of lzc_ioctl_fd() likely will crash if they attempt to use the unimplemented request number. This was reported by FreeBSD's coverity scan. Reported-by: Coverity (CID 1432059) Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14135
* Expose zfs_vdev_open_timeout_ms as a tunableSerapheim Dimitropoulos2022-11-031-1/+4
| | | | | | | | | | | | | | | | | | | | | | Some of our customers have been occasionally hitting zfs import failures in Linux because udevd doesn't create the by-id symbolic links in time for zpool import to use them. The main issue is that the systemd-udev-settle.service that zfs-import-cache.service and other services depend on is racy. There is also an openzfs issue filed (see https://github.com/openzfs/zfs/issues/10891) outlining the problem and potential solutions. With the proper solutions being significant in terms of complexity and the priority of the issue being low for the time being, this patch exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are experiencing this issue often can increase it as a workaround. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #14133
* Allow mounting snapshots in .zfs/snapshot as a regular userAllan Jude2022-11-032-16/+274
| | | | | | | | | | | | | | | | | Rather than doing a terrible credential swapping hack, we just check that the thing being mounted is a snapshot, and the mountpoint is the zfsctl directory, then we allow it. If the mount attempt is from inside a jail, on an unjailed dataset (mounted from the host, not by the jail), the ability to mount the snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Signed-off-by: Allan Jude <[email protected]> Sponsored-by: Modirum MDPay Sponsored-by: Klara Inc. Closes #13758
* Cleanup: Remove branches that always evaluate the same wayRichard Yao2022-11-032-2/+1
| | | | | | | | | | Coverity reported that the ASSERT in taskq_create() is always true and the `*offp > MAXOFFSET_T` check in zfs_file_seek() is always false. We delete them as cleanup. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14130
* Remove an unused variableBrooks Davis2022-11-031-2/+0
| | | | | | | | | | Clang-16 detects this set-but-unused variable which is assigned and incremented, but never referenced otherwise. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14125
* Address warnings about possible division by zero from clangsaRichard Yao2022-11-031-1/+2
| | | | | | | | | | | | | | | | * The complaint in ztest_replay_write() is only possible if something went horribly wrong. An assertion will silence this and if it goes off, we will know that something is wrong. * The complaint in spa_estimate_metaslabs_to_flush() is not impossible, but seems very unlikely. We resolve this by passing the value from the `MIN()` that does not go to infinity when the variable is zero. There was a third report from Clang's scan-build, but that was a definite false positive and disappeared when checked again through Clang's static analyzer with Z3 refution via CodeChecker. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14124
* Deny receiving into encrypted datasets if the keys are not loadedAttila Fülöp2022-11-031-1/+1
| | | | | | | | | | | | | | | | | | | Commit 68ddc06b611854560fefa377437eb3c9480e084b introduced support for receiving unencrypted datasets as children of encrypted ones but unfortunately got the logic upside down. This resulted in failing to deny receives of incremental sends into encrypted datasets without their keys loaded. If receiving a filesystem, the receive was done into a newly created unencrypted child dataset of the target. In case of volumes the receive made the target volume undeletable since a dataset was created below it, which we obviously can't handle. Incremental streams with embedded blocks are affected as well. We fix the broken logic to properly deny receives in such cases. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #13598 Closes #14055 Closes #14119
* lua: cast through uintptr_t when return a pointerBrooks Davis2022-11-031-1/+1
| | | | | | | | | | Don't assume size_t can carry pointer provenance and use uintptr_t (identialy on all current platforms) instead. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14131
* Use intptr_t when storing an integer in a pointerBrooks Davis2022-11-031-1/+1
| | | | | | | | | | | | Cast the integer type to (u)intptr_t before casting to "void *". In CHERI C/C++ we warn on bare casts from integers to pointers to catch attempts to create pointers our of thin air. We allow the warning to be supressed with a suitable cast through (u)intptr_t. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14131
* zfs_onexit_add_cb: make action_handle point to a uintptr_tBrooks Davis2022-11-031-2/+2
| | | | | | | | | | Avoid assuming than a uint64_t can hold a pointer and reduce the number of casts in the process. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14131
* acl: use uintptr_t for ace walker cookiesBrooks Davis2022-11-033-10/+10
| | | | | | | | | | Avoid assuming that a pointer can fit in a uint64_t and use uintptr_t instead. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14131
* zil: Relax assertion in zil_parseRyan Moeller2022-11-011-5/+11
| | | | | | | | | | Rather than panic debug builds when we fail to parse a whole ZIL, let's instead improve the logging of errors and continue like in a release build. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #14116
* Avoid null pointer dereference in dsl_fs_ss_limit_check()Allan Jude2022-10-291-13/+12
| | | | | | | | | | Check for cr == NULL before dereferencing it in dsl_enforce_ds_ss_limits() to lookup the zone/jail ID. Reported-by: Coverity (CID 1210459) Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #14103
* Introduce kmem_scnprintf()Richard Yao2022-10-296-35/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `snprintf()` is meant to protect against buffer overflows, but operating on the buffer using its return value, possibly by calling it again, can cause a buffer overflow, because it will return how many characters it would have written if it had enough space even when it did not. In a number of places, we repeatedly call snprintf() by successively incrementing a buffer offset and decrementing a buffer length, by its return value. This is a potentially unsafe usage of `snprintf()` whenever the buffer length is reached. CodeQL complained about this. To fix this, we introduce `kmem_scnprintf()`, which will return 0 when the buffer is zero or the number of written characters, minus 1 to exclude the NULL character, when the buffer was too small. In all other cases, it behaves like snprintf(). The name is inspired by the Linux and XNU kernels' `scnprintf()`. The implementation was written before I thought to look at `scnprintf()` and had a good name for it, but it turned out to have identical semantics to the Linux kernel version. That lead to the name, `kmem_scnprintf()`. CodeQL only catches this issue in loops, so repeated use of snprintf() outside of a loop was not caught. As a result, a thorough audit of the codebase was done to examine all instances of `snprintf()` usage for potential problems and a few were caught. Fixes for them are included in this patch. Unfortunately, ZED is one of the places where `snprintf()` is potentially used incorrectly. Since using `kmem_scnprintf()` in it would require changing how it is linked, we modify its usage to make it safe, no matter what buffer length is used. In addition, there was a bug in the use of the return value where the NULL format character was not being written by pwrite(). That has been fixed. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14098
* Fix too few arguments to formatting functionRichard Yao2022-10-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | CodeQL reported that when the VERIFY3U condition is false, we do not pass enough arguments to `spl_panic()`. This is because the format string from `snprintf()` was concatenated into the format string for `spl_panic()`, which causes us to have an unexpected format specifier. A CodeQL developer suggested fixing the macro to have a `%s` format string that takes a stringified RIGHT argument, which would fix this. However, upon inspection, the VERIFY3U check was never necessary in the first place, so we remove it in favor of just calling `snprintf()`. Lastly, it is interesting that every other static analyzer run on the codebase did not catch this, including some that made an effort to catch such things. Presumably, all of them relied on header annotations, which we have not yet done on `spl_panic()`. CodeQL apparently is able to track the flow of arguments on their way to annotated functions, which llowed it to catch this when others did not. A future patch that I have in development should annotate `spl_panic()`, so the others will catch this too. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14098
* Revert "Cleanup: Delete dead code from send_merge_thread()"Brian Behlendorf2022-10-281-2/+3
| | | | | | | | | This reverts commit fb823de9f due to a regression. It is in fact possible for the range->eos_marker to be false on error. Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #14042 Closes #14104
* quota: extend quota for datasetMariusz Zaborski2022-10-281-1/+11
| | | | | | | | | | | | | | This patch relax the quota limitation for dataset by around 3%. What this means is that user can write more data then the quota is set to. However thanks to that we can get more stable bandwidth, in case when we are overwriting data in-place, and not consuming any additional space. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Sponsored-by: Zededa Inc. Sponsored-by: Klara Inc. Closes #13839
* Fix ARC target collapse when zfs_arc_meta_limit_percent=100shodanshok2022-10-281-1/+1
| | | | | | | | | | | | | | | | | Reclaim metadata when arc_available_memory < 0 even if meta_used is not bigger than arc_meta_limit. As described in https://github.com/openzfs/zfs/issues/14054 if zfs_arc_meta_limit_percent=100 then ARC target can collapse to arc_min due to arc_purge not freeing any metadata. This patch lets arc_prune to do its work when arc_available_memory is negative even if meta_used is not bigger than arc_meta_limit, avoiding ARC target collapse. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gionatan Danti <[email protected]> Closes #14054 Closes #14093