openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nvpair: Constify string functions	Richard Yao	2023-03-14	15	-84/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	After addressing coverity complaints involving `nvpair_name()`, the compiler started complaining about dropping const. This lead to a rabbit hole where not only `nvpair_name()` needed to be constified, but also `nvpair_value_string()`, `fnvpair_value_string()` and a few other static functions, plus variable pointers throughout the code. The result became a fairly big change, so it has been split out into its own patch. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14612
*	Implementation of block cloning for ZFS	Pawel Jakub Dawidek	2023-03-10	2	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Block Cloning allows to manually clone a file (or a subset of its blocks) into another (or the same) file by just creating additional references to the data blocks without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Christian Schwarz <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #13392
*	Refactor loop in dump_histogram()	Richard Yao	2023-03-08	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	The current loop triggers a complaint that we are using an array offset prior to a range check from cpp/offset-use-before-range-check when we are actually calculating maximum and minimum values. I was about to file a false positive report with CodeQL, but after looking at how the code is structured, I really cannot blame CodeQL for mistaking this for a range check. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
*	Cleanup: Remove constant comparisons reported by CodeQL	Richard Yao	2023-03-08	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CodeQL's cpp/constant-comparison query from its security-and-extended query set reported 4 instances where we have comparions that always evaluate the same way. In `draid_config_by_type()`, we have an early `if (nparity == 0)` check that returns `EINVAL`, making a later `if (nparity == 0 \|\| nparity > VDEV_DRAID_MAXPARITY)` partially redundant. The later check prints an error message when parity is 0, but the early check does not. This is not useful feedback, so we move the later check to the place where the early check runs to replace the early check. In `perform_thread_merge()`, we return when `num_threads == 0`. After that block, we do `if (num_threads > 0) {`, which will always be true. We remove the `if` statement. In `sa_modify_attrs()`, we have a loop condition that is `k != 2`, but at the end of the loop, we have `if (k == 0 && hdl->sa_spill)` followed by an else that does a break. The result is that k != 2 will never be evaluated when it is false. We drop the comparison. In `zap_leaf_array_read()`, we have a for loop condition that is `i < ZAP_LEAF_ARRAY_BYTES && len > 0`. However, that loop itself is in a loop that is `while (len > 0)` and while the value of len is decremented inside the loop, when `len == 0`, it will return, such that `len > 0` inside the loop condition will always be true. We drop that part of the condition. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
*	Fix TOCTOU race in zpool_do_labelclear()	Richard Yao	2023-03-08	1	-14/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not believed to be a real security issue, but fixing it reduces the number of syscalls we do and will prevent other static analyzers from complaining about this. The code is expected to be equivalent. However, under rare circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and EOVERFLOW, we will display the error message that we currently display for the `open()` syscall rather than the one that we currently display for the `stat()` syscall. This is considered to be an improvement. Reported-by: Coverity (CID-1524188) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14575
*	More adaptive ARC eviction	Alexander Motin	2023-03-08	2	-33/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Traditionally ARC adaptation was limited to MRU/MFU distribution. But for years people with metadata-centric workload demanded mechanisms to also manage data/metadata distribution, that in original ZFS was just a FIFO. As result ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits etc, but it all required manual tuning, was not adaptive and in its heart remained a bad FIFO. This change removes most of existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and meta- data, same as the distribution between data and metadata themselves. Since most of required states separation was already done, it only required to make arcs_size state field specific per data/metadata. The adaptation logic is still based on previous concept of ghost hits, just now it balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes instead of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd and arc_pm, representing ARC balance between metadata and data, MRU and MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed point fractions. Since we care about the math result only when need to evict, this moves all the logic from arc_adapt() to arc_evict(), that reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows to remove ugly ARC_HDR_DO_ADAPT flag from many places. This change also removes number of metadata specific tunables, part of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduced single opaque knob zfs_arc_meta_balance, tuning ARC's reaction on ghost hits, allowing administrator give more or less preference to metadata without setting strict limits. Some of old code parts like arc_evict_meta() are just removed, because since introduction of ABD ARC they really make no sense: only headers referenced by small number of buffers are not evictable, and they are really not evictable no matter what this code do. Instead just call arc_prune_async() if too much metadata appear not evictable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14359
*	Better handling for future crypto parameters	Rob N	2023-03-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intent is that this is like ENOTSUP, but specifically for when something can't be done because we have no support for the requested crypto parameters; eg unlocking a dataset or receiving a stream encrypted with a suite we don't support. Its not intended to be recoverable without upgrading ZFS itself. If the request could be made to work by enabling a feature or modifying some other configuration item, then some other code should be used. load-key: In the future we might have more crypto suites (ie new values for the `encryption` property. Right now trying to load a key on such a future crypto suite will look up suite parameters off the end of the crypto table, resulting in misbehaviour and/or crashes (or, with debug enabled, trip the assertion in `zio_crypt_key_unwrap`). Instead, lets check the value we got from the dataset, and if we can't handle it, abort early. recv: When receiving a raw stream encrypted with an unknown crypto suite, `zfs recv` would report a generic `invalid backup stream` (EINVAL). While technically correct, its not super helpful, so lets ship a more specific error code and message. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #14577
*	Fix a typo in ac2038a	George Amanakis	2023-03-07	1	-1/+1
\| \| \| \| \| \| \|	Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14585 Closes #14592
*	Fix memory leak in ztest	Richard Yao	2023-03-06	1	-1/+4
\| \| \| \| \| \| \| \| \|	This is tripping LeakSanitizer, which causes zloop test failures on pull requests. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14583
*	Update BLAKE3 for using the new impl handling	Tino Reichardt	2023-03-02	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \|	This commit changes the BLAKE3 implementation handling and also the calls to it from the ztest command. Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
*	zdb: add decryption support	Rob N	2023-03-02	2	-11/+166
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The approach is straightforward: for dataset ops, if a key was offered, find the encryption root and the various encryption parameters, derive a wrapping key if necessary, and then unlock the encryption root. After that all the regular dataset ops will return unencrypted data, and that's kinda the whole thing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #11551 Closes #12707 Closes #14503
*	Improve error message of zfs redact	Paul Dagnelie	2023-02-21	1	-3/+18
\| \| \| \| \| \| \| \| \| \| \|	We improve the error message of zfs redact by checking if the target snapshot exists, and if all the redaction snapshots exist. As a future improvement we could iterate over every snapshot provided and use that to determine which one specifically doesn't exist. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #11426 Closes #14496
*	zdb: zero-pad checksum output	Rob N ★	2023-02-07	1	-2/+5
\| \| \| \| \| \| \| \|	The leading zeroes are part of the checksum so we should show them. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #14464
*	Teach zdb about DMU_OT_ERROR_LOG objects	George Amanakis	2023-02-02	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	With the persistent error log feature we need to account for spa_errlog_{scrub, last} containing mappings to other error log objects, which need to be marked as in-use as well. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14442 Closes #14434
*	zfs_main.c: fix unused variable error with GCC	rob-wing	2023-02-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	zfs_setproctitle_init() is stubbed out on FreeBSD. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Rob Wing <[email protected]> Closes #14441
*	Fix console progress reporting for recursive send	Ameer Hamza	2023-02-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	After commit 19d3961, progress reporting (-v) with replication flag enabled does not report the progress on the console. This commit fixes the issue by updating the logic to check for pa->progress instead of pa_verbosity in send_progress_thread(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14448
*	Improve resilver ETAs	Brian Behlendorf	2023-01-25	1	-11/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When resilvering the estimated time remaining is calculated using the average issue rate over the current pass. Where the current pass starts when a scan was started, or restarted, if the pool was exported/imported. For dRAID pools in particular this can result in wildly optimistic estimates since the issue rate will be very high while scanning when non-degraded regions of the pool are scanned. Once repair I/O starts being issued performance drops to a realistic number but the estimated performance is still significantly skewed. To address this we redefine a pass such that it starts after a scanning phase completes so the issue rate is more reflective of recent performance. Additionally, the zfs_scan_report_txgs module option can be set to reset the pass statistics more often. Reviewed-by: Akash B <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14410
*	Reject streams that set ->drr_payloadlen to unreasonably large values	Richard Yao	2023-01-23	3	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the zstream code, Coverity reported: "The argument could be controlled by an attacker, who could invoke the function with arbitrary values (for example, a very high or negative buffer size)." It did not report this in the kernel. This is likely because the userspace code stored this in an int before passing it into the allocator, while the kernel code stored it in a uint32_t. However, this did reveal a potentially real problem. On 32-bit systems and systems with only 4GB of physical memory or less in general, it is possible to pass a large enough value that the system will hang. Even worse, on Linux systems, the kernel memory allocator is not able to support allocations up to the maximum 4GB allocation size that this allows. This had already been limited in userspace to 64MB by `ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to protect systems. After some discussion, we settle on 256MB as a hard upper limit. Attempting to receive a stream that requires more memory than that will result in E2BIG being returned to user space. Reported-by: Coverity (CID-1529836) Reported-by: Coverity (CID-1529837) Reported-by: Coverity (CID-1529838) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14285
*	Configure zed's diagnosis engine with vdev properties	rob-wing	2023-01-23	1	-4/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce four new vdev properties: checksum_n checksum_t io_n io_t These properties can be used for configuring the thresholds of zed's diagnosis engine and are interpeted as <N> events in T <seconds>. When this property is set to a non-default value on a top-level vdev, those thresholds will also apply to its leaf vdevs. This behavior can be overridden by explicitly setting the property on the leaf vdev. Note that, these properties do not persist across vdev replacement. For this reason, it is advisable to set the property on the top-level vdev instead of the leaf vdev. The default values for zed's diagnosis engine (10 events, 600 seconds) remains unchanged. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Rob Wing <[email protected]> Sponsored-by: Seagate Technology LLC Closes #13805
*	Use setproctitle to report progress of zfs send	Ameer Hamza	2023-01-17	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows parsing of zfs send progress by checking the process title. Doing so requires some changes to the send code in libzfs_sendrecv.c; primarily these changes move some of the accounting around, to allow for the code to be verbose as normal, or set the process title. Unlike BSD, setproctitle() isn't standard in Linux; thus, borrowed it from libbsd with slight modifications. Authored-by: Sean Eric Fagan <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Co-authored-by: Ameer Hamza <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14376
*	zpool-set: print error message when pool or vdev is not valid	Rob Wing	2023-01-17	1	-19/+17
\| \| \| \| \| \| \| \| \| \|	Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Wing <[email protected]> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14310
*	zpool-set: update usage text	Rob Wing	2023-01-17	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Wing <[email protected]> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14310
*	zpool: do guid-based comparison in is_vdev_cb()	rob-wing	2023-01-11	1	-11/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	is_vdev_cb() uses string comparison to find a matching vdev and will fallback to comparing the guid via a string. These changes drop the string comparison and compare the guids instead. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Rob Wing <[email protected]> Co-authored-by: Rob Wing <[email protected]> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14311
*	ztest: update ztest_dmu_snapshot_create_destroy()	Brian Behlendorf	2023-01-10	1	-4/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ECHRNG is returned when the channel program encounters a runtime error. For example, this can happen when a snapshot doesn't exist. We handle this error the same way as the existing EEXIST and ENOENT error checks. Additionally, improve the internal debug message to include the error describing why a pool couldn't be opened. Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14351
*	ztest: ztest_dsl_prop_set_uint64() ENOSPC consistency	Brian Behlendorf	2023-01-10	1	-7/+11
\| \| \| \| \| \| \| \| \|	It is possible for ztest_dsl_prop_set_uint64() to fail with ENOSPC and this needs to be handled consistently. Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14351
*	ztest: reduce `zpool split` frequency	Brian Behlendorf	2023-01-10	1	-1/+1
\| \| \| \| \| \| \| \| \|	There's no need to so aggressively test splitting a pool. Reduce the occurence of this test to once every 10 seconds. Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14351
*	ztest: update expectation for sparing a special device	Brian Behlendorf	2023-01-10	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \|	Commit c23738c70eb86a7f04f93292caef2ed977047608 modified the expected behavior of attach to prevent hot spares from being used as special vdev replacements. We update ztest's expectations accordingly to prevent it from failing when testing the updated behavior. Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14351
*	zed: add hotplug support for spare vdevs	Ameer Hamza	2023-01-09	3	-13/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit supports for spare vdev hotplug. The spare vdev associated with all the pools will be marked as "Removed" when the drive is physically detached and will become "Available" when the drive is reattached. Currently, the spare vdev status does not change on the drive removal and the same is the case with reattachment. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14295
*	Update arc_summary and arcstat outputs	Alexander Motin	2023-01-05	2	-76/+216
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent ARC commits added new statistic counters, such as iohits, uncached state, etc. Represent those. Also some of previously reported numbers were confusing or even made no sense. Cleanup and restructure existing reports. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Issue #14115 Issue #14123 Issue #14243 Closes #14320
*	deadlock between spa_errlog_lock and dp_config_rwlock	Matthew Ahrens	2022-12-22	2	-27/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a lock order inversion deadlock between `spa_errlog_lock` and `dp_config_rwlock`: A thread in `spa_delete_dataset_errlog()` is running from a sync task. It is holding the `dp_config_rwlock` for writer (see `dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`. A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock` (see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as reader). Note that this was introduced by #12812. This commit address this by defining the lock ordering to be dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock. spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this order, and then process_error_block() and get_head_and_birth_txg() can verify that the dp_config_rwlock is already held. Additionally, a buffer overrun in `spa_get_errlog()` is corrected. Many code paths didn't check if `*count` got to zero, instead continuing to overwrite past the beginning of the userspace buffer at `uaddr`. Tested by having some errors in the pool (via `zinject -t data /path/to/file`), one thread running `zpool iostat 0.001`, and another thread runs `zfs destroy` (in a loop, although it hits the first time). This reproduces the problem easily without the fix, and works with the fix. Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: George Amanakis <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #14239 Closes #14289
*	Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..."	Brian Behlendorf	2022-12-21	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit d30db519af44b905fc52b8c8ba34f6378aa03470. With this change applied zloop.sh fails reliably with the following ASSERT. zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg( zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0) ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block() Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14306
*	zfs list: Allow more fields in ZFS_ITER_SIMPLE mode	Allan Jude	2022-12-13	4	-46/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the fields to be listed and sorted by are constrained to those populated by dsl_dataset_fast_stat(), then zfs list is much faster, as it does not need to open each objset and reads its properties. A previous optimization by Pawel Dawidek (0cee24064a79f9c01fc4521543c37acea538405f) took advantage of this to make listing snapshot names sorted only by name much faster. However, it was limited to `-o name -s name`, this work extends this optimization to work with: - name - guid - createtxg - numclones - inconsistent - redacted - origin and could be further extended to any other properties supported by dsl_dataset_fast_stat() or similar, that do not require extra locking or reading from disk. This was committed before (9a9e2e343dfa2af28bf7910de77ae73aa006de62), but was reverted due to a regression when used with an older kernel. If the kernel does not populate zc->zc_objset_stats, we now fallback to getting the properties via the slower interface, to avoid problems with newer userland and older kernels. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #14110
*	Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names	Marcel Menzel	2022-12-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Outgoing mails for ZFS pool events include the pool GUID, but not the actual pool name. Let's change this for better readability, as it is already done in the mails for finished pool resilvers. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by Richard Yao <[email protected]> Signed-off-by: Marcel Menzel <[email protected]> Closes #14272
*	Address theoretical uninitialized variable usage in zstream	Richard Yao	2022-12-12	3	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Coverity has long complained about the checksum being uninitialized if an END record is processed before its BEGIN record. This should not happen, but there was no code to check for it. I had left this unfixed since it was a low priority issue, but then 9f4ede63d23be4f43ba8dd0ca42c6a773a8eaa8d added another instance of this. I am making an effort to "hold the line" to keep new coverity defect reports from going unaddressed, so I find myself forced to fix this much earlier than I had originally planned to address it. The solution is to maintain a counter and a flag. Then use VERIFY statements to verify the following runtime constraints: * Every record either has a corresponding BEGIN record, is a BEGIN record or is the end of stream END record for replication streams. * BEGIN records cannot be nested. i.e. There must be an END record before another BEGIN record may be seen. Failure to meet these constraints will cause the program to exit. This is sufficient to ensure that the checksum is never accessed when uninitialized. Reported-by: Coverity (CID 1524578) Reported-by: Coverity (CID 1524633) Reported-by: Coverity (CID 1527295) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Damian Szuberski <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14176
*	zdb: Handle theoretical buffer overflow when printing float	Richard Yao	2022-12-08	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CodeQL pointed out that for extreme floating point values, `sprintf()` will overwrite a 32 character buffer. It cited 1e304 as an example, which causes `sprintf()` to print 308 characters. In practice, the numbers should never exceed 100, so this should not happen. To silence the warning and also handle unexpected situations, we change the code to use `snprintf()`. This was missed during my audit of our use of `sprintf()`, since I did not think to consider extreme floating point representations. It also really should not happen, so this change is purely defensive programming. This was found by CodeQL's cpp/overrunning-write-with-float check. Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14264
*	zdb: zdb_ddt_leak_init() reads uninitialized memory when birth == 0	Richard Yao	2022-12-08	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was written by Jeff Bonick and was committed to OpenSolaris on November 1, 2009. It appears that Jeff meant to continue the outer loop iteration when `ddp->ddp_phys_birth == 0`, but put his check inside the inner loop. This causes a pointer to uninitialized memory to be passed to ddt_lookup() inside a VERIFY() statement whenever that condition is true. Reported-by: Coverity (CID 1524462) Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14264
*	ztest: comparisons against errno should not assign to it	Richard Yao	2022-12-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	888914486e26d81c8dbfd7a9d8091c05f9fc98ba introduced this regression. I used cscope to verify that there are no other instances of this in the codebase. This is the one of the few bugs that are extremely easy to identify using cscope. Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14264
*	Fix potential buffer overflow in zpool command	Richard Yao	2022-12-08	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ZPOOL_SCRIPTS_PATH environment variable can be passed here. This allows for arbitrarily long strings to be passed to sprintf(), which can overflow the buffer. I missed this in my earlier audit of the codebase. CodeQL's cpp/unbounded-write check caught this. Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14264
*	zdb: Fix big parameter passed by value	Richard Yao	2022-12-08	1	-19/+19
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is not in performance critical code, but static analyzers will complain about it, so lets switch to pass by pointer here. Reported-by: Coverity (CID-1524384) Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14263
*	Cleanup: zhack should not declare function prototypes in main()	Richard Yao	2022-12-08	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, it should include the proper header. CodeQL caught this. Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14263
*	Fix GCC 12 compilation errors	szubersk	2022-11-30	2	-2/+2
\| \| \| \| \| \| \| \| \|	Squelch false positives reported by GCC 12 with UBSan. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: szubersk <[email protected]> Closes #14150
*	zdb: Silence Coverity complaint about verify_livelist_allocs()	Richard Yao	2022-11-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	svb is declared on the stack. We then set parts of svb.svb_dva with DVA_SET_VDEV(), DVA_SET_OFFSET() and DVA_SET_ASIZE(). However, the DVA contains other fields for pad, GRID and G. When setting the fields we use, we technically read uninitialized bits from the fields we do not use. This makes Coverity and Clang's Static Analyzer complain. Presumably, other static analyzers might complain too. There is no real bug here, but we are still technically reading undefined data and unless we stop doing that, static analyzers will complain about it in perpetuum and this could obscure real issues. We silence the static analyzer complaints by using a 0 struct initializer. Reported by: Coverity (CID 1524627) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14210
*	zed: unclean disk attachment faults the vdev	Ameer Hamza	2022-11-29	1	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the attached disk already contains a vdev GUID, it means the disk is not clean. In such a scenario, the physical path would be a match that makes the disk faulted when trying to online it. So, we would only want to proceed if either GUID matches with the last attached disk or the disk is in a clean state. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14181
*	cmd: zfs: fix missing mention of zfs diff -h	наб	2022-11-28	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fixes: 344bbc82e7054f61d5e7b3610b119820285fd2cb Reviewed-by: Damian Szuberski <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #14224
*	Add ability to recompress send streams with new compression algorithm	Paul Dagnelie	2022-11-10	4	-0/+362
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As new compression algorithms are added to ZFS, it could be useful for people to recompress data with new algorithms. There is currently no mechanism to do this aside from copying the data manually into a new filesystem with the new algorithm enabled. This tool allows the transformation to happen through zfs send, allowing it to be done efficiently to remote systems and in an incremental fashion. A new zstream command is added that decompresses WRITE records and then recompresses them with a provided algorithm, and then re-emits the modified send stream. It may also be possible to re-compress embedded block pointers, but that was not attempted for the initial version. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #14106
*	Adds the `-p` option to `zfs holds`	Mohamed Tawfik	2022-11-08	1	-9/+28
\| \| \| \| \| \| \| \| \| \| \| \|	This allows for printing a machine-readable, accurate to the second, hold creation time in the form of a unix epoch timestamp. Additionally, updates relevant documentation and man pages accordingly. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mohamed Tawfik <[email protected]> Closes #13690 Closes #14152
*	Address warnings about possible division by zero from clangsa	Richard Yao	2022-11-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* The complaint in ztest_replay_write() is only possible if something went horribly wrong. An assertion will silence this and if it goes off, we will know that something is wrong. * The complaint in spa_estimate_metaslabs_to_flush() is not impossible, but seems very unlikely. We resolve this by passing the value from the `MIN()` that does not go to infinity when the variable is zero. There was a third report from Clang's scan-build, but that was a definite false positive and disappeared when checked again through Clang's static analyzer with Z3 refution via CodeChecker. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14124
*	Use intptr_t when storing an integer in a pointer	Brooks Davis	2022-11-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Cast the integer type to (u)intptr_t before casting to "void *". In CHERI C/C++ we warn on bare casts from integers to pointers to catch attempts to create pointers our of thin air. We allow the warning to be supressed with a suitable cast through (u)intptr_t. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14131
*	Introduce kmem_scnprintf()	Richard Yao	2022-10-29	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`snprintf()` is meant to protect against buffer overflows, but operating on the buffer using its return value, possibly by calling it again, can cause a buffer overflow, because it will return how many characters it would have written if it had enough space even when it did not. In a number of places, we repeatedly call snprintf() by successively incrementing a buffer offset and decrementing a buffer length, by its return value. This is a potentially unsafe usage of `snprintf()` whenever the buffer length is reached. CodeQL complained about this. To fix this, we introduce `kmem_scnprintf()`, which will return 0 when the buffer is zero or the number of written characters, minus 1 to exclude the NULL character, when the buffer was too small. In all other cases, it behaves like snprintf(). The name is inspired by the Linux and XNU kernels' `scnprintf()`. The implementation was written before I thought to look at `scnprintf()` and had a good name for it, but it turned out to have identical semantics to the Linux kernel version. That lead to the name, `kmem_scnprintf()`. CodeQL only catches this issue in loops, so repeated use of snprintf() outside of a loop was not caught. As a result, a thorough audit of the codebase was done to examine all instances of `snprintf()` usage for potential problems and a few were caught. Fixes for them are included in this patch. Unfortunately, ZED is one of the places where `snprintf()` is potentially used incorrectly. Since using `kmem_scnprintf()` in it would require changing how it is linked, we modify its usage to make it safe, no matter what buffer length is used. In addition, there was a bug in the use of the return value where the NULL format character was not being written by pwrite(). That has been fixed. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14098
*	Cleanup dump_bookmarks()	Richard Yao	2022-10-29	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Assertions are meant to check assumptions, but the way that this assertion is written does not check an assumption, since it is provably always true. Removing the assertion will cause a compiler warning (made into an error by -Werror) about printing up to 512 bytes to a 256-byte buffer, so instead, we change the assertion to verify the assumption that we never do a snprintf() that is truncated to avoid overrunning the 256-byte buffer. This was caught by an audit of the codebase to look for misuse of `snprintf()` after CodeQL reported that we had misused `snprintf()`. An explanation of how snprintf() can be misused is here: https://www.redhat.com/en/blog/trouble-snprintf This particular instance did not misuse `snprintf()`, but it was caught by the audit anyway. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14098