path: root/module
* Fix concurrent resilvers initiated at same time (Akash B, 2023-05-24, 2 files, -2/+16)
  For draid vdevs it was possible to initiate both the sequential and healing resilver at the same time. This fixes the following two scenarios.
  1) There's a window where a sequential rebuild can be started via ZED even if a healing resilver has been scheduled. This is fixed by adding an additional check in spa_vdev_attach() for any scheduled resilver and returning the appropriate error code when a resilver is already in progress.
  2) It was possible for zpool clear to start a healing resilver when it wasn't needed at all. This occurs because during vdev_open() the device is presumed to be healthy until it is validated by vdev_validate() and set unavailable. However, by this point an async resilver will have already been requested if the DTL isn't empty. This is fixed by cancelling the SPA_ASYNC_RESILVER request immediately at the end of vdev_reopen() when a resilver is unneeded.
  Finally, a test case was added to ZTS for verification.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Dipak Ghosh <[email protected]>
  Signed-off-by: Akash B <[email protected]>
  Closes #14881
  Closes #14892
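  A minimal sketch of the first guard, assuming it sits near the top of spa_vdev_attach(); the exact condition and error path shown here are illustrative, not the actual diff:

      /* Illustrative: refuse a rebuild request while a resilver is scheduled. */
      if (rebuild && dsl_scan_resilvering(spa_get_dsl(spa))) {
              /* Report back that a resilver is already in progress. */
              return (spa_vdev_exit(spa, NULL, txg, ZFS_ERR_RESILVER_IN_PROGRESS));
      }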
* Linux 6.4 compat: reclaimed_slab renamed to reclaimed (youzhongyang, 2023-05-24, 2 files, -1/+8)
  Reviewed-by: Richard Yao <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Youzhong Yang <[email protected]>
  Closes #14891
* Hold db_mtx when updating db_state (Brian Atkinson, 2023-05-19, 1 file, -2/+4)
  Commit 555ef90 did some general code refactoring for dmu_buf_will_not_fill() and dmu_buf_will_fill(). However, the db_mtx was not held when updating db->db_state in those code blocks. The rest of the dbuf code always holds the db_mtx when updating db_state. This is important because cv_wait() on db_changed is used to check for db_state changes. Update dmu_buf_will_not_fill() and dmu_buf_will_fill() to hold the db_mtx when updating db_state.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Brian Atkinson <[email protected]>
  Closes #14875
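  As a hedged illustration of the pattern described above (not the literal diff), any db_state transition is made while holding db_mtx so that threads blocked in cv_wait() on db_changed observe it safely:

      /* Sketch only; DB_NOFILL stands in for whichever state is being set. */
      mutex_enter(&db->db_mtx);
      db->db_state = DB_NOFILL;          /* update is protected by db_mtx */
      cv_broadcast(&db->db_changed);     /* wake waiters checking db_state */
      mutex_exit(&db->db_mtx);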
* Probe vdevs before marking removed (Brian Behlendorf, 2023-05-19, 1 file, -2/+9)
  Before allowing the ZED to mark a vdev as REMOVED due to a hotplug event, confirm that it is non-responsive with a probe. Any device which can be successfully probed should be left ONLINE to prevent a healthy pool from being incorrectly SUSPENDED. This may occur for at least the following two scenarios.
  1) Drive expansion (zpool online -e) in VMware environments. If, during the partition resize operation, a partition is removed and re-created then udev will send a removed event.
  2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan) may result in a udev remove and add event being delivered.
  Finally, update the ZED to only kick in a spare when the removal was successful.
  Reviewed-by: Ameer Hamza <[email protected]>
  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #14859
  Closes #14861
* Teach zpool scrub to scrub only blocks in error log (George Amanakis, 2023-05-18, 5 files, -35/+820)
  Added a flag '-e' in zpool scrub to scrub only blocks in the error log. A user can pause, resume and cancel the error scrub by passing the additional command line arguments -p and -s, just like a regular scrub. This involves adding a new flag, creating new libzfs interfaces, a new ioctl, and the actual iteration and read-issuing logic. Error scrubbing is executed across multiple txgs to make sure pool performance is not affected.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tony Hutter <[email protected]>
  Co-authored-by: TulsiJain [email protected]
  Signed-off-by: George Amanakis <[email protected]>
  Closes #8995
  Closes #12355
* Add the ability to uninitialize (Brian Behlendorf, 2023-05-18, 3 files, -3/+73)
  zpool initialize functions well for touching every free byte...once. But if we want to do it again, we're currently out of luck. So let's add zpool initialize -u to clear it.
  Co-authored-by: Rich Ercolani <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12451
  Closes #14873
* Fix undefined behavior in spa_sync_props() (Richard Yao, 2023-05-15, 1 file, -4/+4)
  8eae2d214cfa53862833eeeda9a5c1e9d5ded47d caused Coverity to begin complaining about "Improper use of negative value" in two places in spa_sync_props(), because Coverity correctly inferred from `prop == ZPOOL_PROP_INVAL` that prop could be -1 while both zpool_prop_to_name() and zpool_prop_get_type() use it as an array index, which is undefined behavior.
  Assuming that the system does not panic from an attempt to read invalid memory, the case statement for ZPOOL_PROP_INVAL will ensure that only user properties will reach this code when prop is ZPOOL_PROP_INVAL, such that execution will continue safely. However, if we are unlucky enough to read invalid memory, then the system will panic.
  This issue predates the patch that caused Coverity to begin complaining. Thankfully, our userland tools do not pass nonsense to us, so this bug should not be triggered unless a future userland tool attempts to set a property that we do not understand.
  Reported-by: Coverity (CID-1561129)
  Reported-by: Coverity (CID-1561130)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: George Amanakis <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Closes #14860
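  A minimal sketch of the kind of guard implied here, assuming the usual nvpair iteration in spa_sync_props(); local variable names are illustrative:

      zpool_prop_t prop = zpool_name_to_prop(nvpair_name(elem));
      const char *propname;

      if (prop == ZPOOL_PROP_INVAL) {
              /* User property: never use -1 to index the property tables. */
              propname = nvpair_name(elem);
      } else {
              propname = zpool_prop_to_name(prop);  /* prop is a valid index */
      }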
* Fix use after free regression in spa_remove_healed_errors() (Richard Yao, 2023-05-15, 1 file, -1/+1)
  6839ec6f1098c28ff7b772f1b31b832d05e6b567 placed code in spa_remove_healed_errors() that uses a pointer after the kmem_free() call that frees it.
  Reported-by: Coverity (CID-1562375)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: George Amanakis <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Closes #14860
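  The general shape of the fix, as a hedged sketch (the field access and message are hypothetical): copy whatever is still needed out of the entry before kmem_free(), and never touch the pointer afterwards:

      uint64_t obj = se->se_bookmark.zb_object;   /* copy out first (hypothetical use) */

      kmem_free(se, sizeof (spa_error_entry_t));
      zfs_dbgmsg("healed error on object %llu", (u_longlong_t)obj);  /* uses the copy */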
* zil: Free lwb_buf after write completion. (Alexander Motin, 2023-05-12, 1 file, -12/+6)
  There is no sense in keeping that memory allocated during the flush.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Prakash Surya <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14855
* zil: Some micro-optimizations. (Alexander Motin, 2023-05-12, 1 file, -52/+23)
  Should not cause functional changes.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14854
* Make sure we are not trying to clone a spill block. (Pawel Jakub Dawidek, 2023-05-11, 1 file, -0/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
* Correct comment. (Pawel Jakub Dawidek, 2023-05-11, 1 file, -1/+1)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
* Remove badly placed comment. (Pawel Jakub Dawidek, 2023-05-11, 1 file, -4/+0)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
* Don't call zfs_exit_two() before zfs_enter_two(). (Pawel Jakub Dawidek, 2023-05-11, 1 file, -8/+9)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
* Don't use dmu_buf_is_dirty() for unassigned transaction. (Pawel Jakub Dawidek, 2023-05-11, 2 files, -13/+6)
  The dmu_buf_is_dirty() call doesn't make sense here for two reasons:
  1. txg is 0 for an unassigned tx, so it was a no-op.
  2. It is equivalent to checking if we have dirty records, which we already do a few lines earlier.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
* Deny block cloning if dbuf size doesn't match BP size. (Pawel Jakub Dawidek, 2023-05-11, 2 files, -6/+31)
  I don't know an easy way to shrink down dbuf size, so just deny block cloning into dbufs that don't match our BP's size.
  This fixes the following situation:
  1. Create a small file, e.g. 1kB of random bytes. Its dbuf will be 1kB.
  2. Create a larger file, e.g. 2kB of random bytes. Its dbuf will be 2kB.
  3. Truncate the large file to 0. Its dbuf will remain 2kB.
  4. Clone the small file into the large file. The small file's BP lsize is 1kB, but the large file's dbuf is 2kB.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
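  A hedged sketch of the guard this describes (placement and return value are illustrative): refuse the clone when the destination dbuf's in-memory size differs from the source BP's logical size:

      if (BP_GET_LSIZE(bp) != db->db.db_size) {
              /* Sizes disagree (e.g. a truncated, still-2kB dbuf); refuse to clone. */
              return (SET_ERROR(EXDEV));  /* caller falls back to an ordinary copy */
      }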
* Additional block cloning fixes. (Pawel Jakub Dawidek, 2023-05-11, 2 files, -35/+82)
  Reimplement some of the block cloning vs dbuf logic, mostly to fix the situation where we clone a block and in the same transaction group we want to partially overwrite the clone.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14825
* zil: Don't expect zio_shrink() to succeed. (Alexander Motin, 2023-05-11, 1 file, -0/+1)
  At least for RAIDZ, zio_shrink() does not reduce the zio size, but a reduced wsz in that case likely results in writing uninitialized memory.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14853
* Prevent panic during concurrent snapshot rollback and zvol read (Ameer Hamza, 2023-05-09, 1 file, -0/+2)
  Protect zvol_cdev_read with zv_suspend_lock to prevent concurrent release of the dnode, avoiding a panic when a snapshot is rolled back in parallel with an ongoing zvol read operation.
  Reviewed-by: Chunwei Chen <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ameer Hamza <[email protected]>
  Closes #14839
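  An illustrative sketch of the locking pattern (the read helper name is hypothetical): the read path takes zv_suspend_lock as a reader, so a rollback, which takes it as a writer, cannot release the dnode mid-read:

      rw_enter(&zv->zv_suspend_lock, RW_READER);
      error = zvol_read_impl(zv, uio);   /* hypothetical helper doing the actual read */
      rw_exit(&zv->zv_suspend_lock);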
* Add dmu_tx_hold_append() interface (Brian Behlendorf, 2023-05-09, 1 file, -0/+105)
  Provides an interface which callers can use to declare a write when the exact starting offset is not yet known. Since the full range being updated is not available, only the first L0 block at the provided offset will be prefetched.
  Reviewed-by: Olaf Faaland <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #14819
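  A sketch of how a caller might use the new hold, assuming a prototype parallel to dmu_tx_hold_write(); the exact signature is defined by the commit itself:

      dmu_tx_t *tx = dmu_tx_create(os);

      dmu_tx_hold_append(tx, object, 0, len);   /* starting offset not yet known */
      error = dmu_tx_assign(tx, TXG_WAIT);
      if (error != 0) {
              dmu_tx_abort(tx);
              return (error);
      }
      /* ... perform the append, then dmu_tx_commit(tx) ... */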
* Remove duplicate code in l2arc_evict() (George Amanakis, 2023-05-09, 1 file, -14/+8)
  l2arc_evict() unnecessarily performs the adjustment of the size of buffers to be written on L2ARC. l2arc_write_size() is called right before l2arc_evict() and already performs those adjustments.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #14828
* Remove single parent assertion from zio_nowait(). (Alexander Motin, 2023-05-09, 1 file, -1/+1)
  We only need to know whether the ZIO has any parent there. We do not care if it has more than one, but using zio_unique_parent() == NULL asserts that it does not.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14823
* Enable the head_errlog feature to remove errors (George Amanakis, 2023-05-09, 2 files, -36/+141)
  In case check_filesystem() does not error out and does not report an error, remove that error block from error lists and logs without requiring a scrub. This can happen when the original file and all snapshots/clones referencing it have been removed. Otherwise zpool status will still report that "Permanent errors have been detected..." without actually reporting any of them. To implement this change the functions introduced in corrective receive were modified to take into account the head_errlog feature.
  Before this change:
  =============================
    pool: test
   state: ONLINE
  status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
  action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
     see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  config:
          NAME                 STATE     READ WRITE CKSUM
          test                 ONLINE       0     0     0
            /home/user/vdev_a  ONLINE       0     0     2
  errors: Permanent errors have been detected in the following files:
  =============================
  After this change:
  =============================
    pool: test
   state: ONLINE
  status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
  action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
     see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  config:
          NAME                 STATE     READ WRITE CKSUM
          test                 ONLINE       0     0     0
            /home/user/vdev_a  ONLINE       0     0     2
  errors: No known data errors
  =============================
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #14813
* Fixes in head_errlog feature with encryption (George Amanakis, 2023-05-08, 1 file, -44/+32)
  For the head_errlog feature use dsl_dataset_hold_obj_flags() instead of dsl_dataset_hold_obj() in order to enable access to the encryption keys (if loaded). This enables reporting of errors in encrypted filesystems which are not mounted but have their keys loaded.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #14837
* Verify block pointers before writing them out (Matthew Ahrens, 2023-05-08, 5 files, -30/+87)
  If a block pointer is corrupted (but the block containing it checksums correctly, e.g. due to a bug that overwrites random memory), we can often detect it before the block is read, with the `zfs_blkptr_verify()` function, which is used in `arc_read()`, `zio_free()`, etc. However, such corruption is not typically recoverable. To recover from it we would need to detect the memory error before the block pointer is written to disk.
  This PR verifies BP's that are contained in indirect blocks and dnodes before they are written to disk, in `dbuf_write_ready()`. This way, we'll get a panic before the on-disk data is corrupted. This will help us to diagnose what's causing the corruption, as well as being much easier to recover from.
  To minimize performance impact, only checks that can be done without holding the spa_config_lock are performed.
  Additionally, when corruption is detected, the raw words of the block pointer are logged. (Note that `dprintf_bp()` is a no-op by default, but if enabled it is not safe to use with invalid block pointers.)
  Reviewed-by: Rich Ercolani <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Paul Zuchowski <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #14817
* Fix two abd_gang_add_gang() issues. (Alexander Motin, 2023-05-05, 1 file, -1/+13)
  - There is no reason to assert that the added gang is not empty. It may be weird to add an empty gang, but it is legal.
  - When moving the chain list from the added gang, clear its size, or it will trigger an assertion in abd_verify() when that gang is freed.
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14816
* Plug memory leak in zfsdev_state. (Pawel Jakub Dawidek, 2023-05-05, 1 file, -0/+2)
  On kernel module unload, free all zfsdev state structures, except for zfsdev_state_listhead, which is statically allocated.
  Reviewed-by: Richard Yao <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14824
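  A rough sketch of that cleanup, based only on the description above (list and field names approximate): walk the zfsdev_state list on unload and free every dynamically allocated entry, leaving the static head alone:

      zfsdev_state_t *zs, *next;

      for (zs = zfsdev_state_listhead.zs_next; zs != NULL; zs = next) {
              next = zs->zs_next;
              kmem_free(zs, sizeof (zfsdev_state_t));  /* the static head is never freed */
      }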
* zpool import -m also removing spare and cache when log device is missing (Ameer Hamza, 2023-05-03, 1 file, -0/+10)
  spa_import() relies on a pool config fetched by spa_tryimport() for spare/cache devices. Import flags are not passed to spa_tryimport(), which makes it return early due to a missing log device, so the cache and spare devices are never retrieved. Passing ZFS_IMPORT_MISSING_LOG to spa_tryimport() makes it fetch the correct configuration regardless of the missing log device.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ameer Hamza <[email protected]>
  Closes #14794
* Optimize check_filesystem() and process_error_log() (George Amanakis, 2023-05-03, 1 file, -54/+84)
  Integrate check_clones() into check_filesystem() and implement a list instead of iterating recursively over the clones, thus eliminating the risk of a stack overflow. Also use kmem_zalloc() to allocate large structures in process_error_log(), reducing its stack size from ~700 to ~128 bytes.
  Reviewed-by: Tino Reichardt <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #14744
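  A hedged sketch of the stack-size reduction technique (the particular structure shown is only an example): large scratch buffers move off the kernel stack onto the heap with kmem_zalloc()/kmem_free():

      zbookmark_phys_t *zb = kmem_zalloc(sizeof (zbookmark_phys_t), KM_SLEEP);

      /* ... use *zb where a large on-stack variable used to live ... */

      kmem_free(zb, sizeof (zbookmark_phys_t));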
* Use correct block pointer in block cloning case. (Pawel Jakub Dawidek, 2023-05-02, 1 file, -2/+1)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #14806
* blake3: fix up bogus checksums in face of cpu migration (Mateusz Guzik, 2023-05-01, 1 file, -2/+5)
  This is a temporary measure until a better fix is sorted out.
  Reviewed-by: Richard Yao <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Rich Ercolani <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Closes #14785
  Closes #14808
* Correct ABD size for split block ZIOs (Serapheim Dimitropoulos, 2023-05-01, 1 file, -3/+4)
  Currently when layering the ABD buffer of each split block on top of an indirect vdev's ZIO ABD we don't specify the split block's ABD. This results in those ABDs being incorrectly sized by inheriting the size of their parent ABD, which is larger than what each split block needs.
  The above behavior isn't causing any bugs currently but can lead to unexpected ABD sizes for people analyzing and/or working on the ZIO codepath. This patch fixes this behavior by properly setting the ABD size for split block ZIOs.
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Igor Kozhukhov <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Mark Maybee <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: Serapheim Dimitropoulos <[email protected]>
  Closes #14804
* powerpc64: Support ELFv2 asm on Big Endian (Justin Hibbits, 2023-04-27, 4 files, -0/+61)
  FreeBSD/powerpc64 is all ELFv2 since FreeBSD 13, even big endian. The existing sha256 and sha512 asm code assumes that BE is all ELFv1, and LE is ELFv2. Minor changes to add ELFv2 on the BE side get this working correctly on FreeBSD with the latest OpenZFS import.
  Reviewed-by: Tino Reichardt <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Justin Hibbits <[email protected]>
  Closes #14779
* Mark TX_COMMIT transaction with TXG_NOTHROTTLE. (Alexander Motin, 2023-04-27, 1 file, -1/+8)
  TX_COMMIT has no on-disk representation and does not produce any more dirty data. It should not wait for anything, and even just skipping the checks when not waiting gives an improvement noticeable in the profiler.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Prakash Surya <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14798
* Fix BLAKE3 aarch64 assembly for FreeBSD and macOS (Tino Reichardt, 2023-04-26, 2 files, -4511/+4057)
  The x18 register isn't usable within FreeBSD kernel space, so fix the BLAKE3 aarch64 assembly to not use it. The source files are here: https://github.com/mcmilk/BLAKE3-tests
  Reviewed-by: Kyle Evans <[email protected]>
  Signed-off-by: Tino Reichardt <[email protected]>
  Closes #14728
* Fix checkstyle warning (Brian Behlendorf, 2023-04-26, 1 file, -1/+1)
  Resolve a missed checkstyle warning.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Mateusz Guzik <[email protected]>
  Reviewed-by: George Melikov <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #14799
* Fix positive ABD size assertion in abd_verify(). (Alexander Motin, 2023-04-26, 1 file, -1/+2)
  Gang ABDs without children are legal, and they do have zero size. For other ABD types a zero size doesn't make much sense and likely isn't handled correctly now.
  Reviewed-by: Igor Kozhukhov <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #14795
* FreeBSD: fix up EINVAL from getdirentries on .zfs (Mateusz Guzik, 2023-04-26, 1 file, -0/+11)
  Without the change:
    /.zfs
    /.zfs/snapshot
    find: /.zfs: Invalid argument
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #14774
* FreeBSD: add missing vn state transition for .zfs (Mateusz Guzik, 2023-04-26, 1 file, -0/+4)
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #14774
* Revert "Fix data race between zil_commit() and zil_suspend()"Brian Behlendorf2023-04-251-27/+0
| | | | | | | | | | | This reverts commit 4c856fb333ac57d9b4a6ddd44407fd022a702f00 to resolve a newly introduced deadlock which in practice in more disruptive that the issue this commit intended to address. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #14775 Closes #14790
* Add loongarch64 support (Han Gao, 2023-04-25, 3 files, -0/+86)
  Add loongarch64 definitions & lua module setjmp asm. LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V.
  Reviewed-by: Richard Yao <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Han Gao <[email protected]>
  Signed-off-by: WANG Xuerui <[email protected]>
  Closes #13422
* FreeBSD: add missing vop_fplookup assignments (Mateusz Guzik, 2023-04-24, 1 file, -0/+9)
  It became illegal to not have them as of 5f6df177758b9dff88e4b6069aeb2359e8b0c493 ("vfs: validate that vop vectors provide all or none fplookup vops") upstream.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #14788
* FreeBSD: try to fallback early if can't do optimized copy (Mateusz Guzik, 2023-04-24, 1 file, -0/+8)
  Not complete, but already shaves off some locking.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Closes #14723
* FreeBSD: fix up EXDEV handling for clone_range (Mateusz Guzik, 2023-04-24, 1 file, -30/+33)
  The API contract requires VOPs to handle EXDEV internally, worst case by falling back to the generic copy routine. This broke with the recent changes.
  While here, whack the custom loop to lock 2 vnodes in favor of vn_lock_pair(), which provides the same functionality internally. The write start/finish around it plays no role, so it got eliminated.
  One difference is that vn_lock_pair() always takes an exclusive lock on both vnodes. I did not patch around it because the current code takes an exclusive lock on the target vnode. ZFS supports shared-locking for writes, so this serializes different calls to the routine as is, despite range locking inside. At the same time you may notice the source vnode can get some traffic if only shared-locked, thus once more this goes the safer route of exclusive-locking. Note this should be patched to use shared-locking for both once the feature is considered stable.
  Technically the switch to vn_lock_pair() should be a separate change, but it would only introduce churn immediately whacked by the rest of the patch.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Closes #14723
* FreeBSD: make zfs_vfs_held() definition consistent with declaration (Dimitry Andric, 2023-04-21, 1 file, -1/+1)
  Noticed while attempting to change FreeBSD's boolean_t into an actual bool: in include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to return a boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is defined to return an int. Make the definition match the declaration.
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: Dimitry Andric <[email protected]>
  Closes #14776
* Add support for zpool user properties (Allan Jude, 2023-04-21, 1 file, -41/+87)
  Usage:
    zpool set org.freebsd:comment="this is my pool" poolname
  Tests are based on zfs_set's user property tests.
  Also stop truncating property values at MAXNAMELEN; use ZFS_MAXPROPLEN.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Allan Jude <[email protected]>
  Signed-off-by: Mateusz Piotrowski <[email protected]>
  Sponsored-by: Beckhoff Automation GmbH & Co. KG.
  Sponsored-by: Klara Inc.
  Closes #11680
* Linux: zfs_zaccess_trivial() should always call generic_permission() (Richard Yao, 2023-04-20, 1 file, -2/+1)
  Building with Clang on Linux generates a warning that err could be uninitialized if mnt_ns is a NULL pointer. However, mnt_ns should never be NULL, so there is no need to put this behind an if statement. err was never actually left uninitialized, but the compiler could not see that; taking the call outside of the if statement makes this obvious to the compiler and eliminates the warning.
  Sponsored-By: Wasabi Technology, Inc.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Youzhong Yang <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Closes #14738
* Create zap for root vdev (rob-wing, 2023-04-20, 4 files, -4/+50)
  And add it to the AVZ. This is not backwards compatible with older pools due to an assertion in spa_sync() that verifies the number of ZAPs of all vdevs matches the number of ZAPs in the AVZ.
  Granted, the assertion only applies to #DEBUG builds - still, a feature flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2
  Notably, this allows getting/setting properties on the root vdev:
    % zpool set user:prop=value <pool> root-0
  Before this commit, it was already possible to get/set properties on top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0):
    % zpool set user:prop=value <pool> mirror-0
  This syntax also applies to the root vdev as it is of type 'root' with a vdev_id of 0, root-0. The keyword 'root' can be used as an alias for 'root-0'.
  The following tests have been added:
  - zpool get all properties from root vdev
  - zpool set a property on root vdev
  - verify root vdev ZAP is created
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Wing <[email protected]>
  Sponsored-by: Seagate Technology
  Submitted-by: Klara, Inc.
  Closes #14405
* Allow MMP to bypass waiting for other threads (Herb Wartens, 2023-04-19, 2 files, -4/+27)
  At our site we have seen cases when multi-modifier protection is enabled (multihost=on) on our pool and the pool gets suspended due to a single disk that is failing and responding very slowly. Our pools have 90 disks in them and we expect disks to fail. The current version of MMP requires that we wait for other writers before moving on. When a disk is responding very slowly, we observed that waiting here was bad enough to cause the pool to suspend. This change allows the MMP thread to bypass waiting for other threads and reduces the chances the pool gets suspended.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Herb Wartens <[email protected]>
  Closes #14659
* Fix "Detach spare vdev in case if resilvering does not happen"Ameer Hamza2023-04-192-3/+14
| | | | | | | | | | | | | | | Spare vdev should detach from the pool when a disk is reinserted. However, spare detachment depends on the completion of resilvering, and if resilver does not schedule, the spare vdev keeps attached to the pool until the next resilvering. When a zfs pool contains several disks (25+ mirror), resilvering does not always happen when a disk is reinserted. In this patch, spare vdev is manually detached from the pool when resilvering does not occur and it has been tested on both Linux and FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14722