aboutsummaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Fix undefined behavior in spa_sync_props()Richard Yao2023-05-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | 8eae2d214cfa53862833eeeda9a5c1e9d5ded47d caused Coverity to begin complaining about "Improper use of negative value" in two places in spa_sync_props() because Coverity correctly inferred from `prop == ZPOOL_PROP_INVAL` that prop could be -1 while both zpool_prop_to_name() and zpool_prop_get_type() use it an array index, which is undefined behavior. Assuming that the system does not panic from an attempt to read invalid memory, the case statement for ZPOOL_PROP_INVAL will ensure that only user properties will reach this code when prop is ZPOOL_PROP_INVAL, such that execution will continue safely. However, if we are unlucky enough to read invalid memory, then the system will panic. This issue predates the patch that caused coverity to begin complaining. Thankfully, our userland tools do not pass nonsense to us, so this bug should not be triggered unless a future userland tool attempts to set a property that we do not understand. Reported-by: Coverity (CID-1561129) Reported-by: Coverity (CID-1561130) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14860
* Fix use after free regression in spa_remove_healed_errors()Richard Yao2023-05-151-1/+1
| | | | | | | | | | | | 6839ec6f1098c28ff7b772f1b31b832d05e6b567 placed code in spa_remove_healed_errors() that uses a pointer after the kmem_free() call that frees it. Reported-by: Coverity (CID-1562375) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14860
* zil: Free lwb_buf after write completion.Alexander Motin2023-05-121-12/+6
| | | | | | | | | There is no sense to keep that memory allocated during the flush. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14855
* zil: Some micro-optimizations.Alexander Motin2023-05-121-52/+23
| | | | | | | | Should not cause functional changes. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14854
* Make sure we are not trying to clone a spill block.Pawel Jakub Dawidek2023-05-111-0/+2
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* Correct comment.Pawel Jakub Dawidek2023-05-111-1/+1
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* Remove badly placed comment.Pawel Jakub Dawidek2023-05-111-4/+0
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* Don't call zfs_exit_two() before zfs_enter_two().Pawel Jakub Dawidek2023-05-111-8/+9
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* Don't use dmu_buf_is_dirty() for unassigned transaction.Pawel Jakub Dawidek2023-05-112-13/+6
| | | | | | | | | | | The dmu_buf_is_dirty() call doesn't make sense here for two reasons: 1. txg is 0 for unassigned tx, so it was a no-op. 2. It is equivalent of checking if we have dirty records and we are doing this few lines earlier. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* Deny block cloning is dbuf size doesn't match BP size.Pawel Jakub Dawidek2023-05-112-6/+31
| | | | | | | | | | | | | | | | I don't know an easy way to shrink down dbuf size, so just deny block cloning into dbufs that don't match our BP's size. This fixes the following situation: 1. Create a small file, eg. 1kB of random bytes. Its dbuf will be 1kB. 2. Create a larger file, eg. 2kB of random bytes. Its dbuf will be 2kB. 3. Truncate the large file to 0. Its dbuf will remain 2kB. 4. Clone the small file into the large file. Small file's BP lsize is 1kB, but the large file's dbuf is 2kB. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* Additional block cloning fixes.Pawel Jakub Dawidek2023-05-112-35/+82
| | | | | | | | | | Reimplement some of the block cloning vs dbuf logic, mostly to fix situation where we clone a block and in the same transaction group we want to partially overwrite the clone. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14825
* zil: Don't expect zio_shrink() to succeed.Alexander Motin2023-05-111-0/+1
| | | | | | | | | At least for RAIDZ zio_shrink() does not reduce zio size, but reduced wsz in that case likely results in writing uninitialized memory. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14853
* Prevent panic during concurrent snapshot rollback and zvol readAmeer Hamza2023-05-091-0/+2
| | | | | | | | | | | Protect zvol_cdev_read with zv_suspend_lock to prevent concurrent release of the dnode, avoiding panic when a snapshot is rolled back in parallel during ongoing zvol read operation. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14839
* Add dmu_tx_hold_append() interfaceBrian Behlendorf2023-05-091-0/+105
| | | | | | | | | | Provides an interface which callers can use to declare a write when the exact starting offset in not yet known. Since the full range being updated is not available only the first L0 block at the provided offset will be prefetched. Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14819
* Remove duplicate code in l2arc_evict()George Amanakis2023-05-091-14/+8
| | | | | | | | | | l2arc_evict() performs the adjustment of the size of buffers to be written on L2ARC unnecessarily. l2arc_write_size() is called right before l2arc_evict() and performs those adjustments. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14828
* Remove single parent assertion from zio_nowait().Alexander Motin2023-05-091-1/+1
| | | | | | | | | | We only need to know if ZIO has any parent there. We do not care if it has more than one, but use of zio_unique_parent() == NULL asserts that. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14823
* Enable the head_errlog feature to remove errorsGeorge Amanakis2023-05-092-36/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case check_filesystem() does not error out and does not report an error, remove that error block from error lists and logs without requiring a scrub. This can happen when the original file and all snapshots/clones referencing it have been removed. Otherwise zpool status will still report that "Permanent errors have been detected..." without actually reporting any of them. To implement this change the functions introduced in corrective receive were modified to take into account the head_errlog feature. Before this change: ============================= pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 /home/user/vdev_a ONLINE 0 0 2 errors: Permanent errors have been detected in the following files: ============================= After this change: ============================= pool: test state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 /home/user/vdev_a ONLINE 0 0 2 errors: No known data errors ============================= Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14813
* Fixes in head_errlog feature with encryptionGeorge Amanakis2023-05-081-44/+32
| | | | | | | | | | For the head_errlog feature use dsl_dataset_hold_obj_flags() instead of dsl_dataset_hold_obj() in order to enable access to the encryption keys (if loaded). This enables reporting of errors in encrypted filesystems which are not mounted but have their keys loaded. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14837
* Verify block pointers before writing them outMatthew Ahrens2023-05-085-30/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a block pointer is corrupted (but the block containing it checksums correctly, e.g. due to a bug that overwrites random memory), we can often detect it before the block is read, with the `zfs_blkptr_verify()` function, which is used in `arc_read()`, `zio_free()`, etc. However, such corruption is not typically recoverable. To recover from it we would need to detect the memory error before the block pointer is written to disk. This PR verifies BP's that are contained in indirect blocks and dnodes before they are written to disk, in `dbuf_write_ready()`. This way, we'll get a panic before the on-disk data is corrupted. This will help us to diagnose what's causing the corruption, as well as being much easier to recover from. To minimize performance impact, only checks that can be done without holding the spa_config_lock are performed. Additionally, when corruption is detected, the raw words of the block pointer are logged. (Note that `dprintf_bp()` is a no-op by default, but if enabled it is not safe to use with invalid block pointers.) Reviewed-by: Rich Ercolani <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #14817
* Fix two abd_gang_add_gang() issues.Alexander Motin2023-05-051-1/+13
| | | | | | | | | | | - There is no reason to assert that added gang is not empty. It may be weird to add an empty gang, but it is legal. - When moving chain list from the added gang clear its size, or it will trigger assertion in abd_verify() when that gang is freed. Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14816
* Plug memory leak in zfsdev_state.Pawel Jakub Dawidek2023-05-051-0/+2
| | | | | | | | | On kernel module unload, free all zfsdev state structures, except for zfsdev_state_listhead, which is statically allocated. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14824
* zpool import -m also removing spare and cache when log device is missingAmeer Hamza2023-05-031-0/+10
| | | | | | | | | | | | | spa_import() relies on a pool config fetched by spa_try_import() for spare/cache devices. Import flags are not passed to spa_tryimport(), which makes it return early due to a missing log device and missing retrieving the cache device and spare eventually. Passing ZFS_IMPORT_MISSING_LOG to spa_tryimport() makes it fetch the correct configuration regardless of the missing log device. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14794
* Optimize check_filesystem() and process_error_log()George Amanakis2023-05-031-54/+84
| | | | | | | | | | | | | Integrate check_clones() into check_filesystem() and implement a list instead of iterating recursively over the clones, thus eliminating the risk of a stack overflow. Also use kmem_zalloc() to allocate large structures in process_error_log() reducing its stack size from ~700 to ~128 bytes. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14744
* Use correct block pointer in block cloning case.Pawel Jakub Dawidek2023-05-021-2/+1
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14806
* blake3: fix up bogus checksums in face of cpu migrationMateusz Guzik2023-05-011-2/+5
| | | | | | | | | | | This is a temporary measure until a better fix is sorted out. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Sponsored by: Rubicon Communications, LLC ("Netgate") Closes #14785 Closes #14808
* Correct ABD size for split block ZIOsSerapheim Dimitropoulos2023-05-011-3/+4
| | | | | | | | | | | | | | | | | | | | Currently when layering the ABD buffer of each split block on top of an indirect vdev's ZIO ABD we don't specify the split block's ABD. This results in those ABDs being incorrectly sized by inheriting the size of their parent ABD which is larger than what each split block needs. The above behavior isn't causing any bugs currently but can lead to unexpected ABD sizes for people analyzing and/or working on the ZIO codepath. This patch fixes this behavior by properly setting the ABD size for split block ZIOs. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #14804
* powerpc64: Support ELFv2 asm on Big EndianJustin Hibbits2023-04-274-0/+61
| | | | | | | | | | | FreeBSD/powerpc64 is all ELFv2 since FreeBSD 13, even big endian. The existing sha256 and sha512 asm code assumes that BE is all ELFv1, and LE is ELFv2. Minor changes to add ELFv2 in the BE side gets this working correctly on FreeBSD with latest OpenZFS import. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Justin Hibbits <[email protected]> Closes #14779
* Mark TX_COMMIT transaction with TXG_NOTHROTTLE.Alexander Motin2023-04-271-1/+8
| | | | | | | | | | | TX_COMMIT has no on-disk representation and does not produce any more dirty data. It should not wait for anything, and even just skipping the checks if not waiting gives improvement noticeable in profiler. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14798
* Fix BLAKE3 aarch64 assembly for FreeBSD and macOSTino Reichardt2023-04-262-4511/+4057
| | | | | | | | | | The x18 register isn't useable within FreeBSD kernel space, so we have to fix the BLAKE3 aarch64 assembly for not using it. The source files are here: https://github.com/mcmilk/BLAKE3-tests Reviewed-by: Kyle Evans <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #14728
* Fix checkstyle warningBrian Behlendorf2023-04-261-1/+1
| | | | | | | | | Resolve a missed checkstyle warning. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14799
* Fix positive ABD size assertion in abd_verify().Alexander Motin2023-04-261-1/+2
| | | | | | | | | | | Gang ABDs without childred are legal, and they do have zero size. For other ABD types zero size doesn't have much sense and likely not working correctly now. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14795
* FreeBSD: fix up EINVAL from getdirentries on .zfsMateusz Guzik2023-04-261-0/+11
| | | | | | | | | | Without the change: /.zfs /.zfs/snapshot find: /.zfs: Invalid argument Signed-off-by: Mateusz Guzik <[email protected]> Closes #14774
* FreeBSD: add missing vn state transition for .zfsMateusz Guzik2023-04-261-0/+4
| | | | | Signed-off-by: Mateusz Guzik <[email protected]> Closes #14774
* Revert "Fix data race between zil_commit() and zil_suspend()"Brian Behlendorf2023-04-251-27/+0
| | | | | | | | | | | This reverts commit 4c856fb333ac57d9b4a6ddd44407fd022a702f00 to resolve a newly introduced deadlock which in practice in more disruptive that the issue this commit intended to address. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #14775 Closes #14790
* Add loongarch64 supportHan Gao2023-04-253-0/+86
| | | | | | | | | | | Add loongarch64 definitions & lua module setjmp asm LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Han Gao <[email protected]> Signed-off-by: WANG Xuerui <[email protected]> Closes #13422
* FreeBSD: add missing vop_fplookup assignmentsMateusz Guzik2023-04-241-0/+9
| | | | | | | | | | It became illegal to not have them as of 5f6df177758b9dff88e4b6069aeb2359e8b0c493 ("vfs: validate that vop vectors provide all or none fplookup vops") upstream. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #14788
* FreeBSD: try to fallback early if can't do optimized copyMateusz Guzik2023-04-241-0/+8
| | | | | | | | | | Not complete, but already shaves on some locking. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Sponsored by: Rubicon Communications, LLC ("Netgate") Closes #14723
* FreeBSD: fix up EXDEV handling for clone_rangeMateusz Guzik2023-04-241-30/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | API contract requires VOPs to handle EXDEV internally, worst case by falling back to the generic copy routine. This broke with the recent changes. While here whack custom loop to lock 2 vnodes with vn_lock_pair, which provides the same functionality internally. write start/finish around it plays no role so got eliminated. One difference is that vn_lock_pair always takes an exclusive lock on both vnodes. I did not patch around it because current code takes an exclusive lock on the target vnode. zfs supports shared-locking for writes, so this serializes different calls to the routine as is, despite range locking inside. At the same time you may notice the source vnode can get some traffic if only shared-locked, thus once more this goes the safer route of exclusive-locking. Note this should be patched to use shared-locking for both once the feature is considered stable. Technically the switch to vn_lock_pair should be a separate change, but it would only introduce churn immediately whacked by the rest of the patch. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Sponsored by: Rubicon Communications, LLC ("Netgate") Closes #14723
* FreeBSD: make zfs_vfs_held() definition consistent with declarationDimitry Andric2023-04-211-1/+1
| | | | | | | | | | | Noticed while attempting to change FreeBSD's boolean_t into an actual bool: in include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to return a boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is defined to return an int. Make the definition match the declaration. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Dimitry Andric <[email protected]> Closes #14776
* Add support for zpool user propertiesAllan Jude2023-04-211-41/+87
| | | | | | | | | | | | | | | | Usage: zpool set org.freebsd:comment="this is my pool" poolname Tests are based on zfs_set's user property tests. Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Mateusz Piotrowski <[email protected]> Sponsored-by: Beckhoff Automation GmbH & Co. KG. Sponsored-by: Klara Inc. Closes #11680
* Linux: zfs_zaccess_trivial() should always call generic_permission()Richard Yao2023-04-201-2/+1
| | | | | | | | | | | | | | | | Building with Clang on Linux generates a warning that err could be uninitialized if mnt_ns is a NULL pointer. However, mnt_ns should never be NULL, so there is no need to put this behind an if statement. Taking it outside of the if statement means that the possibility of err being uninitialized goes from being always zero in a way that the compiler could not realize to a way that is always zero in a way that the compiler can realize. Sponsored-By: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14738
* Create zap for root vdevrob-wing2023-04-204-4/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And add it to the AVZ, this is not backwards compatible with older pools due to an assertion in spa_sync() that verifies the number of ZAPs of all vdevs matches the number of ZAPs in the AVZ. Granted, the assertion only applies to #DEBUG builds - still, a feature flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2 Notably, this allows to get/set properties on the root vdev: % zpool set user:prop=value <pool> root-0 Before this commit, it was already possible to get/set properties on top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0): % zpool set user:prop=value <pool> mirror-0 This syntax also applies to the root vdev as it is is of type 'root' with a vdev_id of 0, root-0. The keyword 'root' as an alias for 'root-0'. The following tests have been added: - zpool get all properties from root vdev - zpool set a property on root vdev - verify root vdev ZAP is created Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Wing <[email protected]> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14405
* Allow MMP to bypass waiting for other threadsHerb Wartens2023-04-192-4/+27
| | | | | | | | | | | | | | | At our site we have seen cases when multi-modifier protection is enabled (multihost=on) on our pool and the pool gets suspended due to a single disk that is failing and responding very slowly. Our pools have 90 disks in them and we expect disks to fail. The current version of MMP requires that we wait for other writers before moving on. When a disk is responding very slowly, we observed that waiting here was bad enough to cause the pool to suspend. This change allows the MMP thread to bypass waiting for other threads and reduces the chances the pool gets suspended. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Herb Wartens <[email protected]> Closes #14659
* Fix "Detach spare vdev in case if resilvering does not happen"Ameer Hamza2023-04-192-3/+14
| | | | | | | | | | | | | | | Spare vdev should detach from the pool when a disk is reinserted. However, spare detachment depends on the completion of resilvering, and if resilver does not schedule, the spare vdev keeps attached to the pool until the next resilvering. When a zfs pool contains several disks (25+ mirror), resilvering does not always happen when a disk is reinserted. In this patch, spare vdev is manually detached from the pool when resilvering does not occur and it has been tested on both Linux and FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #14722
* Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"Tony Hutter2023-04-181-16/+5
| | | | | | | | | | | | This reverts commit 4b3133e671b958fa2c915a4faf57812820124a7b. Users identified this commit as a possible source of data corruption: https://github.com/openzfs/zfs/issues/14753 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Issue #14753 Closes #14761
* Fix VERIFY(!zil_replaying(zilog, tx)) panicPawel Jakub Dawidek2023-04-171-3/+1
| | | | | | | | | | | | | | The zfs_log_clone_range() function is never called from the zfs_clone_range_replay() function, so I assumed it is safe to assert that zil_replaying() is never TRUE here. It turns out zil_replaying() also returns TRUE when the sync property is set to disabled. Fix the problem by just returning if zil_replaying() returns TRUE. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reported by: Florian Smeets Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #14758
* Fix data corruption when cloning embedded blocksPawel Jakub Dawidek2023-04-121-2/+4
| | | | | | | | | | Don't overwrite blk_phys_birth, as for embedded blocks it is part of the payload. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Issue #13392 Closes #14739
* Fix in check_filesystem()George Amanakis2023-04-121-5/+5
| | | | | | | | | Fix the code in case of missing snapshots. Previously the check was in a conditional that would be executed if the filesystem had snapshots. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #14735
* Trim needless zeroes from checksum eventsAlan Somers2023-04-101-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | The ereport.fs.zfs.checksum event contains histograms of the bits that were wrongly set or cleared according to their bit position in a 64-bit word. So the maximum value that any histogram bucket could have would be 64. But ZFS currently uses a uint32_t to hold each bucket. As a result, the event report is full of needless zeroes. Change the bucket size to uint8_t, stripping 768 needless zeros from each event. Original event format: ``` class=ereport.fs.zfs.checksum ena=639460469834258433 pool=testpool.1933 pool_guid=4979719877084416563 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=4136721804819128578 vdev_type=file vdev_path=/tmp/kyua.1TxP3A/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=609837019678 vdev_delta_ts=33450 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=2751977006639883417 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=702976 zio_size=1024 zio_objset=24 zio_object=0 zio_level=3 zio_blkid=0 bad_ranges=0000000000000400 bad_ranges_min_gap=8 bad_range_sets=0000079e bad_range_clears=00000854 bad_set_histogram=000000210000001a000000150000001d000000240000001b000000220000001b000000210000002100000018000000260000002300000025000000210000001e000000250000001b0000001d0000001e0000001600000025000000180000001b000000240000001b000000240000001b0000001c000000210000001b0000001e000000210000001a0000001e000000220000001d0000001b000000200000001f0000001a000000250000001f0000001d0000001b0000001d000000240000001d0000001b0000001b0000001f00000024000000190000001a0000001f0000001e000000240000001e0000002400000021000000200000001d0000001d00000021 bad_cleared_histogram=000000220000002700000021000000210000001b0000001a000000250000001f0000001c0000001e0000002400000022000000220000002400000022000000240000002200000021000000220000001b0000002100000021000000190000001b000000240000002400000020000000290000002a00000028000000250000002400000020000000270000002500000016000000270000001c000000210000001f000000240000001c0000002100000022000000240000002100000023000000210000002700000022000000240000001b00000022000000210000001c00000023000000150000002600000020000000270000001e0000001d0000002400000026 time=00000016806457270000000323406839 eid=458 ``` New format: ``` class=ereport.fs.zfs.checksum ena=96599319807790081 pool=testpool.1933 pool_guid=1236902063710799041 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=2774253874431514999 vdev_type=file vdev_path=/tmp/kyua.6Temlq/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=92124283803 vdev_delta_ts=46670 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=8090931855087882905 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=1028608 zio_size=512 zio_objset=0 zio_object=0 zio_level=0 zio_blkid=4 bad_ranges=0000000000000200 bad_ranges_min_gap=8 bad_range_sets=0000061f bad_range_clears=000001f4 bad_set_histogram=1719161c1c1c101618171a151a1a19161e1c171d1816161c191f1a18192117191c131d171b1613151a171419161a1b1319101b14171b18151e191a1b141a1c17 bad_cleared_histogram=06090a0808070a0b020609060506090a01090a050a0a0509070609080d050d0607080d060507080c04070807070a0608020c080c080908040808090a05090a07 time=00000016806477050000000604157480 eid=62 ``` Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Alan Somers <[email protected]> Sponsored-by: Axcient Closes #14716
* Linux 6.3 compat: idmapped mount API changesyouzhongyang2023-04-1015-114/+192
| | | | | | | | | Linux kernel 6.3 changed a bunch of APIs to use the dedicated idmap type for mounts (struct mnt_idmap), we need to detect these changes and make zfs work with the new APIs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14682