aboutsummaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Report dnodes with faulty bonuslenGeorge Amanakis2022-02-163-0/+17
| | | | | | | | | | | | | | | | | | | In files created/modified before 4254acb there may be a corruption of xattrs which is not reported during scrub and normal send/receive. It manifests only as an error when raw sending/receiving. This happens because currently only the raw receive path checks for discrepancies between the dnode bonus length and the spill pointer flag. In case we encounter a dnode whose bonus length is greater than the predicted one, we should report an error. Modify in this regard dnode_sync() with an assertion at the end, dump_dnode() to error out, dsl_scan_recurse() to report errors during a scrub, and zstream to report a warning when dumping. Also added a test to verify spill blocks are sent correctly in a raw send. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #12720 Closes #13014
* libzfs: add keylocation=https://, backed by fetch(3) or libcurlнаб2022-02-161-1/+5
| | | | | | | | | | | | Add support for http and https to the keylocation properly to allow encryption keys to be fetched from the specified URL. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Issue #9543 Closes #9947 Closes #11956
* Workaround Debian's fake System.map behaviorRich Ercolani2022-02-101-0/+8
| | | | | | | | | | | | | Debian ships fake System.map files by default, leading to the invocation of depmod with them to flood you with errors about missing symbols. Let's notice and not do that. Reviewed-by: Ahelenia Ziemiańska <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12862
* Proper support for DESTDIR and INSTALL_MOD_PATHJosé Luis Salvador Rufo2022-02-101-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The environment variables DESTDIR and INSTALL_MOD_PATH must be mutually exclusive. https://www.gnu.org/prep/standards/html_node/DESTDIR.html https://www.kernel.org/doc/Documentation/kbuild/modules.txt This issue was discussed in this Buildroot thread: https://lists.buildroot.org/pipermail/buildroot/2021-August/621350.html I saw this behavior in other different projects, as: - Yocto Project: https://www.yoctoproject.org/pipermail/meta-freescale/2013-August/004307.html - Google IA Coral: https://coral.googlesource.com/linux-imx-debian/+/refs/heads/master/debian/rules For the above reasons, INSTALL_MOD_PATH will be set as DESTDIR by default. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: José Luis Salvador Rufo <[email protected]> Signed-off-by: Romain Naour <[email protected]> Closes #12577
* Upstream: Add snapshot and zvol eventsJorgen Lundman2022-02-101-0/+52
| | | | | | | | | | | | | | | For kernel to send snapshot mount/unmount events to zed. For kernel to send symlink creates/removes on zvol plumbing. (/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX) If zed misses the ENODEV, all errors after are EINVAL. Treat any error as kernel module failure. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12416
* Introduce a flag to skip comparing the local mac when raw sendingGeorge Amanakis2022-02-044-12/+40
| | | | | | | | | | | | | | | | | | | | | | | | | Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset. One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets. Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #12981 Closes #10523 Closes #11221 Closes #11294 Closes #12594 Issue #11300
* Linux <4.8 compat: submit_bio() rw argFinix19792022-02-041-1/+1
| | | | | | | | | | When using the two argument version of submit_bio() in kernel's prior to 4.8 the first argument should be specified. It's used by block dump to report the bio direction. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Finix Yan <[email protected]> Closes #13006
* Linux 5.17 compat: PDE_DATA() renamed to pde_data()наб2022-02-042-2/+2
| | | | | | | | | | | | | Upstream commit 359745d78351c6f5442435f81549f0207ece28aa ("proc: remove PDE_DATA() completely") Link: https://lore.kernel.org/all/[email protected]/T/#u Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #13004 Closes #12989
* Linux 5.17 compat: dequeue_signal() takes a 4th argumentнаб2022-02-041-0/+5
| | | | | | | | | | | | | Linux 5.17's dequeue_signal() takes an additional enum pid_type * output argument Upstream commit 5768d8906bc23d512b1a736c1e198aa833a6daa4 ("signal: Requeue signals in the appropriate queue") Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #12989
* Linux 5.17 compat: detect complete_and_exit() renameнаб2022-02-041-1/+1
| | | | | | | | | | | | | Linux 5.17 sees a rename from complete_and_exit() to kthread complete_and_exit() Upstream commit cead18552660702a4a46f58e65188fe5f36e9dfe ("exit: Rename complete_and_exit to kthread_complete_and_exit") Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #12989
* Add support for FALLOC_FL_ZERO_RANGERich Ercolani2022-02-041-2/+7
| | | | | | | | | | For us, I think it's always just FALLOC_FL_PUNCH_HOLE with a fake mustache on. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Coleman Kane <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12975
* Linux 5.16 compat: Added add_disk check for returnRich Ercolani2022-02-041-0/+4
| | | | | | | | | add_disk went from void to must-check int return. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Coleman Kane <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12975
* Fix handling of errors from dmu_write_uio_dbuf() on FreeBSDMark Johnston2022-02-031-4/+11
| | | | | | | | | | | | | | | | | | | | | FreeBSD's implementation of zfs_uio_fault_move() returns EFAULT when a page fault occurs while copying data in or out of user buffers. The VFS treats such errors specially and will retry the I/O operation (which may have made some partial progress). When the FreeBSD and Linux implementations of zfs_write() were merged, the handling of errors from dmu_write_uio_dbuf() changed such that EFAULT is not handled as a partial write. For example, when appending to a file, the z_size field of the znode is not updated after a partial write resulting in EFAULT. Restore the old handling of errors from dmu_write_uio_dbuf() to fix this. This should have no impact on Linux, which has special handling for EFAULT already. Reviewed-by: Andriy Gapon <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12964
* Avoid memory allocations in the ARC eviction threadMark Johnston2022-02-032-53/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the eviction thread goes to shrink an ARC state, it allocates a set of marker buffers used to hold its place in the state's sublists. This can be problematic in low memory conditions, since 1) the allocation can be substantial, as we allocate NCPU markers; 2) on at least FreeBSD, page reclamation can block in arc_wait_for_eviction() In particular, in stress tests it's possible to hit a deadlock on FreeBSD when the number of free pages is very low, wherein the system is waiting for the page daemon to reclaim memory, the page daemon is waiting for the ARC eviction thread to finish, and the ARC eviction thread is blocked waiting for more memory. Try to reduce the likelihood of such deadlocks by pre-allocating markers for the eviction thread at ARC initialization time. When evicting buffers from an ARC state, check to see if the current thread is the ARC eviction thread, and use the pre-allocated markers for that purpose rather than dynamically allocating them. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12985
* FreeBSD: Fix zvol_cdev_open lockingRyan Moeller2022-02-031-2/+2
| | | | | | | | | | | | First open locking changes were correctly applied to zvol_geom_open but incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be held indefinitely. Make the first open locking in zvol_cdev_open match zvol_geom_open. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #13016
* FreeBSD: Fix zvol_*_open() lockingRyan Moeller2022-02-031-31/+59
| | | | | | | | | | | | | | | | These are the changes for FreeBSD corresponding to the changes made for Linux in #12863, see that PR for details. Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12934
* FreeBSD: vfsops: use setgen for error caseнаб2022-02-031-1/+7
| | | | | | | | | Fix from https://github.com/openzfs/zfs/pull/12844#discussion_r774179413 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #12905
* zfs_prune: reset sc.nr_to_scanchrisrd2022-02-031-0/+5
| | | | | | | | | | | | | | | | sc.nr_to_scan is an input to super_cache_clean (via shrinker->scan_objects), used to set the number of objects to scan in the various caches. However super_cache_scan also modifies sc.nr_to_scan, so when used in a loop we need to reset sc.nr_to_scan back to our desired nr_to_scan for the next iteration. Issue discovered and solution suggested by Tenzin Lhakhang @tlhakhan. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Issue #12433 Closes #12908
* Verify dRAID empty sectorsBrian Behlendorf2022-02-032-5/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12857
* FreeBSD: fix unpropagated errorнаб2022-02-031-0/+1
| | | | | | | | | | | When performing I/O on FreeBSD using a file based vdev ensure all errors encountered when reading/writing are propagated through the zio pipeline. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #12904
* Fix zvol_open() lock inversionBrian Behlendorf2022-02-031-63/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When restructuring the zvol_open() logic for the Linux 5.13 kernel a lock inversion was accidentally introduced. In the updated code the spa_namespace_lock is now taken before the zv_suspend_lock allowing the following scenario to occur: down_read <=== waiting for zv_suspend_lock zvol_open <=== holds spa_namespace_lock __blkdev_get blkdev_get_by_dev blkdev_open ... mutex_lock <== waiting for spa_namespace_lock spa_open_common spa_open dsl_pool_hold dmu_objset_hold_flags dmu_objset_hold dsl_prop_get dsl_prop_get_integer zvol_create_minor dmu_recv_end zfs_ioc_recv_impl <=== holds zv_suspend_lock via zvol_suspend() zfs_ioc_recv ... This commit resolves the issue by moving the acquisition of the spa_namespace_lock back to after the zv_suspend_lock which restores the original ordering. Additionally, as part of this change the error exit paths were simplified where possible. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12863
* FreeBSD: Update argument types for VOP_READDIRAlan Somers2022-02-031-4/+10
| | | | | | | | | | | | | A recent commit to FreeBSD changed the type of vop_readdir_args.a_cookies to a uint64_t**. There is no functional impact to ZFS because ZFS only uses 32-bit cookies, which will be zero-extended to 64-bits by the existing code. https://github.com/freebsd/freebsd-src/commit/b214fcceacad6b842545150664bd2695c1c2b34f Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #12874
* Reduce number of arc_prune threadsAlexander Motin2022-02-031-3/+10
| | | | | | | | | | | | | | | | On FreeBSD vnode reclamation is single-threaded, protected by single global lock. Linux seems to be able to use a thread per mount point, but at this time it creates more harm than good. Reduce number of threads to 1, adding tunable in case somebody wants to try more. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Ahelenia Ziemiańska <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #12896 Issue #9966
* FreeBSD: Provide correct file generation numberRyan Moeller2022-02-032-2/+2
| | | | | | | | | | | | va_seq was actually a thin veil over va_gen, so z_gen is a more appropriate value than z_seq to populate the field with. Drop the unnecessary compat obfuscation and provide the correct file generation number. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12851
* FreeBSD: Add vop_standard_writecount_nomsyncRyan Moeller2021-12-132-0/+18
| | | | | | | | | | https://cgit.freebsd.org/src/commit?id=3ffcfa599e29686cf2b3c1a6087408c37acaed78 Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12828
* FreeBSD: Catch up with more VFS changesRyan Moeller2021-12-131-0/+18
| | | | | | | | | | | | Unused thread argument was removed from NDINIT* https://cgit.freebsd.org/src/commit?id=7e1d3eefd410ca0fbae5a217422821244c3eeee4 Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12828
* Fix several bugs in the FreeBSD rename VOP implementationMark Johnston2021-12-131-136/+134
| | | | | | | | | | | | | | | | | | | - To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the teardown lock is acquired with ZFS_ENTER(). - Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_() when the ZFS_ENTER() macros forces an early return. Refactor the rename implementation so that ZFS_ENTER() can be used safely. As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead of open-coding its implementation. Reported-by: Peter Holm <[email protected]> Tested-by: Peter Holm <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Sponsored-by: The FreeBSD Foundation Closes #12717
* Remove (now unused) td argument from zfs_lookup()Pawel Jakub Dawidek2021-12-131-11/+10
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #12748
* Exit the teardown section later in rename on FreeBSDMark Johnston2021-12-131-3/+4
| | | | | | | | | | | | We have to hold the teardown lock while dereferencing zfsvfs->z_os and, I believe, when committing to the ZIL. Note that jumping to the "out" label, "error" is always non-zero. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12704
* Fix potential use-after-frees in FreeBSD getpages and setattr VOPsMark Johnston2021-12-131-4/+4
| | | | | | | | | | | The objset object is reallocated during certain dataset operations, such as rollbacks, so the objset pointer must be loaded after acquiring the teardown lock. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12704
* Update `checkstyle` workflow env to ubuntu-20.04Damian Szuberski2021-12-085-3/+13
| | | | | | | | | | - `checkstyle` workflow uses ubuntu-20.04 environment - improved `mancheck.sh` readability Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: szubersk <[email protected]> Closes #12713
* ZFS send/recv with ashift 9->12 leads to data corruptionPaul Dagnelie2021-12-072-7/+17
| | | | | | | | | | Improve the ability of zfs send to determine if a block is compressed or not by using information contained in the blkptr. Reviewed-by: Rich Ercolani <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #12770
* Linux 5.16: Resolve ZSTD_isError symbol collision in Linux kernelColeman Kane2021-12-072-2/+1
| | | | | | | | | | | | | Newer zstd code introduced in the main kernel tree now creates a symbol collision with ZSTD_isError in our ZSTD code. This change relabels our implementation with a ZFS-specific symbol name, and undoes some macro-based micro-optimizations that conflict with the attempt to rename our internal-use version. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12819
* Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is definedColeman Kane2021-12-071-0/+3
| | | | | | | | | | | | | The definition of struct blkcg_gq was moved into blk-cgroup.h, which is a header that's been in Linux since 2015. This is used by vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel for CentOS 7 and similar-generation releases doesn't have this header, its inclusion is guarded by a configure test. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12819
* Linux 5.16: bio_set_dev is no longer a helper macroColeman Kane2021-12-071-0/+24
| | | | | | | | | | | | | | | | | | This change adds a confiugre check to determine if bio_set_dev is a helper macro or not. If not, then the attempt to override its internal call to bio_associate_blkg(), with a macro definition to our own version, is no longer possible, as the compiler won't use it when compiling the new inline function replacement implemented in the header. This change also creates a new vdev_bio_set_dev() function that performs the same work, and also performs the work implemented in vdev_bio_associate_blkg(), as it is the only thing calling that function in our code. Our custom vdev_bio_associate_blkg() is now only compiled if the bio_set_dev() is a macro in the Linux headers. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12819
* Linux 5.16: type member of iov_iter renamed iter_typeColeman Kane2021-12-071-0/+6
| | | | | | | | | | | | | The iov_iter->type member was renamed iov_iter->iter_type. However, while looking into this, realized that in 2018 a iov_iter_type(*iov) accessor function was introduced. So if that is present, use it, otherwise fall back to trying the existing behavior of directly accessing type from iov_iter. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12819
* Linux 5.16: block_device_operations->submit_bio now returns voidColeman Kane2021-12-071-2/+8
| | | | | | | | | | The return type for the submit_bio member of struct block_device_operations was changed to no longer return a value. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12819
* Linux 5.13 compat: retry zvol_open() when contendedBrian Behlendorf2021-12-063-37/+68
| | | | | | | | | | | | | | | | | | | | | | | Due to a possible lock inversion the zvol open call path on Linux needs to be able to retry in the case where the spa_namespace_lock cannot be acquired. For Linux 5.12 an older kernel this was accomplished by returning -ERESTARTSYS from zvol_open() to request that blkdev_get() drop the bdev->bd_mutex lock, reaquire it, then call the open callback again. However, as of the 5.13 kernel this behavior was removed. Therefore, for 5.12 and older kernels we preserved the existing retry logic, but for 5.13 and newer kernels we retry internally in zvol_open(). This should always succeed except in the case where a pool's vdev are layed on zvols, in which case it may fail. To handle this case vdev_disk_open() has been updated to retry when opening a device when -ERESTARTSYS is returned. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #12301 Closes #12759
* Linux 5.16: wait_on_page_bit() no longer available to modulesColeman Kane2021-12-061-0/+4
| | | | | | | | | | | | | Instead, linux/pagemap.h offers a number of folio-specific functions to be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies the code to conditionally compile that if configure identifies th presence of the folio_wait_bit() function. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12800
* Iterate encrypted clones at zvol_create_minorJorgen Lundman2021-12-061-0/+64
| | | | | | | | | | | | | | | Userland figures out which encryption-root keys are required to load, and issues ZFS_IOC_LOAD_KEY. The tail section of spa_keystore_load_wkey() will call zvol_create_minors() on the encryption-root object. Any clones of the encrypted zvol will not be plumbed. This commits adds additional logic to detect if zvol has clones, and is encrypted, then adds these to the list of zvols to call zvol_create_minors() on. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12471
* Restore dirty dnode detection logicBrian Behlendorf2021-11-051-1/+1
| | | | | | | | | | | In addition to flushing memory mapped regions when checking holes, commit de198f2d95 modified the dirty dnode detection logic to check the dn->dn_dirty_records instead of the dn->dn_dirty_link. Relying on the dirty record has not be reliable, switch back to the previous method. Signed-off-by: Brian Behlendorf <[email protected]> Issue #11900 Closes #12745
* Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistencyBrian Behlendorf2021-11-053-28/+54
| | | | | | | | | | | | | | | | | | | | | | | When using lseek(2) to report data/holes memory mapped regions of the file were ignored. This could result in incorrect results. To handle this zfs_holey_common() was updated to asynchronously writeback any dirty mmap(2) regions prior to reporting holes. Additionally, while not strictly required, the dn_struct_rwlock is now held over the dirty check to prevent the dnode structure from changing. This ensures that a clean dnode can't be dirtied before the data/hole is located. The range lock is now also taken to ensure the call cannot race with zfs_write(). Furthermore, the code was refactored to provide a dnode_is_dirty() helper function which checks the dnode for any dirty records to determine its dirtiness. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #11900 Closes #12724
* Linux 5.16 compat: submit_bio()Brian Behlendorf2021-11-051-2/+2
| | | | | | | | | | | | The submit_bio() prototype has changed again. The version is 5.16 still only expects a single argument but the return type has changed to void. Since we never used the returned value before update the configure check to detect both single arg versions. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12725
* Rescan enclosure sysfs path on importTony Hutter2021-11-021-0/+24
| | | | | | | | | | | | | | | When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the enclosure sysfs path to the fault LEDs, like: vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8 However, this enclosure path doesn't get updated on successive imports even if enclosure path to the disk changes. This patch fixes the issue. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #11950 Closes #12095
* FreeBSD: Catch up with recent VFS changesRyan Moeller2021-11-022-1/+7
| | | | | | | | | | | | cn_thread is always curthread. https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3 https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3 Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Alan Somers <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12668
* Use fallthrough macroBrian Behlendorf2021-11-0218-27/+29
| | | | | | | | | | | | | | | | As of the Linux 5.9 kernel a fallthrough macro has been added which should be used to anotate all intentional fallthrough paths. Once all of the kernel code paths have been updated to use fallthrough the -Wimplicit-fallthrough option will because the default. To avoid warnings in the OpenZFS code base when this happens apply the fallthrough macro. Additional reading: https://lwn.net/Articles/794944/ Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12441
* Fixed data integrity issue when underlying disk returns errorArun KV2021-09-141-3/+31
| | | | | | | | | | | | | | | | | | | Errors in zil_lwb_write_done() are not propagated to zil_lwb_flush_vdevs_done() which can result in zil_commit_impl() not returning an error to applications even when zfs was not able to write data to the disk. Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to allow errors to propagate and consolidate the error handling for flush and write errors to a single location (rather than having error handling split between the "write done" and "flush done" handlers). Reviewed-by: George Wilson <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Arun KV <[email protected]> Closes #12391 Closes #12443
* Verify embedded blkptr's in arc_read()Brian Behlendorf2021-09-142-7/+14
| | | | | | | | | | | | | | | | | The block pointer verification check in arc_read() should also cover embedded block pointers. While highly unlikely, accessing a damaged block pointer can result in panic. To further harden the code extend the existing check to include embedded block pointers and add a comment explaining the rational for this sanity check. Lastly, correct a flaw in zfs_blkptr_verify() so the error count is checked even when checking a untrusted config to verify the non-pool-specific portions of a block pointer. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12535
* Linux 5.15 compat: get_acl()Brian Behlendorf2021-09-141-8/+26
| | | | | | | | | | | | | | | Kernel commits 332f606b32b6 ovl: enable RCU'd ->get_acl() 0cad6246621b vfs: add rcu argument to ->get_acl() callback Added compatibility code to detect the new ->get_acl() interface and correctly handle the case where the new rcu argument is set. Reviewed-by: Coleman Kane <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12548
* Allow sending corrupt snapshots even if metadata is corruptedAllan Jude2021-09-141-0/+2
| | | | | | | | | | | | | When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag, so traverse_visitbp() will not fail with ECKSUM if a blockpointer cannot be read, but rather will continue and send the objects it can. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Allan Jude <[email protected]> Sponsored-By: Klara Inc. Sponsored-By: WHC Online Solutions Inc. Closes #12541