Commit message | Author | Age | Files | Lines
* Handle partial reads in zfs_read | Rich Ercolani | 2021-09-20 | 1 | -0/+8
| | | | | | | | | | | | | | | | | | | Currently, dmu_read_uio_dnode can read 64k of a requested 1M in one loop, get EFAULT back from zfs_uiomove() (because the iovec only holds 64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN gets interpreted as "I didn't read anything", the caller tries again without consuming the 64k we already read, and we're stuck. This apparently works on newer kernels because the caller, which breaks on older Linux kernels by happily passing along a 1M read request and a 64k iovec, just requests 64k at a time there. With this, we now won't return EFAULT if we got a partial read. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12370 Closes #12509 Closes #12516
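The return-value decision described in this message can be sketched in userland C. This is a minimal illustration, not the zfs_read() patch itself: the helper name read_result() and its parameters are hypothetical, and it only models the rule "do not report EFAULT when some bytes were already copied".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenZFS function): choose the value a read
 * path returns once its copy loop has stopped.
 *
 * requested: bytes the caller asked for
 * remaining: bytes still uncopied when the loop stopped
 * error:     errno-style code from the last copy attempt, 0 on success
 */
static int
read_result(size_t requested, size_t remaining, int error)
{
	assert(remaining <= requested);

	/* Some data was copied: report a short read instead of a failure. */
	if (error == EFAULT && remaining < requested)
		return (0);

	/* Nothing was copied, or the error is unrelated: pass it through. */
	return (error);
}
```

With a 1M request of which 64k was copied, the sketch returns 0, so the caller sees a short read and advances by 64k rather than retrying from the same offset.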
* Upstream: unmount snapshots before destroying them on macOS | Jorgen Lundman | 2021-09-20 | 5 | -1/+23
| | | | | | | | | | | | | Add function zfs_destroy_snaps_nvl_os() call. The main issue is that macOS needs to unmount any mounted snapshots before they can be destroyed. Other platforms can handle this in the kernel, but sending a storm of zed events to unmount seems undesirable when we can do it in userland to start with. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Co-authored-by: ilovezfs <[email protected]> Closes #12550
* Added test for being able to read various variants of zstd | Rich Ercolani | 2021-09-20 | 4 | -2/+59
| | | | | | | | | | | | | As detailed in #12022 and #12008, it turns out the current zstd implementation is quite nonportable, and results in various configurations of ondisk header that only each platform can read. So I've added a test which contains a dataset with a file written by Linux/x86_64 and one written by FBSD/ppc64. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12030
* Really zero the zero page | Alexander Motin | 2021-09-17 | 1 | -0/+1
| | | | | | | | | | | | | | While switching abd_zero_buf allocation KPI I've missed the fact that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc() does not. Add explicit bzero() after it. I don't think it should have caused real problems, but leaking one memory page content all over the pool is not good. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #12569
* Avoid panic in case of pool errors and missing L2ARC | George Amanakis | 2021-09-16 | 1 | -1/+6
| | | | | | | | | | | | | | | In case an ARC buffer is allocated only on L2ARC, and there are underlying errors in a pool with the cache device in faulty state, a panic can occur in arc_read_done()->arc_hdr_destroy()->arc_hdr_l2hdr_destroy()->arc_hdr_clear_flags() when trying to free the ARC buffer. Fix this by discarding the buffer's identity in arc_hdr_destroy(), in case the buffer is not empty, before calling arc_hdr_l2hdr_destroy(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #12392
* Linux 5.14 compat: META | Brian Behlendorf | 2021-09-15 | 1 | -1/+1
| | | | | | | | | Increase the Linux-Maximum version in the META file to 5.14. All of the required compatibility patches have been merged and the 5.14 kernel has been officially released. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12565
* Temporarily use root credentials to mount snapshots in .zfs | Allan Jude | 2021-09-14 | 1 | -3/+7
| | | | | | | | | | | | | | | | | | When mounting a snapshot in the .zfs/snapshot control directory, temporarily assume root's credentials to perform the VFS_MOUNT(). This allows regular users and users inside jails to access these snapshots. The regular usermount code is not helpful here, since it requires that the user performing the mount own the mountpoint, which won't be the case for .zfs/snapshot/<snapname>. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Allan Jude <[email protected]> Sponsored-By: Modirum MDPay Sponsored-By: Klara Inc. Closes #11312
* Use fallthrough macro | Brian Behlendorf | 2021-09-14 | 37 | -40/+88
| | | | | | | | | | | | | | | As of the Linux 5.9 kernel a fallthrough macro has been added which should be used to annotate all intentional fallthrough paths. Once all of the kernel code paths have been updated to use fallthrough, the -Wimplicit-fallthrough option will become the default. To avoid warnings in the OpenZFS code base when this happens, apply the fallthrough macro. Additional reading: https://lwn.net/Articles/794944/ Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12441
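As an illustration of how such an annotation reads at a call site, here is a small standalone sketch. The fallback macro definition below is an assumption written for this example (it is not copied from Linux or OpenZFS), and stages_from() is a hypothetical function.

```c
#include <assert.h>

/*
 * Illustrative fallback definition (an assumption for this sketch): use the
 * GCC 7+ statement attribute when available, otherwise expand to a no-op.
 */
#ifndef fallthrough
#if defined(__GNUC__) && __GNUC__ >= 7
#define	fallthrough	__attribute__((__fallthrough__))
#else
#define	fallthrough	((void)0)
#endif
#endif

/* Hypothetical example: count how many stages run when starting at `stage`. */
static int
stages_from(int stage)
{
	int ran = 0;

	assert(stage >= 0);
	switch (stage) {
	case 0:
		ran++;
		fallthrough;	/* intentional: stage 0 also runs stage 1 */
	case 1:
		ran++;
		fallthrough;	/* intentional: stage 1 also runs stage 2 */
	case 2:
		ran++;
		break;
	default:
		break;
	}
	return (ran);
}
```

Building this with -Wimplicit-fallthrough should produce no warning for the annotated cases, while deleting one of the fallthrough statements would.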
* Iterate encrypted clones at zvol_create_minor | Jorgen Lundman | 2021-09-13 | 1 | -0/+64
| | | | | | | | | | | | | | | Userland figures out which encryption-root keys are required to load, and issues ZFS_IOC_LOAD_KEY. The tail section of spa_keystore_load_wkey() will call zvol_create_minors() on the encryption-root object. Any clones of the encrypted zvol will not be plumbed. This commit adds additional logic to detect whether a zvol has clones and is encrypted, and if so adds the clones to the list of zvols to call zvol_create_minors() on. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12471
* Fixed data integrity issue when underlying disk returns error | Arun KV | 2021-09-13 | 1 | -3/+31
| | | | | | | | | | | | | | | | | | Errors in zil_lwb_write_done() are not propagated to zil_lwb_flush_vdevs_done() which can result in zil_commit_impl() not returning an error to applications even when zfs was not able to write data to the disk. Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to allow errors to propagate and consolidate the error handling for flush and write errors to a single location (rather than having error handling split between the "write done" and "flush done" handlers). Reviewed-by: George Wilson <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Arun KV <[email protected]> Closes #12391 Closes #12443
* ZTS: Waiting for zvols to be available | Brian Behlendorf | 2021-09-13 | 4 | -7/+8
| | | | | | | | | | | This is a follow up patch for PR #12515 which addresses some additional ZTS tests which are unreliable and should explicitly wait for the required zvols to be available. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: @Theo13111 Signed-off-by: Brian Behlendorf <[email protected]> Closes #12553
* Verify embedded blkptr's in arc_read() | Brian Behlendorf | 2021-09-09 | 2 | -7/+14
| | | | | | | | | | | | | | | | The block pointer verification check in arc_read() should also cover embedded block pointers. While highly unlikely, accessing a damaged block pointer can result in panic. To further harden the code, extend the existing check to include embedded block pointers and add a comment explaining the rationale for this sanity check. Lastly, correct a flaw in zfs_blkptr_verify() so the error count is checked even when checking an untrusted config to verify the non-pool-specific portions of a block pointer. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12535
* Upstream: Add snapshot and zvol events | Jorgen Lundman | 2021-09-09 | 8 | -1/+70
| | | | | | | | | | | | | | For kernel to send snapshot mount/unmount events to zed. For kernel to send symlink creates/removes on zvol plumbing. (/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX) If zed misses the ENODEV, all errors after are EINVAL. Treat any error as kernel module failure. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12416
* Linux 5.15 compat: get_acl() | Brian Behlendorf | 2021-09-09 | 3 | -9/+52
| | | | | | | | | | | | | | Kernel commits 332f606b32b6 ovl: enable RCU'd ->get_acl() 0cad6246621b vfs: add rcu argument to ->get_acl() callback Added compatibility code to detect the new ->get_acl() interface and correctly handle the case where the new rcu argument is set. Reviewed-by: Coleman Kane <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12548
* Allow sending corrupt snapshots even if metadata is corrupted | Allan Jude | 2021-09-09 | 1 | -0/+2
| | | | | | | | | | | | When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag, so traverse_visitbp() will not fail with ECKSUM if a blockpointer cannot be read, but rather will continue and send the objects it can. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Allan Jude <[email protected]> Sponsored-By: Klara Inc. Sponsored-By: WHC Online Solutions Inc. Closes #12541
* arc: Drop an incorrect assert | Rich Ercolani | 2021-09-08 | 1 | -1/+0
| | | | | | | | | | | | | Unfortunately, there was an overzealous assertion that was (in pretty specific circumstances) false, causing failure. This assertion was added in error, so we're removing it. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #9897 Closes #12020 Closes #12246
* Upstream: zdb inode mapping | Jorgen Lundman | 2021-09-08 | 1 | -0/+11
| | | | | | | | | | | Unfortunately macOS reserves inode ID numbers 0-15, and we cannot use them. In the macOS port we simply map them to really high IDs. Normally this is hidden inside the _os implementation, but this is the one place in the common source files. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12530
* Compressed receive with different ashift can result in incorrect PSIZE on disk | Paul Dagnelie | 2021-09-08 | 1 | -0/+12
| | | | | | | | | | | | | | We round up the psize to the nearest multiple of the asize or to the lsize, whichever is smaller. Once that's done, we allocate a new buffer of the appropriate size, zero the tail, and copy the data into it. This adds a small performance cost to these kinds of writes, but fixes the bookkeeping problems. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Co-authored-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #12522 Closes #8462
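The rounding and padding steps in this message can be sketched as follows. This is a userland illustration under stated assumptions: "asize" is taken to mean the allocation unit 1 << ashift, and the function names rounded_psize() and pad_copy() are hypothetical, not taken from the OpenZFS sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Round psize up to a multiple of the allocation unit (assumed here to be
 * 1 << ashift), but never beyond lsize.
 */
static uint64_t
rounded_psize(uint64_t psize, uint64_t lsize, unsigned ashift)
{
	uint64_t align = UINT64_C(1) << ashift;
	uint64_t up = (psize + align - 1) & ~(align - 1);

	return (up < lsize ? up : lsize);
}

/*
 * Copy psize bytes of src into a freshly allocated buffer of new_size
 * bytes and zero the tail. Returns NULL if the allocation fails; the
 * caller frees the result.
 */
static uint8_t *
pad_copy(const uint8_t *src, uint64_t psize, uint64_t new_size)
{
	uint8_t *dst;

	assert(new_size >= psize);
	dst = malloc(new_size);
	if (dst == NULL)
		return (NULL);
	memcpy(dst, src, psize);
	memset(dst + psize, 0, new_size - psize);
	return (dst);
}
```

For example, a 1536-byte compressed block received into a pool with ashift=12 and a 128k logical size would be padded to 4096 bytes, with bytes 1536 through 4095 zeroed.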
* Linux 5.15 compat: standalone <linux/stdarg.h> | Alexander | 2021-09-08 | 3 | -0/+38
| | | | | | | | | | | | | | | | | | | | | | | Kernel commits 39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers") c0891ac15f04 ("isystem: ship and use stdarg.h") 564f963eabd1 ("isystem: delete global -isystem compile option") (for now can be found in linux-next.git tree, will land into the Linus' tree during the ongoing 5.15 cycle with one of akpm merges) removed the -isystem flag and disallowed the inclusion of any compiler header files. They also introduced a minimal <linux/stdarg.h> as a replacement for <stdarg.h>. include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes <stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h> and include it instead of the compiler's one to prevent module build breakage. Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Closes #12531
* Linux 5.15 compat: block device readahead | Brian Behlendorf | 2021-09-08 | 2 | -0/+43
| | | | | | | | | | | | | | | | | | | | | | The 5.15 kernel moved the backing_dev_info structure out of the request queue structure which causes a build failure. Rather than look in the new location for the BDI we instead detect this upstream refactoring by the existence of either the blk_queue_update_readahead() or disk_update_readahead() functions. In either case, there's no longer any reason to manually set the ra_pages value since it will be overridden with a reasonable default (2x the block size) when blk_queue_io_opt() is called. Therefore, we update the compatibility wrapper to do nothing for 5.9 and newer kernels. While it's tempting to do the same for older kernels we want to keep the compatibility code to preserve the existing behavior. Removing it would effectively increase the default readahead to 128k. Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12532
* Detect iSCSI in the zpool cmd vdev media script | Don Brady | 2021-09-02 | 1 | -1/+8
| | | | | | | | Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #12206
* CI: don't install abigail-tools | George Melikov | 2021-09-02 | 1 | -1/+1
| | | | | | | | | We use docker image instead. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #12529
* Update ABI files via new libabigail version | George Melikov | 2021-09-02 | 5 | -10851/+8384
| | | | | | | Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #12529
* Libabigail: make .abi files more consistent | George Melikov | 2021-09-02 | 1 | -0/+2
| | | | | | | Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #12529
* CI: use fresh libabigail via docker image | George Melikov | 2021-09-02 | 1 | -2/+2
| | | | | | | Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #12529
* Check for libabigail version | George Melikov | 2021-09-02 | 1 | -2/+12
| | | | | | | | | | We need to use 1.8.0+ version, older versions may segfault and give inconsistent results. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #12529
* autoconf: allow Release to contain hyphen | Jorgen Lundman | 2021-09-02 | 1 | -2/+2
| | | | | | | | | | | | To avoid clashing with tags and releases, we'll use "zfs-macOS". Meta: 1 Name: zfs-macOS Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12437
* ZTS: Remove exceptions for flaky zhack on FreeBSD | Ryan Moeller | 2021-09-01 | 1 | -7/+0
| | | | | | | | | Issue #11854 has been resolved, so we can remove the exceptions for it. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12527
* Add zpool_disable_datasets_os() / zfs_unmount_os() | Jorgen Lundman | 2021-08-31 | 5 | -1/+48
| | | | | | | | | | | | | zpool_disable_datasets_os(): macOS needs to do a bunch of work to kick everything off zvols. zfs_unmount_os(): This allows us to unmount any zvols that may be mounted. Like with zfs destroy foo/vol Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12436
* FreeBSD: Don't remove SA xattr if not SA znode | Ryan Moeller | 2021-08-30 | 1 | -1/+1
| | | | | | | | | | | | We attempt to remove an existing SA xattr when setting a dir xattr, but this only makes sense if the znode has been upgraded to the SA format. Otherwise, we will hit an assert in zfs_sa_get_xattr. Make sure this is an SA znode before attempting to remove the SA xattr. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12514
* Fix cross-endian interoperability of zstd | Rich Ercolani | 2021-08-30 | 6 | -19/+165
| | | | | | | | | | | | | | | It turns out that layouts of union bitfields are a pain, and the current code results in an inconsistent layout between BE and LE systems, leading to zstd-active datasets on one erroring out on the other. Switch everyone over to the LE layout, and add compatibility code to read both. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12008 Closes #12022
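One common way to avoid layout differences of this kind is to pack fields with explicit shifts and serialize with an explicit byte order instead of relying on bitfields. The sketch below illustrates that general technique only; the 24-bit/8-bit split and every function name are assumptions for the example and are not taken from the OpenZFS zstd header code.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 24-bit version and an 8-bit level into one 32-bit value. */
static uint32_t
pack_version_level(uint32_t version, uint8_t level)
{
	assert(version < (UINT32_C(1) << 24));
	return ((version << 8) | level);
}

static uint32_t
unpack_version(uint32_t raw)
{
	return (raw >> 8);
}

static uint8_t
unpack_level(uint32_t raw)
{
	return ((uint8_t)(raw & 0xff));
}

/* Store and load in little-endian byte order regardless of host order. */
static void
store_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t
load_le32(const uint8_t in[4])
{
	return ((uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	    ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24));
}
```

Because the bytes are produced by shifts rather than by a compiler-chosen bitfield layout, a big-endian and a little-endian host write and read the same on-disk bytes.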
* ZTS: Enable punch-hole tests on FreeBSD | Ka Ho Ng | 2021-08-30 | 5 | -8/+53
| | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ka Ho Ng <[email protected]> Sponsored-by: The FreeBSD Foundation Closes #12458
* FreeBSD: Implement hole-punching support | Ka Ho Ng | 2021-08-30 | 2 | -3/+64
| | | | | | | | | | | | This adds support for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ka Ho Ng <[email protected]> Sponsored-by: The FreeBSD Foundation Closes #12458
* ZTS: Waiting for zvols to be available | Brian Behlendorf | 2021-08-29 | 6 | -9/+9
| | | | | | | | | | | | | | | | The ZTS block_device_wait helper function should use -e when waiting for a file to appear since it will be either a block special device or a symlink. This didn't cause any failures but when a device path was specified the function would wait longer than needed. Additionally update the most flakey test cases to pass the file path to block_device_wait to try and improve the test reliability. The udev behavior on Fedora in particular can result in frequent false positives. Reviewed-by: George Melikov <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12515
* Correct checking bdev_check_media_change message | Ryan Moeller | 2021-08-27 | 1 | -1/+1
| | | | | | | | We're not looking for bdev_disk_changed. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12492
* Make 'zpool labelclear -f' work on offlined disks | Tony Hutter | 2021-08-27 | 1 | -1/+35
| | | | | | | | | This patch allows you to clear the label on offlined disks in an active pool with `-f`. Previously, labelclear wouldn't let you do that. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #12511
* Extend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319) | Trevor Bautista | 2021-08-26 | 6 | -15/+62
| | | | | | | | | | | | | Previously, zpool-iostat did not display any data regarding rebuild I/Os in either the latency/size histograms (-w/-l/-r) or the queue data (-q). This fix essentially utilizes the existing infrastructure for tracking rebuild queue data and displays this data in the proper places within zpool-iostat's output. Signed-off-by: Trevor Bautista <[email protected]> Signed-off-by: Trevor Bautista <[email protected]> Co-authored-by: Trevor Bautista <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* vdev_id: Return an error if config file is not found (#12508) | Anton Gubarkov | 2021-08-25 | 1 | -2/+3
| | | | | Signed-off-by: Anton Gubarkov <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
* zpool-remove.8: describe top-level vdev sector size limitation | Sam Hathaway | 2021-08-23 | 1 | -4/+4
| | | | | | | | | | Document that top-level vdevs cannot be removed unless all top-level vdevs have the same sector size. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Sam Hathaway <[email protected]> Closes #11339 Closes #12472
* Initialize parity blocks before RAID-Z reconstruction benchmarking | Mark Johnston | 2021-08-23 | 1 | -0/+7
| | | | | | | | | | | | | benchmark_raidz() allocates a row to benchmark parity calculation and reconstruction. In the latter case, the parity blocks are left uninitialized, leading to reports from KMSAN. Initialize parity blocks to 0xAA as we do for the data earlier in the function. This does not affect the selected RAID-Z implementation on any of several systems tested. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12473
* ZTS: Add tests for creation time | Ryan Moeller | 2021-08-17 | 10 | -5/+187
| | | | | | | | | Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12432
* Linux 4.11 compat: statx support | Richard Yao | 2021-08-17 | 3 | -9/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux 4.11 added a new statx system call that allows us to expose crtime as btime. We do this by caching crtime in the znode to match how atime, ctime and mtime are cached in the inode. statx also introduced a new way of reporting whether the immutable, append and nodump bits have been set. It adds support for reporting compression and encryption, but the semantics on other filesystems is not just to report compression/encryption, but to allow it to be turned on/off at the file level. We do not support that. We could implement semantics where we refuse to allow user modification of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc() to find out encryption/compression information. That would introduce locking that will have a minor (although unmeasured) performance cost. It also would be inferior to zdb, which reports far more detailed information. We therefore omit reporting of encryption/compression through statx in favor of recommending that users interested in such information use zdb. Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #8507
* zfs.4: Fix typo s/compatiblity/compatibility/ | Gordon Bergling | 2021-08-17 | 1 | -1/+1
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Gordon Bergling <[email protected]> Closes #12464
* Remove b_pabd/b_rabd allocation from arc_hdr_alloc() | Alexander Motin | 2021-08-17 | 4 | -51/+68
| | | | | | | | | | | | | | | | | | When a header is allocated for full overwrite it is a waste of time to allocate b_pabd/b_rabd for it, since arc_write() will free them without ever being touched. If it is a read or a partial overwrite then arc_read() and arc_hdr_decrypt() allocate them explicitly. Reduced memory allocation in user threads also reduces ARC eviction throttling there, proportionally increasing it in ZIO threads, that is not good. To minimize or even avoid it introduce ARC allocation reserve, allowing certain arc_get_data_abd() callers to allocate a bit longer in situations where user threads will already throttle. Reviewed-by: George Wilson <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12398
* Increase default volblocksize from 8KB to 16KB | Alexander Motin | 2021-08-17 | 5 | -8/+8
| | | | | | | | | | | | | | Many things have changed since the previous default was set many years ago. Nowadays 8KB does not allow adequate compression or even decent space efficiency on many pools due to 4KB disk physical block rounding, especially on RAIDZ and DRAID. It effectively limits write throughput to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation, vdev queue and other block rate bottlenecks. It keeps L2ARC expensive despite many optimizations, and makes dedup simply unrealistic. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #12406
* Optimize arc_l2c_only lists assertions | Alexander Motin | 2021-08-17 | 1 | -9/+12
| | | | | | | | | | | | It is very expensive and not informative to call multilist_is_empty() for each arc_change_state() on debug builds to check for the impossible. Instead, implement a special index function for the arc_l2c_only->arcs_list multilists that panics on any attempt to use it. Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12421
* Fix/improve dbuf hits accounting | Alexander Motin | 2021-08-17 | 3 | -28/+16
| | | | | | | | | | | | | | | Instead of clearing stats inside arc_buf_alloc_impl() do it inside arc_hdr_alloc() and arc_release(). It fixes statistics being wiped every time a new dbuf is filled from the ARC. Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits. Since the hits are accounted under hash lock, replace atomics with simple increments. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12422
* Avoid vq_lock drop in vdev_queue_aggregate() | Alexander Motin | 2021-08-17 | 1 | -29/+34
| | | | | | | | | | | | | vq_lock is already too congested for two more operations per I/O. Instead of dropping and reacquiring it inside vdev_queue_aggregate(), delegate the zio_vdev_io_bypass() and zio_execute() calls for parent I/Os to callers, which drop the lock anyway to execute the new I/O. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12297
* Use more atomics in refcounts | Alexander Motin | 2021-08-17 | 2 | -33/+26
| | | | | | | | | | | | | | Use atomic_load_64() for zfs_refcount_count() to prevent torn reads on 32-bit platforms. On 64-bit ones it should not change anything. When built with ZFS_DEBUG but running without tracking enabled use atomics instead of mutexes same as for builds without ZFS_DEBUG. Since rc_tracked can't change live we can check it without lock. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12420
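The idea of reading and updating a 64-bit counter atomically, so that a 32-bit platform never observes half of an update, can be sketched with C11 atomics. This is a userland analogy only: the sketch_refcount_* names are hypothetical, and <stdatomic.h> stands in for the kernel's atomic_load_64() family.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an untracked reference count. */
typedef struct sketch_refcount {
	_Atomic uint64_t rc_count;
} sketch_refcount_t;

/* Add n references and return the new count. */
static uint64_t
sketch_refcount_add(sketch_refcount_t *rc, uint64_t n)
{
	return (atomic_fetch_add(&rc->rc_count, n) + n);
}

/* Drop n references and return the new count. */
static uint64_t
sketch_refcount_remove(sketch_refcount_t *rc, uint64_t n)
{
	uint64_t old = atomic_fetch_sub(&rc->rc_count, n);

	assert(old >= n);
	return (old - n);
}

/* Read the whole 64-bit value in one atomic load (no torn read). */
static uint64_t
sketch_refcount_count(sketch_refcount_t *rc)
{
	return (atomic_load(&rc->rc_count));
}
```

On 64-bit hosts a plain aligned load is normally already indivisible, which matches the message's note that nothing should change there; the atomic load matters where a 64-bit value would otherwise be read as two 32-bit halves.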
* ZTS: Avoid unset $tmpdir in redacted_panic | Ryan Moeller | 2021-08-16 | 1 | -2/+8
| | | | | | | | | | | | | The redacted_send tests make use of a $tmpdir variable, except in redacted_send/redacted_panic the variable is never defined. Use $TEST_BASE_DIR instead. Clean up the stream file after the test. Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12455