path: root/module/zfs

Commit log (newest first). Each entry gives the commit subject, then the author, date, and number of files and lines changed, followed by the commit message.
* module: zfs: fix unused, remove argsused
  Author: наб | Date: 2021-12-23 | 49 files changed (-170/+215)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* module: zfs: vdev: shim out vdev_indirect_mapping_verify()
  Author: наб | Date: 2021-12-23 | 1 file changed (-0/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* module: zfs: vdev: shim out vdev_indirect_births_verify()
  Author: наб | Date: 2021-12-23 | 1 file changed (-0/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* module: zfs: spa: shim out vdev_count_verify_zaps()
  Author: наб | Date: 2021-12-23 | 1 file changed (-0/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* module: zfs: multilist: shim out multilist_d2l()
  Author: наб | Date: 2021-12-23 | 1 file changed (-0/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* module: zfs: dsl: pool: shim out dsl_early_sync_task_verify()
  Author: наб | Date: 2021-12-23 | 1 file changed (-0/+3)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* module: zfs: dnode: use debug-only in debug mode only
  Author: наб | Date: 2021-12-23 | 1 file changed (-0/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #12844
* Reduce number of arc_prune threads
  Author: Alexander Motin | Date: 2021-12-22 | 1 file changed (-3/+10)
  On FreeBSD vnode reclamation is single-threaded, protected by a single
  global lock. Linux seems to be able to use a thread per mount point,
  but at this time it creates more harm than good. Reduce the number of
  threads to 1 and add a tunable in case somebody wants to try more.
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Chris Dunlop <[email protected]>
  Reviewed-by: Ahelenia Ziemiańska <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Closes #12896
  Issue #9966
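
  A minimal sketch of the tunable wiring described above; the parameter
  name and description are assumed for illustration, not quoted from the
  patch:

      /* Number of threads used for arc_prune work (assumed name). */
      static int zfs_arc_prune_task_threads = 1;

      ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, prune_task_threads, INT, ZMOD_RW,
              "Number of arc_prune threads");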
* zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
  Author: Allan Jude | Date: 2021-12-16 | 1 file changed (-1/+2)
  If the fields to be listed and sorted by are constrained to those
  populated by dsl_dataset_fast_stat(), then zfs list is much faster, as
  it does not need to open each objset and read its properties. A
  previous optimization by Pawel Dawidek
  (0cee24064a79f9c01fc4521543c37acea538405f) took advantage of this to
  make listing snapshot names sorted only by name much faster. However,
  it was limited to `-o name -s name`; this work extends the
  optimization to work with:
  - name
  - guid
  - createtxg
  - numclones
  - inconsistent
  - redacted
  - origin
  and it could be further extended to any other properties supported by
  dsl_dataset_fast_stat() or similar that do not require extra locking
  or reading from disk.
  Reviewed-by: Mark Maybee <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Pawel Jakub Dawidek <[email protected]>
  Signed-off-by: Allan Jude <[email protected]>
  Closes #11080
* ZFS send/recv with ashift 9->12 leads to data corruption
  Author: Paul Dagnelie | Date: 2021-12-07 | 2 files changed (-7/+17)
  Improve the ability of zfs send to determine if a block is compressed
  or not by using information contained in the blkptr.
  Reviewed-by: Rich Ercolani <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #12770
* Add `const` to nvlist functions to properly expose their real behavior
  Author: Paul Dagnelie | Date: 2021-12-06 | 6 files changed (-30/+42)
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #12728
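
  The shape of the change, using one plausible accessor as an example
  (illustrative, not the exact diff):

      /* before: a read-only lookup still demanded a mutable nvlist */
      boolean_t nvlist_exists(nvlist_t *nvl, const char *name);

      /* after: the prototype matches the function's real behavior */
      boolean_t nvlist_exists(const nvlist_t *nvl, const char *name);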
* Corrected a case where we could read uninited ABD memory
  Author: Rich Ercolani | Date: 2021-12-03 | 1 file changed (-11/+14)
  For my sins, I started running valgrind over ztest to try and fix that
  pesky intermittent "zloop dies with malloc errors" problem. This one
  seemed exciting enough to merit cutting a PR for before the rest get
  polished.
  Suggested-by: Paul Dagnelie <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12214
* Linux 5.13 compat: retry zvol_open() when contended
  Author: Brian Behlendorf | Date: 2021-12-01 | 1 file changed (-33/+5)
  Due to a possible lock inversion the zvol open call path on Linux
  needs to be able to retry in the case where the spa_namespace_lock
  cannot be acquired. For Linux 5.12 and older kernels this was
  accomplished by returning -ERESTARTSYS from zvol_open() to request
  that blkdev_get() drop the bdev->bd_mutex lock, reacquire it, then
  call the open callback again. However, as of the 5.13 kernel this
  behavior was removed. Therefore, for 5.12 and older kernels we
  preserve the existing retry logic, but for 5.13 and newer kernels we
  retry internally in zvol_open(). This should always succeed except in
  the case where a pool's vdevs are layered on zvols, in which case it
  may fail. To handle this case vdev_disk_open() has been updated to
  retry when opening a device when -ERESTARTSYS is returned.
  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #12301
  Closes #12759
* Default to zfs_dmu_offset_next_sync=1
  Author: Brian Behlendorf | Date: 2021-11-30 | 1 file changed (-4/+8)
  Strict hole reporting was previously disabled by default as a
  performance optimization. However, this has led to confusion over the
  expected behavior and a variety of workarounds being adopted by
  consumers of ZFS. Change the default behavior to always report holes
  and force the TXG sync.
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Tony Hutter <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #12746
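
  The change itself amounts to flipping the default of the existing
  tunable; a simplified sketch:

      /*
       * Force a txg sync before reporting holes, so hole reporting is
       * always strict; previously this defaulted to 0.
       */
      int zfs_dmu_offset_next_sync = 1;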
* Code cleanups
  Author: Pawel Jakub Dawidek | Date: 2021-11-30 | 3 files changed (-7/+8)
  - Allocate ve_search on the stack, so we avoid allocating memory for
    every I/O even if the VDEV cache is disabled.
  - Reduce lock scope.
  - Avoid locking in vdev_cache_read() when the VDEV cache is disabled.
  - Sort file names properly.
  - Correct comment.
  Reviewed-by: Allan Jude <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #12749
* Vdev Properties Feature
  Author: Allan Jude | Date: 2021-11-30 | 7 files changed (-77/+971)
  Add properties, similar to pool properties, to each vdev. This makes
  use of the existing per-vdev ZAP that was added as part of device
  evacuation/removal. A large number of read-only properties are
  exposed, many of them members of struct vdev_t, that provide useful
  statistics. Adds support for the read-only "removing" vdev property.
  Adds the "allocating" property that defaults to "on" and can be set to
  "off" to prevent future allocations from that top-level vdev. Supports
  user-defined vdev properties. Includes support for properties.vdev in
  SYSFS.
  Co-authored-by: Allan Jude <[email protected]>
  Co-authored-by: Mark Maybee <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Mark Maybee <[email protected]>
  Signed-off-by: Allan Jude <[email protected]>
  Closes #11711
* Enable edonr in FreeBSD
  Author: Rich Ercolani | Date: 2021-11-16 | 1 file changed (-4/+0)
  The code is integrated, builds fine, runs fine, there's not really any
  reason not to.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Allan Jude <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12735
* Introduce a tunable to exclude special class buffers from L2ARC
  Author: George Amanakis | Date: 2021-11-11 | 4 files changed (-7/+112)
  Special allocation class or dedup vdevs may have roughly the same
  performance as L2ARC vdevs. Introduce a new tunable to exclude those
  buffers from being cacheable on L2ARC.
  Reviewed-by: Don Brady <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #11761
  Closes #12285
* Check l2cache vdevs pending list inside the vdev_inuse()
  Author: Fedor Uporov | Date: 2021-11-11 | 2 files changed (-9/+35)
  The l2cache device could be added twice because vdev_inuse() does not
  check spa_l2cache for added devices. Make the l2cache vdev in-use
  checking logic closer to that of spare vdevs.
  Reviewed-by: George Amanakis <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Fedor Uporov <[email protected]>
  Closes #9153
  Closes #12689
* Restore dirty dnode detection logic
  Author: Brian Behlendorf | Date: 2021-11-10 | 1 file changed (-1/+1)
  In addition to flushing memory mapped regions when checking holes,
  commit de198f2d95 modified the dirty dnode detection logic to check
  dn->dn_dirty_records instead of dn->dn_dirty_link. Relying on the
  dirty record has not been reliable; switch back to the previous
  method.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #11900
  Closes #12745
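
  A sketch of the restored check (simplified, not the verbatim function
  body): dirtiness is again derived from whether the per-txg dirty link
  is active.

      boolean_t
      dnode_is_dirty(dnode_t *dn)
      {
              mutex_enter(&dn->dn_mtx);
              for (int i = 0; i < TXG_SIZE; i++) {
                      /* back to dn_dirty_link, not dn_dirty_records */
                      if (multilist_link_active(&dn->dn_dirty_link[i])) {
                              mutex_exit(&dn->dn_mtx);
                              return (B_TRUE);
                      }
              }
              mutex_exit(&dn->dn_mtx);
              return (B_FALSE);
      }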
* Skip spacemaps reading in case of pool readonly import
  Author: Fedor Uporov | Date: 2021-11-09 | 3 files changed (-2/+17)
  Only the zdb utility needs to read metaslab-related data during a
  read-only pool import, in order to validate spacemaps. Add a global
  variable which allows zdb to read spacemaps in read-only import mode.
  Reviewed-by: Serapheim Dimitropoulos <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Fedor Uporov <[email protected]>
  Closes #9095
  Closes #12687
* Single IO issue for raidz writes with skip sector
  Author: Brian Atkinson | Date: 2021-11-09 | 1 file changed (-37/+135)
  In order to reduce contention on the vq_lock, optional skip sectors
  for raidz writes can be placed into a single IO request. This is done
  by padding out the linear ABD for a parity column to contain the skip
  sector and by creating a gang ABD to contain the data and skip sector
  for data columns. The vdev_raidz_map_alloc() function now contains
  specific functions for both reads and writes to allocate the ABDs that
  will be issued down to the VDEV children.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Mark Maybee <[email protected]>
  Signed-off-by: Brian Atkinson <[email protected]>
  Closes #12333
* Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  Author: Brian Behlendorf | Date: 2021-11-07 | 3 files changed (-28/+54)
  When using lseek(2) to report data/holes, memory mapped regions of the
  file were ignored. This could result in incorrect results. To handle
  this, zfs_holey_common() was updated to asynchronously write back any
  dirty mmap(2) regions prior to reporting holes. Additionally, while
  not strictly required, the dn_struct_rwlock is now held over the dirty
  check to prevent the dnode structure from changing. This ensures that
  a clean dnode can't be dirtied before the data/hole is located. The
  range lock is now also taken to ensure the call cannot race with
  zfs_write(). Furthermore, the code was refactored to provide a
  dnode_is_dirty() helper function which checks the dnode for any dirty
  records to determine its dirtiness.
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Rich Ercolani <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #11900
  Closes #12724
* Revert behavior of 59eab109 on not-Linux
  Author: Rich Ercolani | Date: 2021-11-04 | 1 file changed (-1/+8)
  It turns out that short-circuiting the EFAULT behavior on a short read
  breaks things on FreeBSD. So until there's a nicer solution, let's
  just revert the behavior for not-Linux.
  Reference: https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12698
* Remove unused function zvol_set_volblocksize()
  Author: Fedor Uporov | Date: 2021-10-26 | 1 file changed (-45/+0)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: George Melikov <[email protected]>
  Signed-off-by: Fedor Uporov <[email protected]>
  Closes #12688
* Make dsl_scan print the pool name in dbgmsg
  Author: Rich Ercolani | Date: 2021-10-26 | 1 file changed (-37/+64)
  If you've got multiple scrubs/resilvers going, it's rather helpful to
  know which pool each scan line refers to.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes: #12674
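
  A hypothetical example of the pattern applied throughout dsl_scan.c
  (the message text is made up; zfs_dbgmsg() and spa_name() are the real
  interfaces):

      /* before: no indication of which pool this line belongs to */
      zfs_dbgmsg("scan issued %llu blocks", (u_longlong_t)blocks_issued);

      /* after: prefix the scan message with the pool name */
      zfs_dbgmsg("%s: scan issued %llu blocks",
          spa_name(spa), (u_longlong_t)blocks_issued);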
* spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
  Author: Allan Jude | Date: 2021-10-26 | 1 file changed (-156/+129)
  The fnvlist versions of the functions are fatal if they fail, saving
  each call site from having to check the result.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Igor Kozhukhov <[email protected]>
  Signed-off-by: Allan Jude <[email protected]>
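
  The transformation is mechanical; an illustrative (not verbatim)
  before/after:

      /* before: every call site wraps the return value in VERIFY() */
      VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_POOL_TXG, txg) == 0);

      /* after: the fnvlist_* wrapper asserts success internally */
      fnvlist_add_uint64(config, ZPOOL_CONFIG_POOL_TXG, txg);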
* Remove code duplication
  Author: Pawel Jakub Dawidek | Date: 2021-10-18 | 1 file changed (-42/+33)
  Remove code duplication by moving code responsible for partial block
  zeroing to a separate function: dnode_partial_zero().
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #12627
* Remove FreeBSD's local copy of the dmu_buf_hold_array() function
  Author: Pawel Jakub Dawidek | Date: 2021-10-13 | 1 file changed (-1/+1)
  Make the main dmu_buf_hold_array() function non-static.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Pawel Jakub Dawidek <[email protected]>
  Closes #12628
* Export minimal zfs_refcount interfaces
  Author: Brian Behlendorf | Date: 2021-10-11 | 1 file changed (-0/+8)
  Lustre makes light use of the zfs_refcount interfaces, which isn't a
  problem when using a non-debug build of OpenZFS. However, when
  debugging is enabled the required symbols are not exported.
  Reviewed-by: Olaf Faaland <[email protected]>
  Reviewed-by: Tony Hutter <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #12613
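
  The fix amounts to exporting the refcount entry points from the debug
  build of refcount.c; a sketch of the pattern (the exact symbol list is
  an assumption):

      #if defined(_KERNEL)
      EXPORT_SYMBOL(zfs_refcount_create);
      EXPORT_SYMBOL(zfs_refcount_destroy);
      EXPORT_SYMBOL(zfs_refcount_count);
      EXPORT_SYMBOL(zfs_refcount_add);
      EXPORT_SYMBOL(zfs_refcount_remove);
      #endif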
* Correct refcount_add in dmu_zfetch
  Author: Rich Ercolani | Date: 2021-10-08 | 1 file changed (-1/+2)
  refcount_add_many(foo, N) is not the same as:
      for (i = 0; i < N; i++) {
              refcount_add(foo);
      }
  Unfortunately, this is only actually true with debug kernels and
  reference_tracking_enable=1.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12589
  Closes #12602
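
  In other words (a sketch with a hypothetical counter; FTAG is the
  usual holder tag): both forms raise the count by n, but with tracking
  enabled only the loop creates n individually removable holders.

      zfs_refcount_t rc;      /* hypothetical counter, for illustration */

      /* one tracked holder whose single reference carries a count of n */
      zfs_refcount_add_many(&rc, n, FTAG);

      /* n tracked holders, each carrying a count of 1 */
      for (int i = 0; i < n; i++)
              zfs_refcount_add(&rc, FTAG);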
* Rescan enclosure sysfs path on import
  Author: Tony Hutter | Date: 2021-10-04 | 1 file changed (-0/+24)
  When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
  enclosure sysfs path to the fault LEDs, like:
      vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8
  However, this enclosure path doesn't get updated on successive imports
  even if the enclosure path to the disk changes. This patch fixes the
  issue.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Tony Hutter <[email protected]>
  Closes #11950
  Closes #12095
* ZFS: Remove a redundant if condition (#12598)
  Author: Attila Fülöp | Date: 2021-10-02 | 1 file changed (-3/+1)
  Commit 0c03d21ac99ebdbe left in a redundant if condition while
  removing some code. Just remove it.
  Reviewed-by: George Melikov <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: John Kennedy <[email protected]>
  Signed-off-by: Attila Fülöp <[email protected]>
  Closes #12598
* Handle partial reads in zfs_read
  Author: Rich Ercolani | Date: 2021-09-20 | 1 file changed (-0/+8)
  Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
  loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
  64k), and return EFAULT, which turns into EAGAIN on the way out.
  EAGAIN gets interpreted as "I didn't read anything", the caller tries
  again without consuming the 64k we already read, and we're stuck. This
  apparently works on newer kernels because the caller that breaks on
  older Linux kernels (by happily passing along a 1M read request with a
  64k iovec) just requests 64k at a time there. With this, we now won't
  return EFAULT if we got a partial read.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12370
  Closes #12509
  Closes #12516
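
  A simplified sketch of the resulting behavior in zfs_read() (not the
  literal patch): if some bytes were copied before zfs_uiomove()
  faulted, report the short read instead of the error.

      /* start_resid is what was requested; the uio residual is what is left */
      int64_t nread = start_resid - zfs_uio_resid(uio);

      if (error == EFAULT && nread > 0) {
              /* partial progress: let the caller consume it and retry */
              error = 0;
      }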
* Avoid panic in case of pool errors and missing L2ARC
  Author: George Amanakis | Date: 2021-09-16 | 1 file changed (-1/+6)
  In case an ARC buffer is allocated only on L2ARC, and there are
  underlying errors in a pool with the cache device in a faulty state, a
  panic can occur in arc_read_done()->arc_hdr_destroy()->
  arc_hdr_l2hdr_destroy()->arc_hdr_clear_flags() when trying to free the
  ARC buffer. Fix this by discarding the buffer's identity in
  arc_hdr_destroy(), in case the buffer is not empty, before calling
  arc_hdr_l2hdr_destroy().
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #12392
* Use fallthrough macro
  Author: Brian Behlendorf | Date: 2021-09-14 | 7 files changed (-13/+13)
  As of the Linux 5.9 kernel a fallthrough macro has been added which
  should be used to annotate all intentional fallthrough paths. Once all
  of the kernel code paths have been updated to use fallthrough, the
  -Wimplicit-fallthrough option will become the default. To avoid
  warnings in the OpenZFS code base when this happens, apply the
  fallthrough macro.
  Additional reading: https://lwn.net/Articles/794944/
  Reviewed-by: Tony Nguyen <[email protected]>
  Reviewed-by: George Melikov <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #12441
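
  An assumed example of the annotation (the case labels and helpers here
  are made up):

      switch (op) {
      case OP_WRITE:
              mark_dirty(obj);
              fallthrough;    /* previously a bare FALLTHROUGH comment */
      case OP_READ:
              issue_io(obj, op);
              break;
      default:
              break;
      }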
* Iterate encrypted clones at zvol_create_minor
  Author: Jorgen Lundman | Date: 2021-09-13 | 1 file changed (-0/+64)
  Userland figures out which encryption-root keys are required to load,
  and issues ZFS_IOC_LOAD_KEY. The tail section of
  spa_keystore_load_wkey() will call zvol_create_minors() on the
  encryption-root object. Any clones of the encrypted zvol will not be
  plumbed. This commit adds additional logic to detect if the zvol has
  clones, and is encrypted, then adds these to the list of zvols to call
  zvol_create_minors() on.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Jorgen Lundman <[email protected]>
  Closes #12471
* Fixed data integrity issue when underlying disk returns error
  Author: Arun KV | Date: 2021-09-13 | 1 file changed (-3/+31)
  Errors in zil_lwb_write_done() are not propagated to
  zil_lwb_flush_vdevs_done(), which can result in zil_commit_impl() not
  returning an error to applications even when zfs was not able to write
  data to the disk. Remove the ZIO_FLAG_DONT_PROPAGATE flag from
  zio_rewrite() to allow errors to propagate and consolidate the error
  handling for flush and write errors to a single location (rather than
  having error handling split between the "write done" and "flush done"
  handlers).
  Reviewed-by: George Wilson <[email protected]>
  Reviewed-by: Prakash Surya <[email protected]>
  Signed-off-by: Arun KV <[email protected]>
  Closes #12391
  Closes #12443
* Verify embedded blkptr's in arc_read()
  Author: Brian Behlendorf | Date: 2021-09-09 | 2 files changed (-7/+14)
  The block pointer verification check in arc_read() should also cover
  embedded block pointers. While highly unlikely, accessing a damaged
  block pointer can result in panic. To further harden the code, extend
  the existing check to include embedded block pointers and add a
  comment explaining the rationale for this sanity check. Lastly,
  correct a flaw in zfs_blkptr_verify() so the error count is checked
  even when checking an untrusted config, to verify the
  non-pool-specific portions of a block pointer.
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #12535
* Upstream: Add snapshot and zvol events
  Author: Jorgen Lundman | Date: 2021-09-09 | 1 file changed (-0/+52)
  Allow the kernel to send snapshot mount/unmount events to zed, and to
  send symlink create/remove events for zvol plumbing
  (/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX). If zed misses the
  ENODEV, all errors after are EINVAL. Treat any error as kernel module
  failure.
  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Jorgen Lundman <[email protected]>
  Closes #12416
* Allow sending corrupt snapshots even if metadata is corrupted
  Author: Allan Jude | Date: 2021-09-09 | 1 file changed (-0/+2)
  When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag, so
  traverse_visitbp() will not fail with ECKSUM if a blockpointer cannot
  be read, but rather will continue and send the objects it can.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: John Kennedy <[email protected]>
  Signed-off-by: Allan Jude <[email protected]>
  Sponsored-By: Klara Inc.
  Sponsored-By: WHC Online Solutions Inc.
  Closes #12541
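
  A sketch of the gist (the surrounding flags are assumed;
  zfs_send_corrupt_data and TRAVERSE_HARD are the real names):

      int flags = TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA;

      if (zfs_send_corrupt_data)
              flags |= TRAVERSE_HARD; /* keep walking past unreadable blkptrs */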
* arc: Drop an incorrect assert
  Author: Rich Ercolani | Date: 2021-09-08 | 1 file changed (-1/+0)
  Unfortunately, there was an overzealous assertion that was (in pretty
  specific circumstances) false, causing failure. This assertion was
  added in error, so we're removing it.
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: George Wilson <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #9897
  Closes #12020
  Closes #12246
* Compressed receive with different ashift can result in incorrect PSIZE on disk
  Author: Paul Dagnelie | Date: 2021-09-08 | 1 file changed (-0/+12)
  We round up the psize to the nearest multiple of the asize or to the
  lsize, whichever is smaller. Once that's done, we allocate a new
  buffer of the appropriate size, zero the tail, and copy the data into
  it. This adds a small performance cost to these kinds of writes, but
  fixes the bookkeeping problems.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Co-authored-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #12522
  Closes #8462
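
  In arithmetic terms (a sketch, not the exact code): with ashift=12 the
  asize is 4096 bytes, so a received psize of 4608 bytes rounds up to
  8192, capped at the lsize.

      uint64_t asize = 1ULL << ashift;        /* e.g. 4096 for ashift=12 */
      uint64_t new_psize = MIN(P2ROUNDUP(psize, asize), lsize);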
* Extend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319)
  Author: Trevor Bautista | Date: 2021-08-26 | 2 files changed (-6/+21)
  Previously, zpool-iostat did not display any data regarding rebuild
  I/Os in either the latency/size histograms (-w/-l/-r) or the queue
  data (-q). This fix essentially utilizes the existing infrastructure
  for tracking rebuild queue data and displays this data in the proper
  places within zpool-iostat's output.
  Signed-off-by: Trevor Bautista <[email protected]>
  Signed-off-by: Trevor Bautista <[email protected]>
  Co-authored-by: Trevor Bautista <[email protected]>
  Reviewed-by: Richard Elling <[email protected]>
  Reviewed-by: Tony Hutter <[email protected]>
* Initialize parity blocks before RAID-Z reconstruction benchmarking
  Author: Mark Johnston | Date: 2021-08-23 | 1 file changed (-0/+7)
  benchmark_raidz() allocates a row to benchmark parity calculation and
  reconstruction. In the latter case, the parity blocks are left
  uninitialized, leading to reports from KMSAN. Initialize parity blocks
  to 0xAA as we do for the data earlier in the function. This does not
  affect the selected RAID-Z implementation on any of several systems
  tested.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Mark Johnston <[email protected]>
  Closes #12473
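
  A sketch of the initialization (the field names are assumed from the
  row-based raidz_map_t layout of that era, not quoted from the hunk):

      raidz_row_t *rr = rm->rm_row[0];

      /* give parity columns a known pattern, as is already done for data */
      for (int c = 0; c < rr->rr_firstdatacol; c++)
              memset(abd_to_buf(rr->rr_col[c].rc_abd), 0xAA,
                  rr->rr_col[c].rc_size);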
* Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
  Author: Alexander Motin | Date: 2021-08-17 | 1 file changed (-48/+65)
  When a header is allocated for a full overwrite it is a waste of time
  to allocate b_pabd/b_rabd for it, since arc_write() will free them
  without their ever being touched. If it is a read or a partial
  overwrite then arc_read() and arc_hdr_decrypt() allocate them
  explicitly. Reduced memory allocation in user threads also reduces ARC
  eviction throttling there, proportionally increasing it in ZIO
  threads, which is not good. To minimize or even avoid it, introduce an
  ARC allocation reserve, allowing certain arc_get_data_abd() callers to
  allocate a bit longer in situations where user threads will already
  throttle.
  Reviewed-by: George Wilson <[email protected]>
  Reviewed-by: Mark Maybee <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored-By: iXsystems, Inc.
  Closes #12398
* Optimize arc_l2c_only lists assertions
  Author: Alexander Motin | Date: 2021-08-17 | 1 file changed (-9/+12)
  It is very expensive and not informative to call multilist_is_empty()
  for each arc_change_state() on debug builds just to check for the
  impossible. Instead, implement a special index function for the
  arc_l2c_only->arcs_list multilists, panicking on any attempt to use
  it.
  Reviewed-by: Mark Maybee <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored-By: iXsystems, Inc.
  Closes #12421
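
  A sketch of the approach (the function name is assumed): arc_l2c_only
  gets an index callback that can never legitimately be reached, so any
  insertion attempt panics immediately instead of being checked for on
  every state change.

      static unsigned int
      arc_state_l2c_multilist_index_func(multilist_t *ml, void *obj)
      {
              panic("Header %p inserted into arc_l2c_only %p", obj, ml);
      }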
* Fix/improve dbuf hits accounting
  Author: Alexander Motin | Date: 2021-08-17 | 1 file changed (-20/+10)
  Instead of clearing stats inside arc_buf_alloc_impl(), do it inside
  arc_hdr_alloc() and arc_release(). It fixes statistics being wiped
  every time a new dbuf is filled from the ARC. Remove
  b_l1hdr.b_l2_hits; L2ARC hits are accounted at b_l2hdr.b_hits. Since
  the hits are accounted under hash lock, replace atomics with simple
  increments.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: George Wilson <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored-By: iXsystems, Inc.
  Closes #12422
* Avoid vq_lock drop in vdev_queue_aggregate()
  Author: Alexander Motin | Date: 2021-08-17 | 1 file changed (-29/+34)
  vq_lock is already too congested for two more operations per I/O.
  Instead of dropping and reacquiring it inside vdev_queue_aggregate(),
  delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
  I/Os to callers, which drop the lock anyway to execute the new I/O.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Mark Maybee <[email protected]>
  Reviewed-by: Brian Atkinson <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored-By: iXsystems, Inc.
  Closes #12297
* Use more atomics in refcounts
  Author: Alexander Motin | Date: 2021-08-17 | 1 file changed (-29/+22)
  Use atomic_load_64() for zfs_refcount_count() to prevent torn reads on
  32-bit platforms. On 64-bit ones it should not change anything. When
  built with ZFS_DEBUG but running without tracking enabled, use atomics
  instead of mutexes, the same as for builds without ZFS_DEBUG. Since
  rc_tracked can't change live, we can check it without the lock.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored-By: iXsystems, Inc.
  Closes #12420
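
  A sketch of the reader side (simplified): the 64-bit count is read
  with an atomic load so a 32-bit platform cannot observe a half-written
  value.

      int64_t
      zfs_refcount_count(zfs_refcount_t *rc)
      {
              /* a plain load of rc_count could tear on 32-bit platforms */
              return (atomic_load_64(&rc->rc_count));
      }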