aboutsummaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Use fallthrough macroBrian Behlendorf2021-09-1418-25/+29
| | | | | | | | | | | | | | | As of the Linux 5.9 kernel a fallthrough macro has been added which should be used to anotate all intentional fallthrough paths. Once all of the kernel code paths have been updated to use fallthrough the -Wimplicit-fallthrough option will because the default. To avoid warnings in the OpenZFS code base when this happens apply the fallthrough macro. Additional reading: https://lwn.net/Articles/794944/ Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12441
* Iterate encrypted clones at zvol_create_minorJorgen Lundman2021-09-131-0/+64
| | | | | | | | | | | | | | | Userland figures out which encryption-root keys are required to load, and issues ZFS_IOC_LOAD_KEY. The tail section of spa_keystore_load_wkey() will call zvol_create_minors() on the encryption-root object. Any clones of the encrypted zvol will not be plumbed. This commits adds additional logic to detect if zvol has clones, and is encrypted, then adds these to the list of zvols to call zvol_create_minors() on. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12471
* Fixed data integrity issue when underlying disk returns errorArun KV2021-09-131-3/+31
| | | | | | | | | | | | | | | | | | Errors in zil_lwb_write_done() are not propagated to zil_lwb_flush_vdevs_done() which can result in zil_commit_impl() not returning an error to applications even when zfs was not able to write data to the disk. Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to allow errors to propagate and consolidate the error handling for flush and write errors to a single location (rather than having error handling split between the "write done" and "flush done" handlers). Reviewed-by: George Wilson <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Arun KV <[email protected]> Closes #12391 Closes #12443
* Verify embedded blkptr's in arc_read()Brian Behlendorf2021-09-092-7/+14
| | | | | | | | | | | | | | | | The block pointer verification check in arc_read() should also cover embedded block pointers. While highly unlikely, accessing a damaged block pointer can result in panic. To further harden the code extend the existing check to include embedded block pointers and add a comment explaining the rational for this sanity check. Lastly, correct a flaw in zfs_blkptr_verify() so the error count is checked even when checking a untrusted config to verify the non-pool-specific portions of a block pointer. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12535
* Upstream: Add snapshot and zvol eventsJorgen Lundman2021-09-091-0/+52
| | | | | | | | | | | | | | For kernel to send snapshot mount/unmount events to zed. For kernel to send symlink creates/removes on zvol plumbing. (/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX) If zed misses the ENODEV, all errors after are EINVAL. Treat any error as kernel module failure. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12416
* Linux 5.15 compat: get_acl()Brian Behlendorf2021-09-091-8/+26
| | | | | | | | | | | | | | Kernel commits 332f606b32b6 ovl: enable RCU'd ->get_acl() 0cad6246621b vfs: add rcu argument to ->get_acl() callback Added compatibility code to detect the new ->get_acl() interface and correctly handle the case where the new rcu argument is set. Reviewed-by: Coleman Kane <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12548
* Allow sending corrupt snapshots even if metadata is corruptedAllan Jude2021-09-091-0/+2
| | | | | | | | | | | | When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag, so traverse_visitbp() will not fail with ECKSUM if a blockpointer cannot be read, but rather will continue and send the objects it can. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Allan Jude <[email protected]> Sponsored-By: Klara Inc. Sponsored-By: WHC Online Solutions Inc. Closes #12541
* arc: Drop an incorrect assertRich Ercolani2021-09-081-1/+0
| | | | | | | | | | | | | Unfortunately, there was an overzealous assertion that was (in pretty specific circumstances) false, causing failure. This assertion was added in error, so we're removing it. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #9897 Closes #12020 Closes #12246
* Compressed receive with different ashift can result in incorrect PSIZE on diskPaul Dagnelie2021-09-081-0/+12
| | | | | | | | | | | | | | We round up the psize to the nearest multiple of the asize or to the lsize, whichever is smaller. Once that's done, we allocate a new buffer of the appropriate size, zero the tail, and copy the data into it. This adds a small performance cost to these kinds of writes, but fixes the bookkeeping problems. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Co-authored-by: Matthew Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #12522 Closes #8462
* FreeBSD: Don't remove SA xattr if not SA znodeRyan Moeller2021-08-301-1/+1
| | | | | | | | | | | | We attempt to remove an existing SA xattr when setting a dir xattr, but this only makes sense if the znode has been upgraded to the SA format. Otherwise, we will hit an assert in zfs_sa_get_xattr. Make sure this is an SA znode before attempting to remove the SA xattr. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12514
* Fix cross-endian interoperability of zstdRich Ercolani2021-08-304-6/+24
| | | | | | | | | | | | | | | It turns out that layouts of union bitfields are a pain, and the current code results in an inconsistent layout between BE and LE systems, leading to zstd-active datasets on one erroring out on the other. Switch everyone over to the LE layout, and add compatibility code to read both. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12008 Closes #12022
* FreeBSD: Implement hole-punching supportKa Ho Ng2021-08-302-3/+64
| | | | | | | | | | | | This adds supports for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Ka Ho Ng <[email protected]> Sponsored-by: The FreeBSD Foundation Closes #12458
* Extend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319)Trevor Bautista2021-08-262-6/+21
| | | | | | | | | | | | | Previously, zpool-iostat did not display any data regarding rebuild I/Os in either the latency/size histograms (-w/-l/-r) or the queue data (-q). This fix essentially utilizes the existing infrastructure for tracking rebuild queue data and displays this data in the proper places within zpool-iostat's output. Signed-off-by: Trevor Bautista <[email protected]> Signed-off-by: Trevor Bautista <[email protected]> Co-authored-by: Trevor Bautista <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Initialize parity blocks before RAID-Z reconstruction benchmarkingMark Johnston2021-08-231-0/+7
| | | | | | | | | | | | | benchmark_raidz() allocates a row to benchmark parity calculation and reconstruction. In the latter case, the parity blocks are left uninitialized, leading to reports from KMSAN. Initialize parity blocks to 0xAA as we do for the data earlier in the function. This does not affect the selected RAID-Z implementation on any of several systems tested. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12473
* Linux 4.11 compat: statx supportRichard Yao2021-08-172-7/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux 4.11 added a new statx system call that allows us to expose crtime as btime. We do this by caching crtime in the znode to match how atime, ctime and mtime are cached in the inode. statx also introduced a new way of reporting whether the immutable, append and nodump bits have been set. It adds support for reporting compression and encryption, but the semantics on other filesystems is not just to report compression/encryption, but to allow it to be turned on/off at the file level. We do not support that. We could implement semantics where we refuse to allow user modification of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc() to find out encryption/compression information. That would introduce locking that will have a minor (although unmeasured) performance cost. It also would be inferior to zdb, which reports far more detailed information. We therefore omit reporting of encryption/compression through statx in favor of recommending that users interested in such information use zdb. Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #8507
* Remove b_pabd/b_rabd allocation from arc_hdr_alloc()Alexander Motin2021-08-173-50/+67
| | | | | | | | | | | | | | | | | | When a header is allocated for full overwrite it is a waste of time to allocate b_pabd/b_rabd for it, since arc_write() will free them without ever being touched. If it is a read or a partial overwrite then arc_read() and arc_hdr_decrypt() allocate them explicitly. Reduced memory allocation in user threads also reduces ARC eviction throttling there, proportionally increasing it in ZIO threads, that is not good. To minimize or even avoid it introduce ARC allocation reserve, allowing certain arc_get_data_abd() callers to allocate a bit longer in situations where user threads will already throttle. Reviewed-by: George Wilson <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12398
* Optimize arc_l2c_only lists assertionsAlexander Motin2021-08-171-9/+12
| | | | | | | | | | | | It is very expensive and not informative to call multilist_is_empty() for each arc_change_state() on debug builds to check for impossible. Instead implement special index function for arc_l2c_only->arcs_list, multilists, panicking on any attempt to use it. Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12421
* Fix/improve dbuf hits accountingAlexander Motin2021-08-171-20/+10
| | | | | | | | | | | | | | | Instead of clearing stats inside arc_buf_alloc_impl() do it inside arc_hdr_alloc() and arc_release(). It fixes statistics being wiped every time a new dbuf is filled from the ARC. Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits. Since the hits are accounted under hash lock, replace atomics with simple increments. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12422
* Avoid vq_lock drop in vdev_queue_aggregate()Alexander Motin2021-08-171-29/+34
| | | | | | | | | | | | | vq_lock is already too congested for two more operations per I/O. Instead of dropping and reacquiring it inside vdev_queue_aggregate() delegate the zio_vdev_io_bypass() and zio_execute() calls for parent I/Os to callers, that drop the lock any way to execute the new I/O. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12297
* Use more atomics in refcountsAlexander Motin2021-08-171-29/+22
| | | | | | | | | | | | | | Use atomic_load_64() for zfs_refcount_count() to prevent torn reads on 32-bit platforms. On 64-bit ones it should not change anything. When built with ZFS_DEBUG but running without tracking enabled use atomics instead of mutexes same as for builds without ZFS_DEBUG. Since rc_tracked can't change live we can check it without lock. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12420
* Restore FreeBSD sysctl processing for arc.min and arc.maxAllan Jude2021-08-163-6/+83
| | | | | | | | | | | | | | | | Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max to a disallowed value would return an error. Since the switch, it instead only generates WARN_IF_TUNING_IGNORED Keep the ability to set the sysctl's specifically to 0, even though that is less than the minimum, because some tests depend on this. Also lost, was the ability to set vfs.zfs.arc_max to a value less than the default vfs.zfs.arc_min at boot time. Restore this as well. Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #12161
* zfs: add missed dependency of zfs module on zlibRyan Moeller2021-08-131-0/+1
| | | | | | | | Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Martin Matuska <[email protected]> Co-authored-by: Konstantin Belousov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> External-issue: https://reviews.freebsd.org/D31207 Closes #12442
* Run arc_evict thread at higher priorityTony Nguyen2021-08-104-13/+19
| | | | | | | | | | | | | | Run arc_evict thread at higher priority, nice=0, to give it more CPU time which can improve performance for workload with high ARC evict activities. On mixed read/write and sequential read workloads, I've seen between 10-40% better performance. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Closes #12397
* Enable /proc/diskstats for zvolsBrian Behlendorf2021-08-051-0/+3
| | | | | | | | | | The /proc/diskstats accounting needs to be explicitly enabled for block devices which do not use multi-queue. Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12440 Closes #12066
* Modify checksum obtain method of QAThedongzhang2021-08-031-23/+4
| | | | | | | | | | CpaDcGeneratefooter function that obtain the checksum code does not support the CPA_DC_STATELESS mode. So we get the adler32 chencksum of the end of the zlib from dc_results. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chengfei Zhu <[email protected]> Signed-off-by: hedong.zhang <[email protected]> Closes #12343
* Allow disabling of unmapped I/O on FreeBSDMark Johnston2021-08-021-0/+4
| | | | | | | | | | | | | | | We have a tunable which permits one to disable the use of unmapped I/O for the buffer cache. Respect it in ZFS as well. This is useful for KMSAN, which cannot easily maintain shadow state for unmapped pages. No functional change intended, as unmapped I/O is permitted by default and there's no real reason to disable it in practice except for debugging. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12446
* Avoid small buffer copying on writeAlexander Motin2021-07-274-4/+5
| | | | | | | | | | | It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to decide whether to reallocate/copy the buffer, because the answer is OS-specific and depends on the buffer size. Instead of that use abd_size_alloc_linear(), moved into public header. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #12425
* Fix format specifier warningsBrian Behlendorf2021-07-271-2/+2
| | | | | | | | | | | Commit 5dbf6c5a66d did not address these format specifier warnings since they were introduced by an unrelated commit which had not been rebased on 5dbf6c5a66d when merged. Fix them. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12435
* macOS can also set va_typeJorgen Lundman2021-07-261-1/+1
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12357
* Add comment on metaslab_class_throttle_reserve() lockingAlexander Motin2021-07-261-0/+7
| | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Issue #12314 Closes #12419
* Fixes in persistent L2ARCGeorge Amanakis2021-07-261-75/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In l2arc_add_vdev() first decide whether the device is eligible for L2ARC rebuild or whole device trim and then add it to the list of cache devices. Otherwise l2arc_feed_thread() might already start writing on the device invalidating previous content as l2ad_hand = l2ad_start. However l2arc_rebuild_vdev() needs the device present in the cache device list to figure out its l2arc_dev_t. Fix this by moving most of l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does not need to search in the cache device list. In contrast to l2arc_add_vdev() we do not have to worry about l2arc_feed_thread() invalidating previous content when onlining a cache device. The device parameters (l2ad*) are not cleared when offlining the device and writing new buffers will not invalidate all previous content. In worst case only buffers that have not had their log block written to the device will be lost. Retire persist_l2arc_00{4,5,8} tests since they cover code already covered by the remaining ones. Test persist_l2arc_006 is renamed to persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005. Fix a typo in persist_l2arc_004, and remove an assertion that is not always true from l2arc_arcstats_pos. Also update an assertion in persist_l2arc_005 and explain why in a comment. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #12365
* Remove NOTE(CONSTCOND) and note.hнаб2021-07-265-14/+0
| | | | | | | | These were mostly used to annotate do {} while(0)s Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Issue #12201
* Normalise /*FALLTHR{OUGH,U}*/наб2021-07-269-14/+14
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Issue #12201
* Prune /*NOTREACHED*/наб2021-07-262-2/+1
| | | | | | | | | This includes a simplification of mkbusy and format correctness in zhack and ztest Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Issue #12201
* Replace /*PRINTFLIKEn*/ with attribute(printf)наб2021-07-266-12/+12
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Issue #12201
* Initialize dn_next_type[] in the dnode constructorMark Johnston2021-07-261-0/+1
| | | | | | | | | | | | | | | It seems nothing ensures that this array is zeroed when a dnode is freshly allocated, so in principle it retains the values from the previous allocation. In practice it seems to be the case that the fields should end up zeroed, but we can zero the field anyway for consistency. This was found using KMSAN. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12383
* Zero pad bytes following TX_WRITE log dataMark Johnston2021-07-261-2/+6
| | | | | | | | | | | | | | When logging a TX_WRITE record in the case where file data has to be copied from the DMU, we pad the log record size to a multiple of 8 bytes. In this case, any padding bytes should be zeroed, otherwise the contents of uninitialized memory are written to the ZIL. This was found using KMSAN. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12383
* Zero pad bytes when allocating a ZIL recordMark Johnston2021-07-261-3/+4
| | | | | | | | | | | | | When allocating a record, we round up the allocation size to a multiple of 8. In this case, any padding bytes should be zeroed, otherwise the contents of uninitialized memory are written to the ZIL. This was found using KMSAN. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12383
* Initialize all fields in zfs_log_xvattr()Mark Johnston2021-07-261-1/+3
| | | | | | | | | | | | | | When logging TX_SETATTR, we could otherwise fail to initialize part of the corresponding ZIL record depending on which fields are present in the xvattr. Initialize the creation time and the AV scan timestamp to zero so that uninitialized bytes are not written to the ZIL. This was found using KMSAN. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12383
* Initialize "autoreplace" in spa_ld_get_props()Mark Johnston2021-07-261-1/+1
| | | | | | | | | | | | | | | | spa_prop_find() may fail to find the specified property, in which case it suppresses ENOENT from zap_lookup(). In this case, the return value is left uninitialized, so spa_autoreplace was being initialized using an uninitialized stack variable. This was found using KMSAN. It appears to be a regression from commit 9eb7b46ed0, which removed the initialization of "autoreplace" from the definition. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #12383
* Linux 5.14 compat: explicity assign set_page_dirtyColeman Kane2021-07-261-0/+6
| | | | | | | | | | | | | | Kernel 5.14 introduced a change where set_page_dirty of struct address_space_operations is no longer implicitly set to __set_page_dirty_buffers(), which ended up resulting in a NULL pointer deref in the kernel when it is attempted to be called. This change sets .set_page_dirty in the structure to __set_page_dirty_nobuffers(), which was introduced with the related patch set. The breaking change was introduce in commit 0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #12427
* Fix unfortunate NULL in spa_update_dspaceRich Ercolani2021-07-261-1/+8
| | | | | | | | | | | | | After 1325434b, we can in certain circumstances end up calling spa_update_dspace with vd->vdev_mg NULL, which ends poorly during vdev removal. So let's not do that further space adjustment when we can't. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12380 Closes #12428
* Linux 5.14 compat: blk_alloc_disk()Brian Behlendorf2021-07-231-9/+34
| | | | | | | | | | | | | | | In Linux 5.14, blk_alloc_queue is no longer exported, and its usage has been superseded by blk_alloc_disk, which returns a gendisk struct from which we can still retrieve the struct request_queue* that is needed in the one place where it is used. This also replaces the call to alloc_disk(minors), and minors is now set via struct member assignment. Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Coleman Kane <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12362 Closes #12409
* FreeBSD: Ignore make_dev_s() errorsAlexander Motin2021-07-221-13/+18
| | | | | | | | | | | | | | | | Since errors returned by zvol_create_minor_impl() are ignored by the common code, it is more convenient to ignore make_dev_s() errors there. It allows, for example, to get device created for the zvol after later rename instead of having it further stuck in half-created state. zvol_rename_minor() already ignores those errors. While there, switch from MAXPHYS to maxphys in FreeBSD 13+. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12375
* Optimize allocation throttlingAlexander Motin2021-07-214-51/+35
| | | | | | | | | | | | | | | | | | | | | | | Remove mc_lock use from metaslab_class_throttle_*(). The math there is based on refcounts and so atomic, so the only race possible there is between zfs_refcount_count() and zfs_refcount_add(). But in most cases metaslab_class_throttle_reserve() is called with the allocator lock held, which covers the race. In cases where the lock is not held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we do not use zfs_refcount_count(). And even if we assume some other non-existing scenario, the worst that may happen from this race is few more I/Os get to allocation earlier, that is not a problem. Move locks and data of different allocators into different cache lines to avoid false sharing. Group spa_alloc_* arrays together into single array of aligned struct spa_alloc spa_allocs. Align struct metaslab_class_allocator. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #12314
* Add Module Parameter Regarding Log Size LimitKevin Jin2021-07-205-2/+86
| | | | | | | | | | | | | | * Add Module Parameters Regarding Log Size Limit zfs_wrlog_data_max The upper limit of TX_WRITE log data. Once it is reached, write operation is blocked, until log data is cleared out after txg sync. It only counts TX_WRITE log with WR_COPIED or WR_NEED_COPY. Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: jxdking <[email protected]> Closes #12284
* Minor ARC optimizationsAlexander Motin2021-07-201-31/+9
| | | | | | | | | | | | | | | Remove unneeded global, practically constant, state pointer variables (arc_anon, arc_mru, etc.), replacing them with macros of real state variables addresses (&ARC_anon, &ARC_mru, etc.). Change ARC_EVICT_ALL from -1ULL to UINT64_MAX, not requiring special handling in inner loop of ARC reclamation. Respectively change bytes argument of arc_evict_state() from int64_t to uint64_t. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #12348
* dmu_redact.c does not call bqueue_destroyJorgen Lundman2021-07-201-0/+2
| | | | | | | | Ensure all calls to bqueue_init() has a corresponding call to bqueue_destroy() Reviewed-by: Paul Dagnelie <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #12118
* A few fixes of callback typecasting (for the upcoming ClangCFI)Alexander2021-07-207-30/+109
| | | | | | | | | | | | | | * zio: avoid callback typecasting * zil: avoid zil_itxg_clean() callback typecasting * zpl: decouple zpl_readpage() into two separate callbacks * nvpair: explicitly declare callbacks for xdr_array() * linux/zfs_nvops: don't use external iput() as a callback * zcp_synctask: don't use fnvlist_free() as a callback * zvol: don't use ops->zv_free() as a callback for taskq_dispatch() Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Closes #12260
* Use SET_ERROR for more errors in FreeBSD vnopsRyan Moeller2021-07-191-16/+29
| | | | | | | | | | | We should use SET_ERROR when we first get an error. Add it in the FreeBSD xattr implementations where missing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12356