Commit message | Author | Age | Files | Lines
...
* Fix zdb_dump_block for little endian (#16310)Chunwei Chen2024-07-311-1/+1
| | | | | | | | | The endian macros were changed but zdb_dump_block wasn't updated accordingly. Signed-off-by: Chunwei Chen <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Allan Jude <[email protected]>
* zfs: add bounds checking to zil_parse (#16308) | c1ick | 2024-07-31 | 1 | -2/+19
    Make sure log records don't stray beyond the valid memory region.

    zil_parse does not verify the space occupied by the fixed members of
    lr_t. A crafted image can trigger an out-of-bounds read with these steps:

    1) Do some file operations and reboot to simulate an abnormal exit
       without umount
    2) zil_chain.zc_nused: 0x1000
    3) First lr_t
       lr_t.lrc_txtype: 0x0
       lr_t.lrc_reclen: 0x1000-0xb8-0x1
       lr_t.lrc_txg: 0x0
       lr_t.lrc_seq: 0x1
    4) Update checksum in zil_chain.zc_eck

    Fix: Add checks to make sure the remaining bytes are large enough to
    hold a log record.

    Signed-off-by: XDTG <[email protected]>
    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
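The kind of check the fix describes can be sketched in isolation. This is a minimal illustration, not the actual patch: `log_rec_hdr_t` and `log_record_fits()` are hypothetical names, and the struct only mirrors the four fixed fields listed in the reproduction steps.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fixed-size members of a log record (lr_t). */
typedef struct log_rec_hdr {
	uint64_t lrc_txtype;
	uint64_t lrc_reclen;
	uint64_t lrc_txg;
	uint64_t lrc_seq;
} log_rec_hdr_t;

/*
 * Return 1 if a record starting at offset `off` in a buffer holding `used`
 * valid bytes has a fully readable header and a length that stays in bounds.
 */
static int
log_record_fits(const unsigned char *buf, size_t used, size_t off)
{
	log_rec_hdr_t hdr;

	/* The fixed header itself must lie inside the valid region. */
	if (off > used || used - off < sizeof (hdr))
		return (0);
	memcpy(&hdr, buf + off, sizeof (hdr));

	/* The record must cover its own header and must not overrun. */
	if (hdr.lrc_reclen < sizeof (hdr) || hdr.lrc_reclen > used - off)
		return (0);
	return (1);
}
```

A parser built this way would stop walking the block as soon as `log_record_fits()` returns 0, rather than trusting `lrc_reclen` to land on another valid record.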
* Linux: Report reclaimable memory to kernel as such (#16385) | Alexander Motin | 2024-07-30 | 14 | -14/+29
    Linux provides SLAB_RECLAIM_ACCOUNT and __GFP_RECLAIMABLE flags to mark
    memory allocations that can be freed via shrinker calls. It should allow
    the kernel to tune and group such allocations for lower memory
    fragmentation and better reclamation under pressure.

    This patch marks as reclaimable most of ARC memory, directly evictable
    via the ZFS shrinker, plus dnode/znode/sa memory, indirectly evictable
    via the kernel's superblock shrinker.

    Signed-off-by: Alexander Motin <[email protected]>
    Sponsored by: iXsystems, Inc.
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Allan Jude <[email protected]>
* dnode: allow storage class to be overridden by object type | Rob Norris | 2024-07-29 | 6 | -2/+35
    spa_preferred_class() selects a storage class based on (among other
    things) the DMU object type. This only works for old-style object types
    that match only one specific kind of thing. For DMU_OTN_ types we need
    another way to signal the storage class.

    This commit allows the object type to be overridden in the IO policy for
    the purposes of choosing a storage class. It then adds the ability to
    set the storage type on a dnode hold, such that all writes generated
    under that hold will get it.

    This method has two shortcomings:

    - it would be better if we could "name" a set of storage class
      preferences rather than it being implied by the object type.
    - it would be better if this info were stored in the dnode on disk.

    In the absence of those things, this seems like the smallest possible
    change.

    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    Closes #15894
* spa_preferred_class: pass the entire zio | Rob Norris | 2024-07-29 | 3 | -12/+11
    Rather than picking specific values out of the properties, just pass
    the entire zio in, to make it easier in the future to use more of that
    info to decide on the storage class.

    I would rather have passed just io_prop in, but having spa.h include
    zio.h gets a bit tricky.

    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    Closes #15894
* Skip dnode handles use when not needed | Alexander Motin | 2024-07-29 | 3 | -1/+36
    Neither FreeBSD nor Linux currently implements kmem_cache_set_move(),
    which means dnode_move() is never called. In that situation, using
    dnode handles with their respective locking to access a dnode from a
    dbuf is a waste of time for no benefit.

    This patch implements optional simplified code for such platforms,
    saving at least 3 dnode lock/dereference/unlock operations per dbuf
    life cycle. Originally I hoped to drop the handles completely to save
    memory, but they are still used in the dnode allocation code, so they
    are left for now.

    Before this change, in CPU profiles of some workloads I saw 4-20% of
    CPU time spent in zrl_add_impl()/zrl_remove(), which are gone now.

    Reviewed-by: Rob Wing <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Allan Jude <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Alexander Motin <[email protected]>
    Sponsored by: iXsystems, Inc.
    Closes #16374
* Cleanup DB_DNODE() macros usage | Alexander Motin | 2024-07-29 | 6 | -51/+34
    - Use the macros in the few places where they were missed.
    - Reduce the scope of DB_DNODE_ENTER/EXIT() and inline some DB_DNODE()
      uses to make it more obvious what exactly is protected there and to
      make accidental unprotected accesses more difficult.
    - Make use of zrl_owner().

    Reviewed-by: Rob Wing <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Allan Jude <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Alexander Motin <[email protected]>
    Sponsored by: iXsystems, Inc.
    Closes #16374
* ddt: add support for prefetching tables into the ARCAllan Jude2024-07-2637-42/+1057
| | | | | | | | | | | | | | | | | | | | | | This change adds a new `zpool prefetch -t ddt $pool` command which causes a pool's DDT to be loaded into the ARC. The primary goal is to remove the need to "warm" a pool's cache before deduplication stops slowing write performance. It may also provide a way to reload portions of a DDT if they have been flushed due to inactivity. Sponsored-by: iXsystems, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Will Andrews <[email protected]> Signed-off-by: Fred Weigel <[email protected]> Signed-off-by: Rob Norris <[email protected]> Signed-off-by: Don Brady <[email protected]> Co-authored-by: Will Andrews <[email protected]> Co-authored-by: Don Brady <[email protected]> Closes #15890
* Fix ZDB to dump projid for projectquota enabled (#16291) | Jitendra Patidar | 2024-07-25 | 1 | -0/+14
    ZDB is supposed to dump "projid" via dump_znode() when projectquota is
    enabled.
    -----------
    static void
    dump_znode(objset_t *os, uint64_t object, void *data, size_t size)
    {
        ...
        if (dmu_objset_projectquota_enabled(os) && (pflags & ZFS_PROJID)) {
            uint64_t projid;
            if (sa_lookup(hdl, sa_attr_table[ZPL_PROJID], &projid,
                sizeof (uint64_t)) == 0)
                (void) printf("\tprojid %llu\n", (u_longlong_t)projid);
        }
        ...
    }
    ----------
    But it is not dumping "projid", even when project quota is enabled.
    dmu_objset_projectquota_enabled() does the following 3 checks:
    ----------
    boolean_t
    dmu_objset_projectquota_enabled(objset_t *os)
    {
        return (file_cbs[os->os_phys->os_type] != NULL &&
            DMU_PROJECTUSED_DNODE(os) != NULL &&
            spa_feature_is_enabled(os->os_spa, SPA_FEATURE_PROJECT_QUOTA));
    }
    ----------
    It fails on the file_cbs[] check. file_cbs[] gets initialised via
    dmu_objset_register_type(), which is not done for ZDB; it is done for
    the kernel via zfs_init().

    Register a dummy callback handle for the DMU_OST_ZFS type in the ZDB
    main() function to dump the projid when projectquota is enabled.

    Signed-off-by: Jitendra Patidar <[email protected]>
    Closes #16290
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Tino Reichardt <[email protected]>
* zil: add stats for commit failure/fallback (#16315)Rob Norris2024-07-253-0/+40
| | | | | | | | | | | | There's no good way to tell when a ZIL commit fails and falls back to a transaction sync, other than perhaps a throughput drop. This adds counters so we can see when it happens and why. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Replace goo.gl style link (#16373)Alexander Motin2024-07-251-1/+1
| | | | | | | | | | | | That URL shortening scheme should stop working soon [1], while we don't really need it here. 1. https://developers.googleblog.com/en/google-url-shortener-links-will-no-longer-be-available/ Signed-off-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Several improvements to ARC shrinking (#16197) | Alexander Motin | 2024-07-25 | 5 | -94/+123
    - When receiving a memory pressure signal from the OS, be more strict
      in trying to free some memory. Otherwise the kernel may come again
      and request much more. Return as the result how much arc_c was
      actually reduced due to this request, which may be less than
      requested.
    - On Linux, when receiving direct reclaim from some file system (which
      may be ZFS), instead of ignoring the request completely, just shrink
      the ARC, but do not wait for eviction. Waiting there may cause a
      deadlock. Ignoring it as before may put extra pressure on other
      caches and/or swap, and cause OOM if nothing helps. Not waiting may
      result in more ARC being evicted later, and may be too late if the
      OOM killer activates right now, but I hope it is better than doing
      nothing at all.
    - On Linux, set arc_no_grow before waiting for reclaim, not after, or
      it may grow back while we are waiting.
    - On Linux, add a new parameter zfs_arc_shrinker_seeks to balance ARC
      eviction cost relative to the page cache and other subsystems.
    - Slightly update the Linux arc_set_sys_free() math for new kernels.

    Signed-off-by: Alexander Motin <[email protected]>
    Sponsored by: iXsystems, Inc.
    Reviewed-by: Rob Norris <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* ddt: dedup table quota enforcement | Allan Jude | 2024-07-25 | 22 | -22/+599
    This adds two new pool properties:

    - dedup_table_size, the total size of all DDTs on the pool; and
    - dedup_table_quota, the maximum possible size of all DDTs in the pool

    When set, quota will be enforced by checking when a new entry is about
    to be created. If the pool is over its dedup quota, the entry won't be
    created, and the corresponding write will be converted to a regular
    non-dedup write. Note that existing entries can be updated (ie their
    refcounts changed), as that reuses the space rather than requiring
    more.

    dedup_table_quota can be set to 'auto', which will set it based on the
    size of the devices backing the "dedup" allocation device. This makes
    it possible to limit the DDTs to the size of a dedup vdev only, such
    that when the device fills, no new blocks are deduplicated.

    Sponsored-by: iXsystems, Inc.
    Sponsored-By: Klara Inc.
    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Rob Norris <[email protected]>
    Signed-off-by: Don Brady <[email protected]>
    Co-authored-by: Don Brady <[email protected]>
    Co-authored-by: Rob Wing <[email protected]>
    Co-authored-by: Sean Eric Fagan <[email protected]>
    Closes #15889
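The enforcement rule in this message (new entries are refused when over quota, existing entries may still be updated) can be sketched as a small decision function. The names are hypothetical, and treating a quota of 0 as "no quota" and "at or above quota" as "over" are assumptions of the sketch, not statements about the real code.

```c
#include <stdint.h>

typedef enum { WRITE_DEDUP, WRITE_REGULAR } write_kind_t;

/* Hypothetical helper: decide how a dedup-eligible write should proceed. */
static write_kind_t
dedup_write_kind(uint64_t table_size, uint64_t table_quota, int entry_exists)
{
	/* Assumption: a quota of 0 means no quota has been set. */
	if (table_quota == 0)
		return (WRITE_DEDUP);

	/* Updating an existing entry reuses its space, so it is allowed. */
	if (entry_exists)
		return (WRITE_DEDUP);

	/* A new entry is refused once the tables have reached the quota. */
	if (table_size >= table_quota)
		return (WRITE_REGULAR);

	return (WRITE_DEDUP);
}
```

Falling back to `WRITE_REGULAR` keeps the write itself succeeding; only the deduplication is skipped.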
* ZTS: Make do_vol_test() more deterministic (#16379) | Alexander Motin | 2024-07-24 | 1 | -9/+9
    - Explicitly disable compression since mkfile uses a zero buffer.
    - Explicitly sync file systems instead of waiting for timeout.

    Signed-off-by: Alexander Motin <[email protected]>
    Sponsored by: iXsystems, Inc.
    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* Linux 6.9: Fix UBSAN errors in sa.c (#16380)Tony Hutter2024-07-231-0/+1
| | | | | | | | | | | | | This is a follow-on to 156a64161b4f9da35f2e0484106173344cf78317 that ignores UBSAN errors in sa.c. Thank you @thwalker3 for the fix. Original-patch-by: @thwalker3 Signed-off-by: Tony Hutter <[email protected]> Closes #16278 Closes #16330 Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
* Add support for multiple lines to the sharenfs property for FreeBSD (#16338) | rmacklem | 2024-07-23 | 2 | -21/+64
    There has been a bugzilla PR#147881 requesting this for a long time
    (14 years!). It extends the syntax of the ZFS sharenfs property (for
    FreeBSD only) to allow multiple sets of options for different
    hosts/nets, separated by ';'s.

    Signed-off-by: Rick Macklem <[email protected]>
    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Tino Reichardt <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* Add some missing vdev properties (#16346)Don Brady2024-07-237-2/+53
| | | | | | | | | Sponsored-by: Klara, Inc. Sponsored-By: Wasabi Technology, Inc. Signed-off-by: Don Brady <[email protected]> Co-authored-by: Don Brady <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* AUTHORS: refresh with recent new contributors (#16362)Rob Norris2024-07-232-0/+19
| | | | | | | | Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]>
* Fix long_free_dirty accounting for small files (#16264)Chunwei Chen2024-07-231-0/+7
| | | | | | | | | | | | | | For files smaller than recordsize, it's most likely that they don't have L1 blocks. However, current calculation will always return at least 1 L1 block. In this change, we check dnode level to figure out if it has L1 blocks or not, and return 0 if it doesn't. This will reduce the chance of unnecessary throttling when deleting a large number of small files. Signed-off-by: Chunwei Chen <[email protected]> Co-authored-by: Chunwei Chen <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
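The accounting change can be illustrated with a simplified estimate. This is a sketch under assumptions: `l1_blocks_to_free()` and its parameters are hypothetical, `nlevels == 1` is taken to mean the object has data blocks only, and `blocks_per_l1` must be non-zero.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of how many L1 indirect blocks a free will dirty.
 * `nlevels` is the dnode's level count and `blocks_per_l1` is the number
 * of data blocks that one L1 block maps.
 */
static uint64_t
l1_blocks_to_free(unsigned nlevels, uint64_t nblocks, uint64_t blocks_per_l1)
{
	/* A single-level object has no L1 blocks, so nothing is counted. */
	if (nlevels <= 1)
		return (0);

	/* Otherwise round up to the number of L1 blocks covering the range. */
	return ((nblocks + blocks_per_l1 - 1) / blocks_per_l1);
}
```

Without the level check, the rounding alone would report one L1 block for every small file, which is the over-counting the commit removes.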
* ZTS: Change cp_stress to fit timings (#16369)Tino Reichardt2024-07-221-7/+2
| | | | | | | | | | | | | cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the Linux based runners should do 10 RUNS, where the FreeBSD based runners only have 3 RUNS to succeed. This patch removes this different handling of Linux and FreeBSD. The cp_stress test is running fine in around 2 minutes now. Signed-off-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* zdb: fix BRT dump (#16335) | Rob Norris | 2024-07-18 | 1 | -2/+7
    BRT refcounts are stored as eight uint8_ts rather than a single
    uint64_t. This means that za_first_integer is only the first byte, so
    at most 255. This fixes it by doing a lookup for the whole value.

    Sponsored-by: Klara, Inc.
    Sponsored-by: Wasabi Technology, Inc.
    Signed-off-by: Rob Norris <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Alexander Motin <[email protected]>
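The difference between reading the first integer and reading the whole value can be shown with plain byte arrays. This is only an illustration with hypothetical helper names; it assumes the eight bytes hold the counter in the host's native byte order, and it does not use the ZAP API.

```c
#include <stdint.h>
#include <string.h>

/* Reading only the first of eight uint8_t values drops the other seven. */
static uint64_t
refcount_first_byte(const uint8_t raw[8])
{
	return (raw[0]);
}

/* Reassemble the whole value (assumed stored in native byte order). */
static uint64_t
refcount_whole(const uint8_t raw[8])
{
	uint64_t v;

	memcpy(&v, raw, sizeof (v));
	return (v);
}
```

Because a single byte cannot hold a value above 255, any refcount larger than that can only be reported correctly by the second form.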
* Fix printf typo for `zfs receive -cv` (#16295) | glibg10b | 2024-07-17 | 1 | -1/+1
    Current output:
    > receiving correctivefull stream of a into b

    New output:
    > receiving corrective full stream of a into b

    Signed-off-by: glibg10b <[email protected]>
    Reviewed-by: Rob Norris <[email protected]>
    Reviewed-by: Tino Reichardt <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* Make sure avl_tree.avl_pad is not in kernel module (#16280) | youzhongyang | 2024-07-17 | 2 | -2/+2
    The commit b192a2c (Remove avl_size field from struct avl_tree) uses
    #ifdef _KERNEL to decide whether to include avl_pad, but _KERNEL is
    defined in sys/sysmacros.h. If avl.h and sysmacros.h are not included
    in the right order, it can cause a headache when working on a
    ZFS-related kernel module.

    Add sysmacros.h to avl_impl.h to fix this. sysmacros.h is also removed
    from spa.h as it is redundant.

    Signed-off-by: Youzhong Yang <[email protected]>
    Co-authored-by: Youzhong Yang <[email protected]>
    Reviewed-by: Rob Norris <[email protected]>
    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Tino Reichardt <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly (#16334)Rob Norris2024-07-171-4/+26
| | | | | | | | | | | | These are used for DDT and BRT stores. There's limited information available to produce meaningful output, but at least we can put something on screen rather than crashing. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* vdev_open: clear async fault flag after reopenRob Norris2024-07-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails. An end-of-txg async task is charged with actually faulting the vdev. In a single-disk pool, the probe failure will degrade the last disk, and then suspend the pool. However, vdev_fault_wanted is not cleared. After the pool returns, the transaction finishes and the async task runs and faults the vdev, which suspends the pool again. The fix is simple: when reopening a vdev, clear the async fault flag. If the vdev is still failed, the startup probe will quickly notice and degrade/suspend it again. If not, all is well! Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Co-authored-by: Don Brady <[email protected]> Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Don Brady <[email protected]>
* zts: test single-disk pool resumes properly after disk pullRob Norris2024-07-174-1/+105
| | | | | | | | | | | | | A single disk pool should suspend when its disk fails and hold the IO. When the disk is returned, the pool should return and the IO be reissued, leaving everything in good shape. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Don Brady <[email protected]>
* Use kmap_local_page instead of kmap_atomic (#16329)Jason Lee2024-07-165-10/+41
| | | | | | | Changed zfs_k(un)map_atomic to zfs_k(un)map_local Signed-off-by: Jason Lee <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Atkinson <[email protected]>
* Linux 6.9 compat: META (#16358)Tony Hutter2024-07-161-1/+1
| | | | | | | Update the META file to reflect compatibility with the 6.9 kernel. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
* Linux 5.16: use bdev_nr_bytes() to get device capacityRob Norris2024-07-152-5/+35
| | | | | | | | | | | This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no longer exists, but the helper has been updated, so detect it and use it in all versions where it is available. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
* Linux 6.10: work harder to avoid kmem_cache_alloc reuse | Rob Norris | 2024-07-15 | 2 | -18/+13
    Linux 6.10 changed kmem_cache_alloc to be a macro rather than a
    function, such that the old #undef for it in spl-kmem-cache.c would
    remove its definition completely, breaking the build.

    This inverts the model used before. Rather than always defining the
    kmem_cache_* macros and then undefining them inside spl-kmem-cache.c,
    we instead make a special tag to indicate we're currently inside
    spl-kmem-cache.c, and do not define those macros in the first place, so
    we can use the kernel-supplied kmem_cache_* functions to implement
    spl_kmem_cache_*, as we expect.

    For all other callers, we create the macros as normal and remove
    access to the kernel's own conflicting names.

    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: https://despairlabs.com/sponsor/
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
* Linux 6.10: rework queue limits setup | Rob Norris | 2024-07-15 | 2 | -72/+118
    Linux has started moving to a model where instead of applying block
    queue limits through individual modification functions, a complete
    limits structure is built up and applied atomically, either when the
    block device is opened, or some time afterwards. As of 6.10 this
    transition appears only partly completed.

    This commit matches that model within OpenZFS in a way that should
    work for past and future kernels. We set up a queue limits structure
    with any limits that have had their modification functions removed.

    For newer kernels that can have limits applied at block device open
    (HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
    OpenZFS queue limits structure into Linux's queue_limits structure,
    which can then be passed in. For older kernels, we provide an
    application function that just calls the old functions for each limit
    in the structure.

    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: https://despairlabs.com/sponsor/
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
* Add building support for Artix Linux (#16265) | Zhao Yongming | 2024-07-15 | 1 | -3/+7
    Artix Linux is a systemd-free distribution based on Arch Linux, with
    openrc, dinit, runit and s6 as init alternatives. This patch makes
    init script installation work the same way as on Gentoo Linux with
    openrc. Tweaking the scripts for the other init systems is left to the
    packager.

    Signed-off-by: Yongming Zhao <[email protected]>
    Reviewed-by: Rob Norris <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* zstd: don't call zstd_mempool_reap if there are no buffers (#16302) | Mateusz Guzik | 2024-07-15 | 1 | -0/+7
    zfs_zstd_cache_reap_now is issued every second. zstd_mempool_reap
    checks for both pool existence and buffer count, but that's still 2
    func calls which are trivially avoidable.

    With clang it even avoids pushing the stack pointer (but still suffers
    the mispredict due to a forward jump, not modified in case someone is
    using zstd):

    <+0>:  cmpq $0x0,0x0(%rip)  # <zfs_zstd_cache_reap_now+8>
    <+8>:  je 0x217de4 <zfs_zstd_cache_reap_now+36>
    <+10>: push %rbp
    <+11>: mov %rsp,%rbp
    <+14>: mov 0x0(%rip),%rdi  # <zfs_zstd_cache_reap_now+21>
    <+21>: call 0x217df0 <zstd_mempool_reap>
    <+26>: mov 0x0(%rip),%rdi  # <zfs_zstd_cache_reap_now+33>
    <+33>: pop %rbp
    <+34>: jmp 0x217df0 <zstd_mempool_reap>
    <+36>: ret

    Preferably the call would not be made to begin with if zstd is not
    used, but this retains all the logic confined to zstd code.

    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Signed-off-by: Mateusz Guzik <[email protected]>
    Reviewed-by: Allan Jude <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
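The shape of this change, a cheap inline test in front of the function calls, can be sketched as follows. All names here are hypothetical stand-ins; the counters only exist so the effect of the early return is observable.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the zstd module's real data structures. */
static size_t pool_count;	/* number of buffers currently in the pools */
static int reap_calls;		/* how many times the reap function ran */

static void
mempool_reap(void)
{
	reap_calls++;
}

static void
cache_reap_now(void)
{
	/* Cheap inline check: with no buffers there is nothing to reap. */
	if (pool_count == 0)
		return;

	/* One call per pool, mirroring the two calls in the disassembly. */
	mempool_reap();
	mempool_reap();
}
```

When `pool_count` is zero the periodic caller pays for one comparison and one branch instead of two function calls.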
* head_errlog: fix use-after-freeGeorge Amanakis2024-07-151-2/+5
| | | | | | | | | | | | | In the commit of the head_errlog feature we introduced a bug in dsl_dataset_promote_sync(): we may dereference origin_head and hds, both dereferencing ddpa after calling promote_sync() on ddpa. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #16272 Closes #16273
* Fix missing semicolon in trace_dbuf.h (#16281)Daniel Berlin2024-07-121-1/+1
| | | | | | | | | | | | | | | On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str expands to a "do {<stuff> } while(0)" loop. Without this semicolon, the while(0) is unterminated, causing a cascade of useless errors. With this semicolon, it compiles fine. It also compiles fine on 6.8.11 (the previous kernel). I have not tested earlier kernels than that, but at worst it should add a pointless semicolon. All other instances in the source tree are already terminated with semicolons. Signed-off-by: Daniel Berlin <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
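Why the semicolon matters can be seen with any macro written in the `do { ... } while (0)` style. The sketch below uses a hypothetical `ASSIGN_STR` macro and `struct entry`; it is not the kernel's `assign_str` definition.

```c
#include <string.h>

/*
 * Hypothetical macro in the style described above: the body is wrapped in
 * do { ... } while (0) and intentionally has no trailing semicolon, so the
 * call site must supply one.
 */
#define	ASSIGN_STR(dst, src)					\
	do {							\
		strncpy((dst), (src), sizeof (dst) - 1);	\
		(dst)[sizeof (dst) - 1] = '\0';			\
	} while (0)

struct entry {
	char name[16];
	int id;
};

static void
entry_init(struct entry *e, const char *name, int id)
{
	ASSIGN_STR(e->name, name);	/* this semicolon ends the do/while */
	e->id = id;
}
```

Dropping the semicolon after `ASSIGN_STR(...)` leaves `while (0)` directly followed by the next statement, which the compiler rejects. If the macro expanded to a plain expression instead, the extra semicolon would still be valid, which matches the commit's note that it is harmless at worst.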
* one-word manpage correction: snapshot->rollback (#16294)a1ea3212024-07-121-1/+1
| | | | | | | | | | | This commit fixes what is probably a copy-paste mistake. The `dracut.zfs` manpage claims that the `bootfs.rollback` option executes `zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs rollback` does. Signed-off-by: Alphan Yılmaz <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* ZTS: handle FreeBSD version numbers correctly (#16340)Rob Norris2024-07-121-1/+29
| | | | | | | | | | | FreeBSD patchlevel versions are optional and, if present, in a different location in the version string. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* FreeBSD: Use the new freeuio() helper to free dynamically allocated UIOs (#16300) | Mark Johnston | 2024-07-11 | 1 | -1/+11
    This freeuio() interface was introduced to FreeBSD recently. For now
    it simply calls free(), so this change has no effect. However, this
    may not always be true, and in CheriBSD this change is required.

    Signed-off-by: Mark Johnston <[email protected]>
    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Brooks Davis <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
* Linux 6.9: Fix UBSAN errors in zap_micro.c | Tony Hutter | 2024-07-11 | 1 | -0/+1
    You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
    kernel objects from the UBSAN checks. We previously excluded
    zap_micro.o with:

        UBSAN_SANITIZE_zap_micro.o := n

    For some reason that didn't work for the 6.9 kernel, which wants us to
    use:

        UBSAN_SANITIZE_zfs/zap_micro.o := n

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tony Hutter <[email protected]>
    Closes #16278
    Closes #16330
* zvol: Fix suspend lock leaks (#16270)Mark Johnston2024-07-102-0/+3
| | | | | | | | | | | | In several functions, we use a flag variable to track whether zv_suspend_lock is held. This flag was not getting reset in a particular case where we need to retry the underlying operation, resulting in a lock leak. Make sure to update the flag where necessary. Signed-off-by: Mark Johnston <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
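The general pattern (a flag that shadows whether a lock is held, plus a retry path) can be modelled with a counter in place of a real lock. This is a sketch of the pattern only: the names are hypothetical and the actual zvol code paths are not reproduced.

```c
#include <stdbool.h>

/* Hypothetical lock modelled as a depth counter; 0 means balanced. */
static int lock_depth;

static void lock_enter(void) { lock_depth++; }
static void lock_exit(void) { lock_depth--; }

/*
 * Run an operation whose first attempt always has to be retried. The lock
 * is only needed on some attempts, and `held` tracks whether we own it.
 * Returns the lock depth left behind.
 */
static int
run_with_retry(bool need_lock_first, bool need_lock_retry)
{
	bool held = false;
	bool need = need_lock_first;
	int attempts = 0;

retry:
	if (need) {
		lock_enter();
		held = true;
	}
	if (attempts++ == 0) {
		/* Retry path: drop the lock if we took it... */
		if (held) {
			lock_exit();
			held = false;	/* ...and keep the flag in sync. */
		}
		need = need_lock_retry;
		goto retry;
	}
	if (held)
		lock_exit();
	return (lock_depth);
}
```

If the `held = false` line on the retry path were missing, a retry that no longer takes the lock would still see a stale `true` flag at the end, and the enter/exit calls would no longer pair up.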
* Fix the name of the zfs_prefetch_disable parameter (#16319)Peter Doherty2024-07-091-1/+1
| | | | | | | | | The ZFS module parameter name is zfs_prefetch_disable, not zfs_disable_prefetch. Signed-off-by: Peter Doherty <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Fix zdb "Memory fault" found on FreeBSD ZTS (#16332) | Tino Reichardt | 2024-07-09 | 1 | -1/+1
    Reason: nvlist_free() tries to free something that was never allocated.
    Solution: initialise this variable to NULL.

    Closes #16311
    Signed-off-by: Tino Reichardt <[email protected]>
    Reviewed-by: Alexander Motin <[email protected]>
    Reviewed-by: Ameer Hamza <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
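The same class of bug is easy to reproduce with the C library's `free()`, which is used here as an analogue for the cleanup call. `load_config()` is a hypothetical function, not code from zdb.

```c
#include <stdlib.h>

/* Hypothetical routine with a single shared cleanup path. */
static int
load_config(int fail_early)
{
	char *cfg = NULL;	/* initialised so the cleanup is always safe */
	int err = 0;

	if (fail_early) {
		err = -1;
		goto out;	/* cfg was never allocated on this path */
	}

	cfg = malloc(64);
	if (cfg == NULL)
		err = -1;
out:
	free(cfg);		/* free(NULL) is a defined no-op */
	return (err);
}
```

Without the `= NULL` initialiser, the early-exit path would hand an indeterminate pointer to the cleanup call, which is the kind of fault the commit describes.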
* FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)Mark Johnston2024-07-081-5/+6
| | | | | | | | | | | This way we can avoid making assumptions about the SDT probe implementation. No functional change intended. Signed-off-by: Mark Johnston <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
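For readers unfamiliar with the construct, a GNU C statement expression lets a macro run statements and still yield a value. The sketch below is illustrative only: `DEMO_SET_ERROR` and `probe_hits` are hypothetical, the counter stands in for a probe firing, and the code needs GCC or Clang.

```c
/* Hypothetical stand-in for an SDT probe firing. */
static int probe_hits;

/* Requires GNU C statement expressions (GCC or Clang). */
#define	DEMO_SET_ERROR(err)					\
	({							\
		int _e = (err);	/* evaluate the argument once */	\
		probe_hits++;	/* side effect of the macro */	\
		_e;		/* value of the whole expression */	\
	})

static int
do_thing(int bad)
{
	if (bad)
		return (DEMO_SET_ERROR(22));
	return (0);
}
```

The macro does not depend on what the probe itself expands to, since the result always comes from the final `_e;` statement.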
* zfs.4: Document the actual default for zfs_txg_history (#16305)Mateusz Piotrowski2024-06-281-2/+2
| | | | | | | Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Mateusz Piotrowski <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Fix a mis-merge in the zdb man page (#16304)Allan Jude2024-06-281-2/+1
| | | | | | | | Sponsored-by: Klara, Inc. Sponsored-By: Wasabi Technology, Inc. Signed-off-by: Allan Jude <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
* Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)Tony Hutter2024-06-281-5/+97
| | | | | | | | | | | | | | | The 6.9 kernel behaves differently in how it releases block devices. In the common case it will async release the device only after the return to userspace. This is different from the 6.8 and older kernels which release the block devices synchronously. To get around this, call add_disk() from a workqueue so that the kernel uses a different codepath to release our zvols in the way we expect. This stops zfs_allow_010_pos from hanging. Fixes: #16089 Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Rob Norris <[email protected]>
* disable automatic dependency tracking for dkms buildsMartin Wagner2024-06-131-0/+1
| | | | | | | | | | | Previously the dkms build left some unwanted files in `/usr/lib/modules` which could cause package managers to not properly clean up old kernels. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Martin Wagner <[email protected]> Closes #16221 Closes #16241
* FreeBSD: unregister mountroot eventhandler on unloadMateusz Guzik2024-06-131-7/+14
| | | | | | | | | | Otherwise if zfs is unloaded and reroot is being used it trips over a stale pointer. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Sponsored by: Rubicon Communications, LLC ("Netgate") Signed-off-by: Mateusz Guzik <[email protected]> Closes #16242
* FreeBSD: Update use of UMA-related symbols in arc_available_memorybnovkov2024-06-061-10/+10
| | | | | | | | | | Recent UMA changes repurposed the use of UMA_MD_SMALL_ALLOC in a way that breaks arc_available_memory on -CURRENT. This change ensures that arc_available_memory uses the new symbol while maintaining compatibility with older FreeBSD releases. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Bojan Novković <[email protected]> Closes #16230
* contrib/bash_completion.d: squelch FreeBSD seq when first < last | Derek Schrock | 2024-06-06 | 1 | -1/+1
    With seq x -1 z, when x is less than z, FreeBSD seq prints the error:

    $ seq 1 -1 2
    seq: needs positive increment

    Hide this error. Alternatively $COMP_CWORD could be checked for < 2.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Derek Schrock <[email protected]>
    Closes #16234