Commit message (Author, Age, Files, Lines)
* FreeBSD: Improve taskq wrapper (Alexander Motin, 2023-11-06, 2 files, -46/+46)
  - Group tqent_task and tqent_timeout_task into a union. They are never
    used at the same time. This shrinks taskq_ent_t from 192 to 160 bytes.
  - Remove tqent_registered. Use tqent_id != 0 instead.
  - Remove tqent_cancelled. Use taskqueue pending counter instead.
  - Change tqent_type into uint_t. We don't need to pack it any more.
  - Change tqent_rc into uint_t, matching refcount(9).
  - Take shared locks in taskq_lookup().
  - Call proper taskqueue_drain_timeout() for TIMEOUT_TASK in
    taskq_cancel_id() and taskq_wait_id().
  - Switch from CK_LIST to regular LIST.

  Reviewed-by: Allan Jude <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Mateusz Guzik <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15356
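Note (illustrative sketch, not part of the commit): the size reduction from the first bullet comes from overlapping two members that are never live at the same time. The structs below are hypothetical stand-ins; the real struct task and struct timeout_task come from FreeBSD's taskqueue(9) and have different layouts, so the byte counts here will not match the 192 and 160 quoted above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two taskqueue(9) task types. */
struct fake_task {
	void *fn;
	void *arg;
	unsigned short pending;
};

struct fake_timeout_task {
	struct fake_task t;
	void *queue;
	long deadline;
	char callout[64];
};

/* Before: both members are stored side by side. */
struct ent_separate {
	struct fake_task task;
	struct fake_timeout_task timeout_task;
	unsigned int type;
};

/* After: only one member is in use at a time, so they may share storage. */
struct ent_union {
	union {
		struct fake_task task;
		struct fake_timeout_task timeout_task;
	} u;
	unsigned int type;
};

/* Bytes saved by overlapping the two members in this toy layout. */
size_t
bytes_saved(void)
{
	return (sizeof (struct ent_separate) - sizeof (struct ent_union));
}
```

A union is only as large as its largest member, so the smaller task type no longer costs any space of its own.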
* Fix block cloning between unencrypted and encrypted datasets (Martin Matuška, 2023-11-06, 1 file, -0/+9)
  Block cloning from an encrypted dataset into an unencrypted dataset
  and vice versa is not possible. The current code allowed cloning
  unencrypted files into an encrypted dataset, causing a panic when
  these were accessed. Block cloning between two encrypted datasets is
  currently supported on the same filesystem only.

  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Kay Pedersen <[email protected]>
  Reviewed-by: Rob N <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Martin Matuska <[email protected]>
  Closes #15464
  Closes #15465
* Tag 2.2.0 [tag: zfs-2.2.0] (Brian Behlendorf, 2023-10-12, 1 file, -1/+1)
  New Features
  - Block cloning (#13392)
  - Linux container support (#14070, #14097, #12263)
  - Scrub error log (#12812, #12355)
  - BLAKE3 checksums (#12918)
  - Corrective "zfs receive"
  - Vdev and zpool user properties

  Performance
  - Fully adaptive ARC (#14359)
  - SHA2 checksums (#13741)
  - Edon-R checksums (#13618)
  - Zstd early abort (#13244)
  - Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
  - General optimization (#14121, #14123, #14039, #13680, #13613,
    #13606, #13576, #13553, #12789, #14925, #14948)

  Signed-off-by: Brian Behlendorf <[email protected]>
* Zpool can start allocating from metaslab before TRIMs have completed (Jason King, 2023-10-12, 1 file, -9/+19)
  When doing a manual TRIM on a zpool, the metaslab being TRIMmed is
  potentially re-enabled before all queued TRIM zios for that metaslab
  have completed. Since TRIM zios have the lowest priority, it is
  possible to get into a situation where allocations occur from the
  just re-enabled metaslab and cut ahead of queued TRIMs to the same
  metaslab. If the ranges overlap, this will cause corruption.

  We were able to trigger this pretty consistently with a small single
  top-level vdev zpool (i.e. small number of metaslabs) with heavy
  parallel write activity while performing a manual TRIM against a
  somewhat 'slow' device (so TRIMs took a bit of time to complete).
  With the patch, we've not been able to recreate it since. It was on
  illumos, but inspection of the OpenZFS trim code looks like the
  relevant pieces are largely unchanged and so it appears it would be
  vulnerable to the same issue.

  Reviewed-by: Igor Kozhukhov <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Jason King <[email protected]>
  Illumos-issue: https://www.illumos.org/issues/15939
  Closes #15395
* spec: define _bashcompletiondir if undefined (Brian Behlendorf, 2023-10-11, 1 file, -0/+9)
  Always define _bashcompletiondir in the spec file to a reasonable
  value when it is undefined. Required for `rpmbuild --rebuild <srpm>`.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #15396
* ZTS: Debug zfs_share_concurrent_shares failure (Brian Behlendorf, 2023-10-10, 1 file, -5/+34)
  Update zfs_share_concurrent_shares test case to wait a few seconds
  and recheck that the filesystem isn't shared. The intent here is to
  determine the nature of the error and whether it may be a race.

  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #15379
* CI: Move perl script to dist_noinst_DATA (Brian Behlendorf, 2023-10-10, 1 file, -1/+1)
  Everything listed in dist_noinst_SCRIPTS is assumed to be a shell
  script. This generates a shellcheck SC1071 error since perl is not
  supported. Move update_authors.pl to dist_noinst_DATA with the other
  perl scripts.

  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Rob N <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #15392
* Ensure we call fput when cloning fails due to different devices. (Daniel Berlin, 2023-10-10, 1 file, -2/+6)
  Right now, zpl_ioctl_ficlone and zpl_ioctl_ficlonerange do not call
  fput on the src fd if the source and destination are on two different
  devices. This leaves the source file held open in this case.

  Reviewed-by: Kay Pedersen <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Daniel Berlin <[email protected]>
  Closes #15386
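Note (illustrative sketch, not the actual patch): the bug class here is an early return that skips releasing a reference. The user-space analogue below uses a hypothetical file_ref type and hypothetical ref_get()/ref_put() helpers in place of the kernel's file reference and fput(); the -1 return value stands in for whatever error the real code reports.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel file reference. */
struct file_ref {
	int refcount;
	int device;
};

static struct file_ref *
ref_get(struct file_ref *f)
{
	f->refcount++;
	return (f);
}

static void
ref_put(struct file_ref *f)
{
	f->refcount--;
}

/*
 * Returns 0 on success and -1 when source and destination are on
 * different devices. Every path leaves through "out", so the source
 * reference taken at the top is always dropped.
 */
int
clone_between(struct file_ref *src, struct file_ref *dst)
{
	struct file_ref *held = ref_get(src);
	int err = 0;

	if (held->device != dst->device) {
		err = -1;
		goto out;
	}

	/* The clone itself would happen here. */

out:
	ref_put(held);
	return (err);
}
```

Funnelling all exits through a single cleanup label is one common way to make this kind of leak hard to reintroduce.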
* ZTS: Remove zfs_allow_010_pos expection for FreeBSD (Brian Behlendorf, 2023-10-10, 1 file, -1/+0)
  This issue should now be addressed by PR #15376, so the exception for
  this test case can be removed.

  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #15382
* zvol: Temporally disable blk-mq (Tony Hutter, 2023-10-10, 3 files, -70/+1)
  There was a report of zvol data loss (#15351) after enabling blk-mq
  on a zvol backed with 16k physical block sized disks. Out of an
  abundance of caution, do not allow the user to enable blk-mq until we
  can look into the issue. Note that blk-mq was not enabled by default
  on zvols. It was always opt-in via the zvol_use_blk_mq module
  parameter.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Signed-off-by: Tony Hutter <[email protected]>
  Addresses: #15351
  Closes #15378
* AUTHORS: update with missing names (Rob Norris, 2023-10-10, 1 file, -24/+337)
  This is generated by scripts/update_authors.pl. I've looked over the
  results fairly closely and while I don't think they're bad, they
  could be improved somewhat. I also don't know if it's good form to
  just update this without explicit consent from those named.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tino Reichardt <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15374
* update_authors: add missing names from commits to AUTHORS (Rob Norris, 2023-10-10, 2 files, -0/+323)
  Full description of what's happening in comments.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tino Reichardt <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15374
* mailmap: initial, trying to tidy up a lot of the commit history (Rob Norris, 2023-10-10, 2 files, -0/+190)
  This comes from the observation that a huge number of commit author
  fields look quite strange (to my eyes), but quite often the
  Signed-off-by: trailer has the correct name. For these I have updated
  the name where it was obvious how to do so. However, I have not
  created a mapping for the commit email to the Signed-off-by email, as
  whatever I choose for email will become the prime candidate for
  inclusion in the AUTHORS file, and care needs to be taken when acting
  without explicit consent.

  There's a small handful of commits that look like they were done on
  local machines, or CI hosts, or similar, where the git authorship
  config wasn't set up properly. It's obvious what this should look
  like, so I've just done them.

  The remainder is mapping Github noreply emails to either an
  obviously-correct Signed-off-by trailer, or to an author from another
  commit. This was mostly done by hand, so there may be errors, but I
  think it's close.

  I do not understand where these come from. I know that they're what
  commits made via Github web look like when there's no real address
  set on the account, but I find it hard to believe that so many of
  these came through the web, especially given the complexity of most
  of the changes. I suspect there's some kind of merge helper tool in
  play here. Regardless, the history is set now, and this tries to get
  it back on track.

  Obviously, all of this helps the history look tidy, but this also
  feeds into the AUTHORS update script. See next commit.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tino Reichardt <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15374
* ZTS: Fix verify_fs_mount in delegate_common.kshlib (Umer Saleem, 2023-10-10, 1 file, -2/+2)
  verify_fs_mount expects the dataset to remain unmounted after
  updating the mountpoint property in delegate_common.kshlib. This
  commit updates verify_fs_mount to use the nomount parameter for zfs
  set, so the mountpoint property is updated without mounting the
  dataset.

  This fixes the zfs_allow_010_pos test case, which was failing on
  FreeBSD after the behavior update in setting the mountpoint property.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Umer Saleem <[email protected]>
  Closes #15376
* ZTS: Move zpool_import_hostid_changed* tests to Linux runfile (Brian Behlendorf, 2023-10-10, 2 files, -4/+7)
  Relocate the zpool_import_hostid_changed* test cases to the Linux
  runfile until these tests are modified to run cleanly on FreeBSD.

  Reviewed-by: George Melikov <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #15377
* FreeBSD: Reduce divergence from in-tree sources (Alexander Motin, 2023-10-10, 11 files, -10/+15)
  This includes random small tweaks, primarily build fixes, required
  when ZFS is built as part of FreeBSD base.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tino Reichardt <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15368
* config/zfs-build.m4: add Gentoo's bash-completion path (Sam James, 2023-10-10, 1 file, -0/+1)
  Follow up e69ade32e116e72d03068c03799924c3f1a15c95 by adding Gentoo's
  bash completion path.

  We should probably consider using/honouring the standard
  --with-bashcompletiondir autoconf option as well, but that's
  something to do later.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Sam James <[email protected]>
  Closes #15372
* Tag 2.2.0-rc5 [tag: zfs-2.2.0-rc5] (Brian Behlendorf, 2023-10-07, 1 file, -1/+1)
  Signed-off-by: Brian Behlendorf <[email protected]>
* ZIL: Reduce maximum size of WR_COPIED to 7.5K (Alexander Motin, 2023-10-07, 2 files, -6/+16)
  Benchmarks show that at certain write sizes range lock/unlock takes
  less time than the extra memory copy. The exact threshold is not
  obvious due to other overheads, but it is definitely lower than the
  ~63KB used before. Make it configurable, defaulting to 7.5KB, that is
  the nearest malloc() size of 8KB minus the itx and lr structs.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15353
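Note (illustrative sketch under stated assumptions): the 7.5KB default is 7680 bytes, which leaves 512 bytes of an 8KB allocation for the itx and lr structs. The enum names, the max_copied constant and the choose_wr_state() helper below are hypothetical, and the real selection logic in ZFS considers more than size alone; treating the threshold as inclusive is also an assumption.

```c
#include <stddef.h>

/* Hypothetical names; not the identifiers used by the ZIL code. */
enum wr_choice {
	CHOICE_COPIED,		/* copy the data into the log record now */
	CHOICE_NEED_COPY	/* defer: take the range lock and copy later */
};

/* 7.5 KiB in bytes: 7.5 * 1024 = 7680. */
static const size_t max_copied = 7680;

/* Space left in an 8 KiB allocation for record headers. */
size_t
header_room(void)
{
	return ((size_t)8192 - max_copied);
}

/* Small writes are copied immediately; larger ones are deferred. */
enum wr_choice
choose_wr_state(size_t write_size)
{
	return (write_size <= max_copied ? CHOICE_COPIED : CHOICE_NEED_COPY);
}
```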
* rpm: Fix `make rpm` on Debian/Ubuntu (siv0, 2023-10-07, 2 files, -1/+4)
  The recent patch to change the bash completion install location based
  on the distribution ignored that it should still be possible to
  create RPMs on Debian-derived systems.

  Additionally, `make deb` itself creates RPMs and converts them via
  `alien`.

  This patch adds the bashcompletiondir variable to the rpm defines and
  uses it as the location from which to get the bash completion file.

  It still changes the location on Debian/Ubuntu systems in the final
  packages from /etc/bash_completion.d to
  /usr/share/bash-completion/completions.

  Fixes: e69ade32e116e72d03068c03799924c3f1a15c95
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Stoiko Ivanov <[email protected]>
  Closes #15355
  Closes #15365
* import: require force when cachefile hostid doesn't match on-disk (Rob Norris, 2023-10-07, 3 files, -12/+49)
  Previously, if a cachefile is passed to zpool import, the cached
  config is mostly offered as-is to
  ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and the results are taken as
  the canonical pool config and handed back to ZFS_IOC_POOL_IMPORT.

  In the course of its operation, spa_load() will inspect the pool and
  build a new config from what it finds on disk. However, it then
  regenerates a new config ready to import, and so rightly sets the
  hostid and hostname for the local host in the config it returns.

  Because of this, the "require force" checks always decide the pool is
  exported and last touched by the local host, even if this is not
  true, which is possible in a HA environment when MMP is not enabled.
  The pool may be imported on another head, but the import checks still
  pass here, so the pool ends up imported on both.

  (This doesn't happen when a cachefile isn't used, because the pool
  config is discovered in userspace in zpool_find_import(), and that
  does find the on-disk hostid and hostname correctly).

  Since the systemd zfs-import-cache.service unit uses cachefile
  imports, this can lead to a system returning after a crash with a
  "valid" cachefile on disk and automatically, quietly, importing a
  pool that has already been taken up by a secondary head.

  This commit causes the on-disk hostid and hostname to be included in
  the ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then
  changes the "force" checks for zpool import to use them if present.

  This method should give no change in behaviour for old userspace on
  new kernels (they won't know to look for the new config items) and
  for new userspace on old kernels (they won't find the new config
  items).

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Sponsored-by: Klara, Inc.
  Sponsored-by: Wasabi Technology, Inc.
  Closes #15290
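Note (illustrative sketch, not the code from this commit): the core of the change can be pictured as "prefer the on-disk hostid when the returned config carries one, otherwise fall back to the regenerated value". The struct and function below are hypothetical and leave out the pool-state and hostname parts of the real check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the config returned by tryimport. */
struct pool_cfg {
	bool has_ondisk_hostid;	/* new load-info item present? */
	uint64_t ondisk_hostid;	/* hostid found in the on-disk labels */
	uint64_t cfg_hostid;	/* hostid in the regenerated config */
};

/*
 * Decide whether the import should demand a force flag. When the
 * on-disk hostid is available it wins; otherwise the regenerated
 * value is used, which always matches the local host.
 */
bool
import_requires_force(const struct pool_cfg *c, uint64_t local_hostid)
{
	uint64_t hostid = c->has_ondisk_hostid ?
	    c->ondisk_hostid : c->cfg_hostid;

	return (hostid != local_hostid);
}
```

With has_ondisk_hostid false the function never asks for force, which mirrors both the old behaviour and the compatibility cases described in the last paragraph of the message.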
* tests: add tests for zpool import behaviour when hostid changes (Rob Norris, 2023-10-07, 8 files, -0/+279)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Sponsored-by: Klara, Inc.
  Sponsored-by: Wasabi Technology, Inc.
  Closes #15290
* zfsconcepts: add description of block cloning (Rob N, 2023-10-07, 1 file, -1/+39)
  Here I'm trying to succinctly introduce the concept, the basics of
  its construction, how it's different to dedup, how to use it, and
  where its limitations lie, in four paragraphs and with enough
  searchable terms to help the reader find more information both within
  OpenZFS and elsewhere. Phew.

  Sponsored-By: Klara, Inc.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15362
* Reduce number of metaslab preload taskq threads. (Alexander Motin, 2023-10-07, 6 files, -47/+37)
  Before this change ZFS created threads for 50% of CPUs for each
  top-level vdev. Plus it created the same number of threads for
  embedded log groups (that have only one metaslab and don't need any
  preload). As a result, on a system with 80 CPUs and a pool of 60
  vdevs this resulted in 4800 metaslab preload threads, which is
  absolutely insane.

  This patch changes the preload threads to 50% of CPUs in one taskq
  per pool, so on the mentioned system it will be only 40 threads.

  Among other things this fixes zdb on the mentioned system and pool on
  FreeBSD, which failed to create so many threads in one process.

  Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15319
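Note (worked arithmetic, not code from the commit): the 4800 and 40 figures follow directly from the description. The function names are hypothetical and the rounding of odd CPU counts is an assumption.

```c
/*
 * Before: one taskq sized at 50% of CPUs for each top-level vdev,
 * plus an equally sized one for its embedded log group.
 */
unsigned int
preload_threads_before(unsigned int ncpus, unsigned int nvdevs)
{
	unsigned int per_taskq = ncpus / 2;

	return (per_taskq * 2 * nvdevs);
}

/* After: a single taskq per pool sized at 50% of CPUs. */
unsigned int
preload_threads_after(unsigned int ncpus)
{
	return (ncpus / 2);
}
```

For the system in the message, preload_threads_before(80, 60) gives 40 * 2 * 60 = 4800 and preload_threads_after(80) gives 40.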
* CI: add FreeBSD build with Cirrus CI (Martin Matuška, 2023-10-07, 2 files, -1/+22)
  As a first step for automatic FreeBSD testing add a build and install
  for FreeBSD versions 12.4, 13.2 and 14-snapshot using Cirrus CI.

  Reviewed-by: Jose Luis Duran
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Martin Matuska <[email protected]>
  Closes #15332
* tests/block_cloning: sync before write in fallback test (Rob N, 2023-10-07, 1 file, -0/+2)
  We're still seeing this test fail intermittently (that is, the clone
  happens), which must mean the write and the clone can still be
  happening on different txgs. It might be that there's still activity
  after the pool is created. So here we force a sync before starting
  the write.

  Sponsored-By: Klara Inc.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15359
* ARC: Drop different size headers for crypto (Alexander Motin, 2023-10-07, 1 file, -175/+8)
  To reduce memory usage ZFS crypto allocated ARC headers bigger by 56
  bytes only when a specific block was encrypted on disk. It was a nice
  optimization, except in some cases the code reallocated them on the
  fly, which invalidated header pointers from the buffers. Since the
  buffers use different locking, it created a number of races that were
  originally covered (at least partially) by b_evict_lock, also used to
  protect evictions. But it has gone as part of #14340. As a result, as
  was found in #15293, arc_hdr_realloc_crypt() ended up unprotected and
  causing use-after-free.

  Instead of introducing some even more elaborate locking, this patch
  just drops the difference between normal and protected headers. It
  costs us an additional 56 bytes per header, but with a couple of
  patches saving 24 bytes, the net growth is only 32 bytes with a total
  header size of 232 bytes on FreeBSD, which IMHO is an acceptable
  price for simplicity. Additional locking would also end up consuming
  space, time or both.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15293
  Closes #15347
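Note (worked arithmetic, not code from the commit): the 232-byte total and 32-byte net growth can be checked against the FreeBSD figures quoted in this message and in the two ARC commits listed after it (200 to 184 bytes for b_cv, then down to 176 for b_bufcnt). The constant and function names are hypothetical.

```c
/* FreeBSD header sizes in bytes, as quoted in the commit messages. */
enum {
	L1_HDR_START = 200,	/* L1 header before the b_cv removal */
	SAVED_B_CV = 16,	/* 200 -> 184 */
	SAVED_B_BUFCNT = 8,	/* 184 -> 176 */
	CRYPT_EXTRA = 56	/* crypto fields now carried by every header */
};

/* Size of the single, unified header after all three changes. */
int
unified_hdr_size(void)
{
	return (L1_HDR_START - SAVED_B_CV - SAVED_B_BUFCNT + CRYPT_EXTRA);
}

/* Growth of a plain (non-crypto) header relative to the starting point. */
int
net_growth(void)
{
	return (unified_hdr_size() - L1_HDR_START);
}
```

That is 200 - 16 - 8 + 56 = 232 bytes, a net growth of 56 - 24 = 32 bytes.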
* ARC: Remove b_bufcnt/b_ebufcnt from ARC headers (Alexander Motin, 2023-10-07, 3 files, -83/+42)
  In most cases we do not care about the exact number of buffers linked
  to the header, we just need to know if it is zero, non-zero or one.
  That can easily be checked just by looking at the b_buf pointer or in
  some cases dereferencing it.

  b_ebufcnt is read only once, and in that case we already traverse the
  list as part of arc_buf_remove(), so a second traversal should not be
  expensive.

  This reduces L1 ARC header size by 8 bytes and full crypto header by
  16 bytes, down to 176 and 232 bytes on FreeBSD respectively.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15350
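Note (illustrative sketch, not the ARC code): answering "zero, non-zero or one" from the list head alone can look like the following. The struct and function names are hypothetical simplifications; only the b_buf field name is taken from the message, and the b_next link is an assumption about how the buffers are chained.

```c
#include <stddef.h>

/* Hypothetical, simplified buffer and header types. */
struct buf {
	struct buf *b_next;	/* next buffer sharing the same header */
};

struct hdr {
	struct buf *b_buf;	/* head of the buffer list, NULL if empty */
};

/* Zero buffers: the head pointer is NULL. */
int
hdr_has_no_bufs(const struct hdr *h)
{
	return (h->b_buf == NULL);
}

/* Exactly one buffer: a head exists and nothing follows it. */
int
hdr_has_one_buf(const struct hdr *h)
{
	return (h->b_buf != NULL && h->b_buf->b_next == NULL);
}
```

Neither check needs a stored counter, which is what allows the count fields to be dropped from the header.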
* ARC: Remove b_cv from struct l1arc_buf_hdr (Alexander Motin, 2023-10-07, 2 files, -14/+22)
  Earlier as part of #14123 I removed one use of b_cv. This patch
  reuses the same approach to remove the other one from a much rarer
  code path.

  This saves 16 bytes of L1 ARC header on FreeBSD (reducing it from 200
  to 184 bytes) and seemingly even more on Linux.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15340
* Add BTI landing pads to the AArch64 SHA2 assembly (Andrew Turner, 2023-10-04, 2 files, -0/+5)
  The Arm Branch Target Identification (BTI) extension guards against
  branching to an unintended instruction. To support BTI, add the
  landing pad instructions to the SHA2 functions. These are from the
  hint space, so they are a nop on hardware that lacks BTI support or
  if BTI isn't enabled.

  Reviewed-by: Allan Jude <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tino Reichardt <[email protected]>
  Signed-off-by: Andrew Turner <[email protected]>
  Closes #14862
  Closes #15339
* Add '-u' - nomount flag for zfs set (Umer Saleem, 2023-10-03, 11 files, -33/+216)
  This commit adds the '-u' flag for the zfs set operation. With this
  flag, the mountpoint, sharenfs and sharesmb properties can be updated
  without actually mounting or sharing the dataset.

  Previously, if the dataset was unmounted and the mountpoint property
  was updated, the dataset was not mounted after the update. This
  behavior is changed in #15240. We mount the dataset whenever the
  mountpoint property is updated, regardless of whether it was mounted
  or not.

  To give the user the option to keep the dataset unmounted and still
  update the mountpoint, the '-u' flag can be used.

  If any of the mountpoint, sharenfs or sharesmb properties are updated
  with the '-u' flag, the property is set to the desired value but the
  operation to (re/un)mount and/or (re/un)share the dataset is not
  performed and the dataset remains as it was before.

  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Umer Saleem <[email protected]>
  Closes #15322
* Improve the handling of sharesmb,sharenfs properties (Umer Saleem, 2023-10-03, 4 files, -11/+51)
  For the sharesmb and sharenfs properties, the status of setting the
  property is tied to whether we succeed in sharing the dataset or not.
  If sharing the dataset is not successful, this is treated as an
  overall failure of setting the property. In this case, if we check
  the property after the failure, it is set to on.

  This commit updates this behavior so that the status of setting the
  share properties is not returned as a failure when we fail to share
  the dataset.

  For the sharenfs property, if an access list is provided, syntax
  errors in the access list/host addresses are not validated until
  after setting the property, during the postfix phase while trying to
  share the dataset. This is not correct, since the property has
  already been set when we reach there.

  Syntax errors in the access list/host addresses are now validated
  while validating the property list, before setting the property, and
  a failure is returned to the user when there are errors in the access
  list.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Ameer Hamza <[email protected]>
  Signed-off-by: Umer Saleem <[email protected]>
  Closes #15240
* Update the behavior of mountpoint property (Umer Saleem, 2023-10-03, 7 files, -24/+34)
  There are some inconsistencies in the handling of the mountpoint
  property. This commit updates the behavior and makes it consistent.

  If the mountpoint property is set when the dataset is unmounted, this
  would update the mountpoint property. The mountpoint could be valid
  or invalid in this case. Setting the mountpoint property would result
  in success in this case. The dataset would still be unmounted here.

  On the other hand, if the dataset is mounted and the mountpoint
  property is updated to something invalid where the mount cannot be
  successful (for example, setting the mountpoint inside a read-only
  directory), this would unmount the dataset, set the mountpoint
  property to the requested value and try to mount the dataset. The
  mount operation returns an error and this error is treated as an
  overall failure of setting the property while the property is
  actually set.

  To make the behavior consistent whether the dataset is mounted or
  unmounted, we should try to mount the dataset whenever the mountpoint
  property is updated. This would result in mounting the dataset if the
  canmount property is set to on, regardless of whether the dataset was
  previously unmounted.

  A failure of the mount operation while setting the mountpoint
  property should not be treated as a failure, since the property is
  actually set now to the user-requested value.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Ameer Hamza <[email protected]>
  Signed-off-by: Umer Saleem <[email protected]>
  Closes #15240
* contrib: debian: drop bashcompletion mangling after install (Stoiko Ivanov, 2023-10-03, 2 files, -6/+0)
  Tested by running:
  ```
  ./configure --with-config=user;
  cp -a contrib/debian .
  dpkg-buildpackage -b -uc -us
  ```
  on a Debian 12 based system and checking where the completion file
  got installed.

  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Stoiko Ivanov <[email protected]>
  Closes #15304
* contrib: debian: switch to dh-sequence-dkms (Stoiko Ivanov, 2023-10-03, 1 file, -1/+1)
  Follows b191f9a13d3005621ead9a727b811892264505ef from Debian's
  packaging team at: https://salsa.debian.org/zfsonlinux-team/zfs/

  The previous build-dependency is kept as an option, to still be able
  to build on older Debian based distros (e.g. Ubuntu 20.04).

  Without this, building on Debian 12/bookworm does not work, as `dkms`
  is a virtual package.

  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Stoiko Ivanov <[email protected]>
  Closes #15304
* contrib: bash_completion.d: make install destination vendor dependent (Stoiko Ivanov, 2023-10-03, 2 files, -2/+11)
  Certain Linux distributions (Debian/Ubuntu at least) expect
  bash-completion snippets to be installed in
  /usr/share/bash-completion/completions instead of
  /etc/bash_completion.d.

  This patch sets the bashcompletiondir variable based on the vendor,
  inspired by similar settings for initdir and initconfdir.

  It seems that commit 612b8dff5bc3d827efb864a199a62bda1a419254 caused
  the file to be installed in the first place (thus the error when
  building debian packages only became apparent when testing a
  2.2.0-rc4 build).

  The change only sets the variable in Makefile context. The
  rpm/zfs.spec.in file has the path hardcoded as
  %{_sysconfdir}/bash_completion.d/zfs, but since running
  ```
  ./configure --sysconfdir=/myetc ; make rpm
  ```
  also results in all relevant files being installed in /etc instead of
  /myetc, I assume this can remain as is.

  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Stoiko Ivanov <[email protected]>
  Closes #15304
* Fix invalid pointer access in trace_dbuf.h (Chunwei Chen, 2023-10-03, 1 file, -2/+6)
  In dnode_destroy, dn_objset is invalidated. However, it will later
  call into dbuf_destroy, in which DTRACE_SET_STATE will try to access
  spa_name via dn_objset, causing an illegal pointer access.

  Reviewed-by: Brian Atkinson <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Closes #15333
* Report ashift of L2ARC devices in zdb (George Amanakis, 2023-10-03, 2 files, -1/+11)
  Commit 8af1104f does not actually store the ashift of cache devices
  in their label. However, in order to facilitate reporting the ashift
  through zdb, we enable this in the present commit. We also document
  how the retrieval of the ashift is done.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: George Amanakis <[email protected]>
  Closes #15331
* Restrict short block cloning requests (Alexander Motin, 2023-10-03, 1 file, -0/+13)
  If we are copying only one block and it is smaller than the
  recordsize property, do not allow the destination to grow beyond one
  block if it is not there yet. Otherwise the destination will get
  stuck with that block size forever, which can be as small as 512
  bytes, no matter how big the destination grows later.

  Reviewed-by: Kay Pedersen <[email protected]>
  Reviewed-by: Rob Norris <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15321
* Tweak rebuild in-flight hard limit (Brian Behlendorf, 2023-10-03, 1 file, -2/+2)
  Vendor testing shows we should be able to get a little more
  performance if we further relax the hard limit which we're hitting.

  Authored-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Tony Hutter <[email protected]>
  Closes #15324
* Fix ENOSPC for extended quota (Akash B, 2023-09-28, 2 files, -15/+19)
  When unlinking multiple files from a pool at 100% capacity, it was
  possible for ENOSPC to be returned after the first few unlinks. This
  issue was fixed previously by PR #13172 but was then reintroduced by
  PR #13839.

  This is resolved using the existing mechanism of returning ERESTART
  when over quota as long as we know enough space will shortly be
  available after processing the pending deferred frees.

  Also, updated the existing testcase which reliably reproduced the
  issue without this patch.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Dipak Ghosh <[email protected]>
  Signed-off-by: Akash B <[email protected]>
  Closes #15312
* Don't allocate from new metaslabs (Paul Dagnelie, 2023-09-28, 1 file, -0/+9)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #15307
  Closes #15308
* Reduce trim min size even lower for tests to reduce flakiness (Paul Dagnelie, 2023-09-28, 7 files, -7/+7)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: George Melikov <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #15315
* ZTS: Fix introduced test bug in block_cloning_copyfilerange (Paul Dagnelie, 2023-09-28, 1 file, -1/+0)
  Reviewed-by: John Wren Kennedy <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #15316
* ZTS: Add additional exceptions (Brian Behlendorf, 2023-09-28, 1 file, -2/+2)
  "zfs_share_concurrent_shares" may fail on FreeBSD and some Linux
  distributions (fedora). Move it to the common list.

  "zfs_allow_010_pos" has been observed to fail on FreeBSD 13.

  Reviewed-by: George Melikov <[email protected]>
  Reviewed-by: Kay Pedersen <[email protected]>
  Reviewed-by: Umer Saleem <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #15306
* Set timeout before creating pool in test (Paul Dagnelie, 2023-09-28, 1 file, -2/+2)
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Rob Norris <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #15309
* Invoke zdb by guid to avoid import errors (Paul Dagnelie, 2023-09-28, 1 file, -9/+12)
  The problem that was occurring is basically that a device was removed
  by ztest and replaced with another device. It was then reguided. The
  import then failed because there were two possible imports with the
  same name; one with the new guid, and one with the old. This can
  happen because the label writes from the device removal/replacement
  can be subject to ztest's error injection.

  The other ways to fix this would be to change the error injection to
  not trigger on removals (which may not be technically feasible), or
  to change the import code to not report configurations that are so
  short on devices (which would potentially have unpleasant end-user
  effects when trying to recover from data losses/device configuration
  issues).

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: George Melikov <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Closes #15298
* ZIL: Avoid dbuf_read() in ztest_get_data() (Alexander Motin, 2023-09-28, 1 file, -2/+1)
  While working on similar patches for zfs and zvol in #15153 I forgot
  about ztest. Update it as well so that we test the same code paths as
  are used in production.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored by: iXsystems, Inc.
  Closes #15301
* tests/block_cloning: try harder to stay on same txg in fallback test (Rob N, 2023-09-22, 1 file, -0/+1)
  We've observed this test failing intermittently. When it does, the
  "same block" check shows that both files have the same content, that
  is, the file was cloned.

  The only way this could have happened is if the open txg moved
  between the dd and clonefile calls. That's possible because although
  we set zfs_txg_timeout to be large, that only affects the wait time
  in the sync thread at the start of a new txg; it doesn't change
  anything if it's currently waiting or working.

  So here we just force the txgs to move immediately before, which
  should get both operations onto the same txg as intended.

  Sponsored-By: OpenDrives Inc.
  Sponsored-By: Klara Inc.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15303
* status: report pool suspension state under failmode=continue (Rob N, 2023-09-22, 2 files, -3/+4)
  When failmode=continue is set and the pool suspends, both 'zpool
  status' and the 'zfs/pool/state' kstat ignore it and report the
  normal vdev tree state. There's no clear indicator that the pool is
  suspended. This is unlike suspend in failmode=wait, or suspend due to
  MMP check failure, which both report "SUSPENDED" explicitly.

  This commit changes it so SUSPENDED is reported for failmode=continue
  the same as for other modes.

  Rationale:

  The historical behaviour of failmode=continue is roughly, "press on
  as though all is well". To this end, the fact that the pool had
  suspended was not shown, to maintain the façade that all is well.

  It's unclear why hiding this information was considered appropriate.
  One possibility is that it was expected that a true pool fault would
  always be reported as DEGRADED or FAULTED, and that the pool could
  not suspend without these happening.

  That is not necessarily true, as vdev health and suspend state are
  only loosely connected, such that a pool in (apparent) good health
  can be suspended for good reasons, and of course a degraded pool does
  not lead to suspension. Even if that expectation were true, there's
  still a difference in urgency - a degraded pool may not need to be
  attended to for hours, while a suspended pool is most often unusable
  until an operator intervenes.

  An operator that has set failmode=continue has presumably done so
  because their workload is one that can continue to operate in a
  useful way when the pool suspends. In this case the operator still
  needs a clear indicator that there is a problem that needs attending
  to.

  Sponsored-by: Klara, Inc.
  Sponsored-by: Wasabi Technology, Inc.
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rob Norris <[email protected]>
  Closes #15297