aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* FreeBSD: Do zcommon_init sooner to avoid FPU panicRyan Moeller2020-12-233-1/+10
| | | | | | | | | | | | | | | | | | | | | There has been a panic affecting some system configurations where the thread FPU context is disturbed during the fletcher 4 benchmarks, leading to a panic at boot. module_init() registers zcommon_init to run in the last subsystem (SI_SUB_LAST). Running it as soon as interrupts have been configured (SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks before we start doing other things. While it's not clear *how* the FPU context was being disturbed, this does seem to avoid it. Add a module_init_early() macro to run zcommon_init() at this earlier point on FreeBSD. On Linux this is defined as module_init(). Authored by: Konstantin Belousov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11302
* mount_zfs: print strerror instead of errno for error reportingÉrico Nogueira Rolim2020-12-231-6/+6
| | | | | | | | Tracking down an error message with the errno value can be difficult, using strerror makes the error message clearer. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Érico Rolim <[email protected]> Closes #11303
* Drop path prefix workaroundsterlingjensen2020-12-232-61/+40
| | | | | | | Canonicalization, the source of the trouble, was disabled in 9000a9f. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sterling Jensen <[email protected]> Closes #11295
* Delete rw_semaphore.wait_lock configure checkOrivej Desh2020-12-231-28/+0
| | | | | | | | | | | Last use of wait_lock was removed in "Linux 5.3 compat: retire rw_tryupgrade()" (e7a99dab2b065ac2f8736a65d1b226d21754d771). Fixes the issue reported in https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Orivej Desh <[email protected]> Closes #11309
* Fix optional "force" arg handing in zfs_ioc_pool_sync()Brian Behlendorf2020-12-231-4/+7
| | | | | | | | | | The fnvlist_lookup_boolean_value() function should not be used to check the force argument since it's optional. It may not be provided or may have been created with the wrong flags. Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11281 Closes #11284
* CI: add new zfs-tests-sanity workflowGeorge Melikov2020-12-232-1/+59
| | | | | | | | | Run zfs-tests with sanity.run for brief results. Timeouts are rare, so minimize false positives by increasing the default from 60 to 180 seconds. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #11304
* ZTS: zpool_trim tests throttle trim processGeorge Melikov2020-12-233-5/+26
| | | | | | | Otherwise trim may finish before progress checks. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #11296
* Reduce fletcher4 and raidz benchmark timesBrian Behlendorf2020-12-232-3/+3
| | | | | | | | | | | | | | | | | | | | During module load time all of the available fetcher4 and raidz implementations are benchmarked for a fixed amount of time to determine the fastest available. Manual testing has shown that this time can be significantly reduced with negligible effect on the final results. This commit changes the benchmark time to 1ms which can reduce the module load time by over a second on x86_64. On an x86_64 system with sse3, ssse3, and avx2 instructions the benchmark times are: Fletcher4 603ms -> 15ms RAIDZ 1,322ms -> 64ms Reviewed-by: Matthew Macy <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11282
* ZTS: adjust zpool_import_012_pos timeoutBrian Behlendorf2020-12-231-0/+1
| | | | | | | | | | | | When running in the CI the zpool_import_012_pos test case occasionally takes longer than the maximum 600 seconds. When this happens the test case is considered to have failed but always completes a few minutes latter. Since the logs suggest nothing has actually failed this commit increases timeout and removes the exception. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11286
* ZTS: Update zfs_share_concurrent_shares.kshBrian Behlendorf2020-12-231-6/+6
| | | | | | | | | | | | | | Occasionally an out of memory error is hit by this test case when mounting the filesystems. Try and reduce the likelihood of this occurring by reducing the thread count from 100 to 50. It also has the advantage of slightly speeding up the test. cannot mount 'testpool/testfs3/79': Cannot allocate memory filesystem successfully created, but not mounted Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11283
* Add sanity.run fileBrian Behlendorf2020-12-233-0/+621
| | | | | | | | | | | | | | | | | | | | This run file contains a subset of functional tests which exercise as much functionality as possible while still executing relatively quickly. The included tests should take no more than a few seconds each to run at most. This provides a convenient way to sanity test a change before committing to a full test run which takes several hours. $ ./scripts/zfs-tests.sh -r sanity ... Results Summary PASS 813 Running Time: 00:14:42 Percent passed: 100.0% Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11271
* Fix trivial typo in zfs-diff.8melak2020-12-231-1/+1
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tamas TEVESZ <[email protected]> Closes #11268 Closes #11272
* Fix for "Reduce latency effects of non-interactive I/O"Alexander Motin2020-12-231-4/+4
| | | | | | | | | | | | | | | | | It was found that setting min_active tunables for non-interactive I/Os makes them stuck. It is caused by zfs_vdev_nia_delay, that can never be reached if we never issue any I/Os due to min_active set to zero. Fix this by issuing at least one non-interactive I/O at a time when there are no interactive I/Os. When there are interactive I/Os, zero min_active allows to completely block any non-interactive I/O. It may min_active starvation in some scenarios, but who we are to deny foot shooting? Reviewed-by: George Melikov <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #11261
* Reduce latency effects of non-interactive I/OAlexander Motin2020-12-233-18/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Investigating influence of scrub (especially sequential) on random read latency I've noticed that on some HDDs single 4KB read may take up to 4 seconds! Deeper investigation shown that many HDDs heavily prioritize sequential reads even when those are submitted with queue depth of 1. This patch addresses the latency from two sides: - by using _min_active queue depths for non-interactive requests while the interactive request(s) are active and few requests after; - by throttling it further if no interactive requests has completed while configured amount of non-interactive did. While there, I've also modified vdev_queue_class_to_issue() to give more chances to schedule at least _min_active requests to the lowest priorities. It should reduce starvation if several non-interactive processes are running same time with some interactive and I think should make possible setting of zfs_vdev_max_active to as low as 1. I've benchmarked this change with 4KB random reads from ZVOL with 16KB block size on newly written non-fragmented pool. On fragmented pool I also saw improvements, but not so dramatic. Below are log2 histograms of the random read latency in milliseconds for different devices: 4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before: 0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21 after: 0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0 , that means maximum latency reduction from 2s to 500ms. 4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before: 0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3 after: 0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0 , i.e. from 4s to 250ms. 1 SAS HDD SEAGATE ST14000NM0048 before: 0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19 after: 1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0 , i.e. from 4s to 125ms. 1 SAS SSD SEAGATE XS3840TE70014 before (microseconds): 0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618 after: 0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90 I've also measured scrub time during the test and on idle pools. On idle fragmented pool I've measured scrub getting few percent faster due to use of QD3 instead of QD2 before. On idle non-fragmented pool I've measured no difference. On busy non-fragmented pool I've measured scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #11166
* Add compatibility for busybox mktempqzdanis2020-12-233-7/+7
| | | | | | | | | | | Busybox's mktemp requires at least six X's in the template, causing the current sed --in-place check to fail because the file does not exist. This change adds additional X's to mktemp templates that do not already have at least six X's in them. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Quentin Zdanis <[email protected]> Closes #11269
* FreeBSD: notify userspace when a vdev is removedRyan Moeller2020-12-232-0/+5
| | | | | | | | This is needed for zfsd to autoreplace vdevs. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11260
* Make zpool status "remove:" label print in boldAndrew Sun2020-12-231-1/+1
| | | | | | | | | | When ZFS_COLOR is set, zpool status shows row headings in bold, except for the "remove:" heading. This is a quick fix that makes it print in bold too. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrew Sun <[email protected]> Closes #11255
* CI: simplify checkstyle runnerGeorge Melikov2020-12-231-1/+1
| | | | | | | | Remove excess steps. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #11262
* ZED/zfs-list-cacher.sh: don't exit on ignored event typePrawn2020-12-191-1/+1
| | | | | | | | | | | | | | Check for the history_event type instead. The zfs-list-cacher.sh script currently respects the event types excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE. This makes little sense in this single-purpose script and silently breaks when history_events are excluded from syslog, which is the default since 13d65987a9d9958de77422f5d9d25b47e486537d. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: InsanePrawn <[email protected]> Closes #11164 Closes #11347
* Tag 2.0.0zfs-2.0.0Brian Behlendorf2020-11-301-1/+1
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Verify zfs module loaded before starting servicesBrian Behlendorf2020-11-303-0/+3
| | | | | | | | | Extend the change made in ae12b02 to verify the zfs kernel modules are loaded to the rest of the OpenZFS services. If the modules aren't loaded the neither the share, volume, or and zed services can be started. Signed-off-by: Brian Behlendorf <[email protected]> Closes #11243
* dracut: use /bin/sh instead of bash as the intepreterĐoàn Trần Công Danh2020-11-308-19/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Despite that dracut has a hard dependency on bash, its modules doesn't, dracut only has a hard dependency on bash for module-setup (on a fully usable machine). Inside initramfs, dracut allows users choose from a list of handful other shells, e.g. bash, busybox, dash, mkfsh. In fact, my local machine's initramfs is being built with dash, and it's functional for a very long time. Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also allows our users to have that right, too. Let's fix the problem 'make checkbashisms' reported and allows our users to have that right, again. For 'plymouth' case, let's simply run the command inside the if instead of checking for the existence of command before running it, because the status is also failture if plymouth is unavailable. While we're at it, let's remove an unnecessary fork for grep in zfs-generator.sh.in and its following complicated 'if elif fi' with a simple 'case ... esac'. To support this change, also exclude 90zfs from "make checkbashisms" because the current CI infrastructure ships an old version of "checkbashisms", which complains about "command -v", while the current latest "checkbashisms" thinks it's fine. In the near future, we can revert that change to "Makefile.am" when CI infrastructure is updated. Reviewed-by: Gabriel A. Devenyi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Đoàn Trần Công Danh <[email protected]> Closes #11244
* Revert "Reduce latency effects of non-interactive I/O"Brian Behlendorf2020-11-303-145/+18
| | | | | | | | | Under certain conditions commit a3a4b8def appears to result in a hang, or poor performance, when importing a pool. Until the root cause can be identified it has been reverted from the release branch. Signed-off-by: Brian Behlendorf <[email protected]> Issue #11245
* Tag 2.0.0-rc7zfs-2.0.0-rc7Brian Behlendorf2020-11-251-1/+1
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Reduce latency effects of non-interactive I/OAlexander Motin2020-11-253-18/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Investigating influence of scrub (especially sequential) on random read latency I've noticed that on some HDDs single 4KB read may take up to 4 seconds! Deeper investigation shown that many HDDs heavily prioritize sequential reads even when those are submitted with queue depth of 1. This patch addresses the latency from two sides: - by using _min_active queue depths for non-interactive requests while the interactive request(s) are active and few requests after; - by throttling it further if no interactive requests has completed while configured amount of non-interactive did. While there, I've also modified vdev_queue_class_to_issue() to give more chances to schedule at least _min_active requests to the lowest priorities. It should reduce starvation if several non-interactive processes are running same time with some interactive and I think should make possible setting of zfs_vdev_max_active to as low as 1. I've benchmarked this change with 4KB random reads from ZVOL with 16KB block size on newly written non-fragmented pool. On fragmented pool I also saw improvements, but not so dramatic. Below are log2 histograms of the random read latency in milliseconds for different devices: 4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before: 0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21 after: 0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0 , that means maximum latency reduction from 2s to 500ms. 4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before: 0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3 after: 0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0 , i.e. from 4s to 250ms. 1 SAS HDD SEAGATE ST14000NM0048 before: 0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19 after: 1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0 , i.e. from 4s to 125ms. 1 SAS SSD SEAGATE XS3840TE70014 before (microseconds): 0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618 after: 0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90 I've also measured scrub time during the test and on idle pools. On idle fragmented pool I've measured scrub getting few percent faster due to use of QD3 instead of QD2 before. On idle non-fragmented pool I've measured no difference. On busy non-fragmented pool I've measured scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #11166
* pam_zfs_key: accommodate different dataset naming schemecragw2020-11-251-0/+54
| | | | | | | | | | | | | | | | Name of dataset for user home directory may vary from the expected $homes_prefix/$username, if different naming scheme is being used. We can use property mountpoint to specify the dataset for $username as long as its value is identical to passwd's pw_dir. For example: NAME PROPERTY VALUE rpool/home/myuser_123456 mountpoint /home/myuser Reviewed-by: Felix Dörre <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Crag Wang <[email protected]> Closes #11165
* FreeBSD: decouple ZFS_DEBUG from kernel debug settingsMatthew Macy2020-11-251-1/+7
| | | | | | Reviewed-by: Martelli Nikola @martellini Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11213
* Obsolete earlier packages due to version bumpBrian Behlendorf2020-11-241-0/+5
| | | | | | | | | In order for package managers such as dnf to upgrade cleanly after the package SONAME bump the obsolete package names must be known. Update the new packages to correctly obsolete the old ones. Signed-off-by: Brian Behlendorf <[email protected]> Closes #11230 Closes #11233
* libzfsbootenv: do not depend on libnvpairAntonio Russo2020-11-241-1/+1
| | | | | | | | | We do not build libnvpair.pc. Moreover, it is automatically pulled in by libzfs.pc, so no additional specific dependency is required. Reviewed by: Toomas Soome <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #11227
* Include the ABI with dist tarballBrian Behlendorf2020-11-225-1/+16
| | | | | | | The ABI should be included when generating the `make dist` tarball since it's required by the `make checkabi` target. Signed-off-by: Brian Behlendorf <[email protected]> Closes #11225
* Correct missing zil_claim() DTL updatesBrian Behlendorf2020-11-221-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a1d477c2 accidentally disabled DTL updates for the zil_claim() case described at the end of vdev_stat_update() by unconditionally disabling all DTL updates when loading. This was done to avoid a deadlock on the vd_dtl_lock when loading the DTLs from disk. vdev_dtl_contains <--- Takes vd->vd_dtl_lock vdev_mirror_child_missing vdev_mirror_io_start zio_vdev_io_start __zio_execute arc_read dbuf_issue_final_prefetch dbuf_prefetch_impl dbuf_prefetch dmu_prefetch space_map_iterate space_map_load_length space_map_load vdev_dtl_load <--- Takes vd->vd_dtl_lock vdev_load spa_ld_load_vdev_metadata spa_tryimport The missing DTL updates can be restored by moving the space_map_load() call outside the vd_dtl_lock. A private range tree is populated by reading the space map and then merged in to the DTL_MISSING tree under the lock. Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an additional problem. Any resilvering which occurs before SPA_LOAD_NONE is set will incorrectly determine that there's nothing to repair. This can result in full redundancy not being restored for some blocks. Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11218
* Track SONAME version bump in packagingAntonio Russo2020-11-202-42/+42
| | | | | | | | | | RPM and DEB packages are named after the SONAME version of the library they contain. After bumping this version, the packaging should be renamed. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #11219
* Enable ABI checks for the checkstyle workflowBrian Behlendorf2020-11-181-1/+5
| | | | | | | | For the OpenZFS 2.0 release branch extend the CI checkstyle workflow to perform the library ABI checks. Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11215
* Add ABI snapshotBrian Behlendorf2020-11-175-0/+12318
| | | | | | | | | | | | Add a snapshot of the OpenZFS 2.0 ABI using libabigail-1.7-2. The included ABI passes `make checkabi` for CentOS 7, Fedora 33, Debian 10, and Ubuntu 20.04. This covers a fairly wide range of glibc, gcc, and libabigail versions plus other changes which are platform specific. Reviewed-by: Antonio Russo <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11144
* Library ABI tracking with abigailAntonio Russo2020-11-1713-1/+93
| | | | | | | | | | | | | | | Provide two make targets: checkabi and storeabi. storeabi uses libabigail to generate a reference copy of the ABI for the public libraries. checkabi compares such a reference to the compiled version, failing if they are not compatible. No ABI is generated for libzpool.so, it is only used by ztest and zdb and not external consumers. Co-authored-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #11144
* Fix problems in zvol_set_volmode_implMatthew Macy2020-11-174-31/+83
| | | | | | | | | | | | | | - Don't leave fstrans set when passed a snapshot - Don't remove minor if volmode already matches new value - (FreeBSD) Wait for GEOM ops to complete before trying remove (at create time GEOM will be "tasting" in parallel) - (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL - (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11199
* zpool: correctly align columns with -pнаб2020-11-175-29/+40
| | | | | | | | | | zpool_expand_proplist() now ignores pl_fixed if its new literal argument is true. The rest is a consequence of needing to pass that down. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiao?=~Dska <[email protected]> Closes #11202
* zpool(8): fix pool-wi[sd]e typoнаб2020-11-171-1/+1
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ahelenia Ziemiańska <[email protected]> Closes #11202
* Fix 'zfs userspace' for received datasets in encrypted rootloli10K2020-11-176-20/+153
| | | | | | | | | | | | | | For encrypted receives, where user accounting is initially disabled on creation, both 'zfs userspace' and 'zfs groupspace' fails with EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a successful dmu_objset_space_upgrade(). Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #9501 Closes #9596
* Fix ASSERT logic in l2arc_evict()George Amanakis2020-11-171-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of cache device removal it is possible that at the end of l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the following panic in case of a debug build: VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512) Call Trace: dump_stack+0x66/0x90 spl_panic+0xef/0x117 [spl] l2arc_remove_vdev+0x11d/0x290 [zfs] spa_load_l2cache+0x275/0x5b0 [zfs] spa_vdev_remove+0x4a5/0x6e0 [zfs] zfs_ioc_vdev_remove+0x59/0xa0 [zfs] zfsdev_ioctl_common+0x5b3/0x630 [zfs] zfsdev_ioctl+0x53/0xe0 [zfs] do_vfs_ioctl+0x42e/0x6b0 ksys_ioctl+0x5e/0x90 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In case of cache device removal it also possible that l2ad_hand + distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand is not reset. This has no functional consequence however as the cache device is about to be removed. Fix this by omitting the ASSERT in case of device removal. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #11205
* config/dracut/90zfs: handle cases where hostid(1) returns all zerosÉrico Rolim2020-11-171-1/+7
| | | | | | | | | | | | | | | | | | | | | On systems with musl libc, hostid(1) always prints "00000000", which will cause improper behavior when the 90zfs module is configured in a dracut initramfs. Work around this by copying the host /etc/hostid if the file exists, and otherwise only write /etc/hostid if hostid(1) returns something meaningful. This avoids zgenhostid creating a random /etc/hostid for the initramfs, which could lead to errors when trying to import the pool if spl_hostid isn't defined in the kernel command line. Furthermore, tag the /etc/hostid file as hostonly, since it is system specific and shouldn't be taken into account when trying to use an initramfs generated in one system to boot into a different system. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Georgy Yakovlev <[email protected]> Co-authored-by: Andrew J. Hesford <[email protected]> Signed-off-by: Érico Rolim <[email protected]> Closes #11174 Closes #11189
* zgenhostid: accept hostid arguments equal to zero.Érico Rolim2020-11-172-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A common usage pattern for zgenhostid, including in the ZFS dracut module, is running it as: zgenhostid $(hostid) However, zgenhostid only accepted hostid arguments greater than 0, which meant that, when the output of hostid(1) was "00000000", zgenhostid would error out, even though 0 is a possible return value for the gethostid(3) function used by hostid(1): - On current musl libc, gethostid(3) is a stub that always returns 0. - On glibc, gethostid(3) will return 0 if /etc/hostid exists but is smaller than 4 bytes. In these cases, it makes more sense for zgenhostid to treat a value of 0 as other parts of the zfs codebase do, meaning that a hostid value couldn't be determined; therefore, it should attempt to generate a random value to write into /etc/hostid. The manpage and usage output have been updated to reflect this. Whitespace has also been fixed in the usage output. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Georgy Yakovlev <[email protected]> Reviewed-by: Andrew J. Hesford <[email protected]> Signed-off-by: Érico Rolim <[email protected]> Closes #11174 Closes #11189
* Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usageBrian Behlendorf2020-11-144-19/+23
| | | | | | | | | | | | | | | | | | | The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used in the Linux zpl_*.c source files. They return a positive error value which is correct for the common code, but not for the Linux specific kernel code which expects a negative return value. The ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead. Furthermore, the ZPL_EXIT macro has been updated to not call the zfs_exit_fs() function. This prevents a possible deadlock which can occur when a snapshot is automatically unmounted because the zpl_show_devname() must never wait on in progress automatic snapshot unmounts. Reviewed-by: Adam Moss <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11169 Closes #11201
* Assertion failure when logging large output of channel programMatthew Ahrens2020-11-144-2/+39
| | | | | | | | | | | | | | | | | | | The output of ZFS channel programs is logged on-disk in the zpool history, and printed by `zpool history -i`. Channel programs can use 10MB of memory by default, and up to 100MB by using the `zfs program -m` flag. Therefore their output can be up to some fraction of 100MB. In addition to being somewhat wasteful of the limited space reserved for the pool history (which for large pools is 1GB), in extreme cases this can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in `dmu_buf_hold_array_by_dnode()`. This commit limits the output size that will be logged to 1MB. Larger outputs will not be logged, instead a entry will be logged indicating the size of the omitted output. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11194
* Tag 2.0.0-rc6zfs-2.0.0-rc6Brian Behlendorf2020-11-121-1/+1
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Channel program may spuriously fail with "memory limit exhausted"Matthew Ahrens2020-11-121-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ZFS channel programs (invoked by `zfs program`) are executed in a LUA sandbox with a limit on the amount of memory they can consume. The limit is 10MB by default, and can be raised to 100MB with the `-m` flag. If the memory limit is exceeded, the LUA program exits and the command fails with a message like `Channel program execution failed: Memory limit exhausted.` The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which will fail if the requested memory is not immediately available. In this case, the program fails with the same message, `Memory limit exhausted`. However, in this case the specified memory limit has not been reached, and the memory may only be temporarily unavailable. This commit changes the LUA memory allocator `zcp_lua_alloc()` to use `vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is temporarily low. Instead, we rely on the system to be able to free up memory (e.g. by evicting from the ARC), and we assume that even at the highest memory limit of 100MB, the channel program will not truly exhaust the system's memory. External-issue: DLPX-71924 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #11190
* Linux: Fix mount/unmount when dataset name has a spaceBrian Behlendorf2020-11-122-4/+23
| | | | | | | | | | | | | | | | The custom zpl_show_devname() helper should translate spaces in to the octal escape sequence \040. The getmntent(2) function is aware of this convention and properly translates the escape character back to a space when reading the fsname. Without this change the `zfs mount` and `zfs unmount` commands incorrectly detect when a dataset with a name containing spaces is mounted. Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11182 Closes #11187
* Start snapdir_iterate traversals to begin wtih the value of zero.Tony Perkins2020-11-122-1/+19
| | | | | | | | | | | | The microzap hash can sometimes be zero for single digit snapnames. The zap cursor can then have a serialized value of two (for . and ..), and skip the first entry in the avl tree for the .zfs/snapshot directory listing, and therefore does not return all snapshots. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Cedric Berger <[email protected]> Signed-off-by: Tony Perkins <[email protected]> Closes #11039
* G/C data_alloc_arenaMateusz Guzik2020-11-111-3/+1
| | | | | | | | | It is a leftover from illumos always set to NULL and introducing a spurious difference between zio_buf and zio_data_buf. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11188
* G/C struct znode -> z_movedMateusz Guzik2020-11-115-13/+2
| | | | | | | | | The field is yet another leftover from unsupported zfs_znode_move. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11186