summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Tag zfs-0.8.2zfs-0.8.2Tony Hutter2019-09-251-2/+2
| | | | | | META file and changelog updated. Signed-off-by: Tony Hutter <[email protected]>
* Disabled resilver_defer feature leads to looping resilversKody A Kantor2019-09-251-8/+11
| | | | | | | | | | | | | | | | | | | When a disk is replaced with another on a pool with the resilver_defer feature present, but not enabled the resilver activity restarts during each spa_sync. This patch checks to make sure that the resilver_defer feature is first enabled before requesting a deferred resilver. This was originally fixed in illumos-joyent as OS-7982. Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed by: Jerry Jelinek <[email protected]> Signed-off-by: Kody A Kantor <[email protected]> External-issue: illumos-joyent OS-7982 Closes #9299 Closes #9338
* Fix dsl_scan_ds_clone_swapped logicAndriy Gapon2019-09-251-31/+69
| | | | | | | | | | | | | | | | | The was incorrect with respect to swapping dataset IDs both in the on-disk ZAP object and the in-memory queue. In both cases, if ds1 was already present, then it would be first replaced with ds2 and then ds would be replaced back with ds1. Also, both cases did not properly handle a situation where both ds1 and ds2 are already queued. A duplicate insertion would be attempted and its failure would result in a panic. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #9140 Closes #9163
* Scrubbing root pools may deadlock on kernels without elevator_change() (#9321)loli10K2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally the zfs_vdev_elevator module option was added as a convenience so the requested elevator would be automatically set on the underlying block devices. At the time this was simple because the kernel provided an API function which did exactly this. This API was then removed in the Linux 4.12 kernel which prompted us to add compatibly code to set the elevator via a usermodehelper. Unfortunately changing the evelator via usermodehelper requires reading some userland binaries, most notably modprobe(8) or sh(1), from a zfs dataset on systems with root-on-zfs. This can deadlock the system if used during the following call path because it may need, if the data is not already cached in the ARC, reading directly from disk while holding the spa config lock as a writer: zfs_ioc_pool_scan() -> spa_scan() -> spa_scan() -> vdev_reopen() -> vdev_elevator_switch() -> call_usermodehelper() While the usermodehelper waits sh(1), modprobe(8) is blocked in the ZIO pipeline trying to read from disk: INFO: task modprobe:2650 blocked for more than 10 seconds. Tainted: P OE 5.2.14 modprobe D 0 2650 206 0x00000000 Call Trace: ? __schedule+0x244/0x5f0 schedule+0x2f/0xa0 cv_wait_common+0x156/0x290 [spl] ? do_wait_intr_irq+0xb0/0xb0 spa_config_enter+0x13b/0x1e0 [zfs] zio_vdev_io_start+0x51d/0x590 [zfs] ? tsd_get_by_thread+0x3b/0x80 [spl] zio_nowait+0x142/0x2f0 [zfs] arc_read+0xb2d/0x19d0 [zfs] ... zpl_iter_read+0xfa/0x170 [zfs] new_sync_read+0x124/0x1b0 vfs_read+0x91/0x140 ksys_read+0x59/0xd0 do_syscall_64+0x4f/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This commit changes how we use the usermodehelper functionality from synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents scrubs, and other vdev_elevator_switch() consumers, from triggering the aforementioned issue. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Issue #8664 Closes #9321
* QAT related bug fixesChengfei ZHu2019-09-254-32/+18
| | | | | | | | | | | | | | | 1. Fix issue: Kernel BUG with QAT during decompression #9276. Now it is uninterruptible for a specific given QAT request, but Ctrl-C interrupt still works in user-space process. 2. Copy the digest result to the buffer only when doing encryption, and vise-versa for decryption. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chengfei Zhu <[email protected]> Closes #9276 Closes #9303
* kmodtool: depmod pathBrian Behlendorf2019-09-251-7/+19
| | | | | | | | | | | | | | | | Determine the location of depmod on the system, either /sbin/depmod or /usr/sbin/depmod. Then use that path when generating the specfile. Additionally, update the Requires lines to reference the package which provides depmod rather than the binary itself. For CentOS/RHEL 7+8 and all supported Fedora releases this is the kmod package, and for CentOS/RHEL 6 it is the module-init-tools package. Reviewed-by: Minh Diep <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8724 Closes #9310
* Fix /etc/hostid on root pool deadlockBrian Behlendorf2019-09-258-23/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock as a reader in zfs_blkptr_verify(). A deadlock can occur if the /etc/hostid file resides on a dataset in the same pool. This is because reading the /etc/hostid file may occur while the caller is holding the SCL_VDEV lock as a writer. For example, to perform a `zpool attach` as shown in the abbreviated stack below. To resolve the issue we cache the system's hostid when initializing the spa_t, or when modifying the multihost property. The cached value is then relied upon for subsequent accesses. Call Trace: spa_config_enter+0x1e8/0x350 [zfs] zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock zio_read+0x6c/0x140 [zfs] ... vfs_read+0xfc/0x1e0 kernel_read+0x50/0x90 ... spa_get_hostid+0x1c/0x38 [zfs] spa_config_generate+0x1a0/0x610 [zfs] vdev_label_init+0xa0/0xc80 [zfs] vdev_create+0x98/0xe0 [zfs] spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9256 Closes #9285
* BuildRequires libtirpc-devel needed for RHEL 8Olaf Faaland2019-09-251-1/+1
| | | | | | | | | | | | | Building against RHEL 8 requires libtirpc-devel, as with fedora 28. Add rhel8 and centos8 options to the test, to account for that. BuildRequires Originally added for fedora 28 via commit 1a62a305be01972ef1b81469134faa4937836096 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #9289
* Fix zpool subcommands error message with some unsupported optionsloli10K2019-09-251-4/+2
| | | | | | | | | | | | | | | | | | Both 'detach' and 'online' zpool subcommands, when provided with an unsupported option, forget to print it in the error message: # zpool online -t rpool vda3 invalid option '' usage: online [-e] <pool> <device> ... This changes fixes the error message in order to include the actual option that is not supported. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #9270
* Fix zfs-dkms .deb package warning in prerm scriptloli10K2019-09-251-1/+1
| | | | | | | | | | | | | | Debian zfs-dkms package generated by alien doesn't call the prerm script (rpm's %preun) with an integer as first parameter, which results in the following warning when the package is uninstalled: "zfs-dkms.prerm: line 3: [: remove: integer expression expected" Modify the if-condition to avoid the warning. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #9271
* zvol_wait script should ignore partially received zvolsPavel Zakharov2019-09-251-2/+21
| | | | | | | | | | Partially received zvols won't have links in /dev/zvol. Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Zakharov <[email protected]> Closes #9260
* New service that waits on zvol links to be createdPavel Zakharov2019-09-2511-3/+145
| | | | | | | | | | | | | | | | | The zfs-volume-wait.service scans existing zvols and waits for their links under /dev to be created. Any service that depends on zvol links to be there should add a dependency on zfs-volumes.target. By default, this target is not enabled. Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Antonio Russo <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: John Gallagher <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Zakharov <[email protected]> Closes #8975
* Always refuse receving non-resume stream when resume state existsAndriy Gapon2019-09-252-7/+18
| | | | | | | | | | | | | | | This fixes a hole in the situation where the resume state is left from receiving a new dataset and, so, the state is set on the dataset itself (as opposed to %recv child). Additionally, distinguish incremental and resume streams in error messages. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #9252
* Fix Intel QAT / ZFS compatibility on v4.7.1+ kernelsloli10K2019-09-252-3/+3
| | | | | | | | | This change use the compat code introduced in 9cc1844a. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #9268 Closes #9269
* etc/init.d/zfs-functions.in: remove arch warningGeorgy Yakovlev2019-09-251-7/+0
| | | | | | | | | Remove the x86_64 warning, it's no longer the case that this is the only supported architecture. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Georgy Yakovlev <[email protected]> Closes: #9177
* zfs_handle used after being closed/freed in change_one callbackPavel Zakharov2019-09-251-12/+18
| | | | | | | | | | | | | | | | | | | | | This is a typical case of use after free. We would call zfs_close(zhp) which would free the handle, and then call zfs_iter_children() on that handle later. This change ensures that the zfs_handle is only closed when we are ready to return. Running `zfs inherit -r sharenfs pool` was failing with an error code without any error messages. After some debugging I've pinpointed the issue to be memory corruption, which would cause zfs to try to issue an ioctl to the wrong device and receive ENOTTY. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Pavel Zakharov <[email protected]> Issue #7967 Closes #9165
* Fix zil replay panic when TX_REMOVE followed by TX_CREATEChunwei Chen2019-09-257-24/+184
| | | | | | | | | | | | | | | | | | | | If TX_REMOVE is followed by TX_CREATE on the same object id, we need to make sure the object removal is completely finished before creation. The current implementation relies on dnode_hold_impl with DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work fine before, in current version it does not guarantee the object removal is completed. We fix this by checking if DNODE_MUST_BE_FREE returns successful instead. Also add test and remove dead code in dnode_hold_impl. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7151 Closes #8910 Closes #9123 Closes #9145
* zfs_ioc_snapshot: check user-prop permissions on snapshotted datasetsAndriy Gapon2019-09-251-10/+16
| | | | | | | | | | | | | | | Previously, the permissions were checked on the pool which was obviously incorrect. After this change, zfs_check_userprops() only validates the properties without any permission checks. The permissions are checked individually for each snapshotted dataset. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #9179 Closes #9180
* Fix Plymouth passphrase prompt in initramfs scriptRichard Allen2019-09-251-8/+8
| | | | | | | | | | | | | | | Entering the ZFS encryption passphrase under Plymouth wasn't working because in the ZFS initrd script, Plymouth was calling zfs via "--command", which wasn't passing through the filesystem argument to zfs load-key properly (it was passing through the single quotes around the filesystem name intended to handle spaces literally, which zfs load-key couldn't understand). Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Garrett Fields <[email protected]> Signed-off-by: Richard Allen <[email protected]> Issue #9193 Closes #9202
* Fix deadlock in 'zfs rollback'Tom Caputi2019-09-253-1/+17
| | | | | | | | | | | | | | | | | | | | | Currently, the 'zfs rollback' code can end up deadlocked due to the way the kernel handles unreferenced inodes on a suspended fs. Essentially, the zfs_resume_fs() code path may cause zfs to spawn new threads as it reinstantiates the suspended fs's zil. When a new thread is spawned, the kernel may attempt to free memory for that thread by freeing some unreferenced inodes. If it happens to select inodes that are a a part of the suspended fs a deadlock will occur because freeing inodes requires holding the fs's z_teardown_inactive_lock which is still held from the suspend. This patch corrects this issue by adding an additional reference to all inodes that are still present when a suspend is initiated. This prevents them from being freed by the kernel for any reason. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #9203
* Make slog test setup more robustRyan Moeller2019-09-2519-10/+27
| | | | | | | | | | | | The slog tests fail when attempting to create pools using file vdevs that already exist from previous test runs. Remove these files in the setup for the test. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Kennedy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9194
* zfs-mount-genrator: dependencies should be space-separatedyshui2019-09-251-1/+1
| | | | | | | Reviewed-by: Antonio Russo <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Yuxuan Shui <[email protected]> Closes #9174
* Linux 5.3: Fix switch() fall though compiler errorsTony Hutter2019-09-253-3/+11
| | | | | | | | | | Fix some switch() fall-though compiler errors: abd.c:1504:9: error: this statement may fall through Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #9170
* Linux 5.3 compat: Makefile subdir-m no longer supportedDominic Pearson2019-09-252-12/+23
| | | | | | | | | | | Uses obj-m instead, due to kernel changes. See LKML: Masahiro Yamada, Tue, 6 Aug 2019 19:03:23 +0900 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Dominic Pearson <[email protected]> Closes #9169
* Fix out-of-order ZIL txtype lost on hardlinked filesChunwei Chen2019-09-255-15/+27
| | | | | | | | | | | | | | | | | We should only call zil_remove_async when an object is removed. However, in current implementation, it is called whenever TX_REMOVE is called. In the case of hardlinked file, every unlink will generate TX_REMOVE and causing operations to be dropped even when the object is not removed. We fix this by only calling zil_remove_async when the file is fully unlinked. Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #8769 Closes #9061
* Increase default zcmd allocation to 256KMichael Niewöhner2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | When creating hundreds of clones (for example using containers with LXD) cloning slows down as the number of clones increases over time. The reason for this is that the fetching of the clone information using a small zcmd buffer requires two ioctl calls, one to determine the size and a second to return the data. However, this requires gathering the data twice, once to determine the size and again to populate the zcmd buffer to return it to userspace. These are expensive ioctl() calls, so instead, make the default buffer size much larger: 256K. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #9084
* Improve performance by using dmu_tx_hold_*_by_dnode()Matthew Ahrens2019-09-253-9/+15
| | | | | | | | | | | | | | | | | | | | In zfs_write() and dmu_tx_hold_sa(), we can use dmu_tx_hold_*_by_dnode() instead of dmu_tx_hold_*(), since we already have a dbuf from the target dnode in hand. This eliminates some calls to dnode_hold(), which can be expensive. This is especially impactful if several threads are accessing objects that are in the same block of dnodes, because they will contend for that dbuf's lock. We are seeing 10-20% performance wins for the sequential_writes tests in the performance test suite, when doing >=128K writes to files with recordsize=8K. This also removes some unnecessary casts that are in the area. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #9081
* Fix channel programs on s390xBrian Behlendorf2019-09-251-1/+1
| | | | | | | | | | | | | | | When adapting the original sources for s390x the JMP_BUF_CNT was mistakenly halved due to an incorrect assumption of the size of a unsigned long. They are 8 bytes for the s390x architecture. Increase JMP_BUF_CNT accordingly. Authored-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: Colin Ian King <canonical.com> Tested-by: Colin Ian King <canonical.com> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8992 Closes #9080
* Race between zfs-share and zfs-mount servicesGeorge Wilson2019-09-251-0/+1
| | | | | | | | | | | | | | | | | | | When a system boots the zfs-mount.service and the zfs-share.service can start simultaneously. What may be unclear is that sharing a filesystem will first mount the filesystem if it's not already mounted. This means that both service can race to mount the same fileystem. This race can result in a SEGFAULT or EBUSY conditions. This change explicitly defines the start ordering between the two services such that the zfs-mount.service is solely responsible for mounting filesystems eliminating the race between "zfs mount -a" and "zfs share -a" commands. Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #9083
* Implement secpolicy_vnode_setid_retain()Tomohiro Kusumi2019-09-2513-1/+435
| | | | | | | | | | | | | | | Don't unconditionally return 0 (i.e. retain SUID/SGID). Test CAP_FSETID capability. https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t which expects SUID/SGID to be dropped on write(2) by non-owner fails without this. Most filesystems make this decision within VFS by using a generic file write for fops. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9035 Closes #9043
* zed crashes when devid not presentMatthew Ahrens2019-09-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | zed core dumps due to a NULL pointer in zfs_agent_iter_vdev(). The gs_devid is NULL, but the nvl has a "devid" entry. zfs_agent_post_event() checks that ZFS_EV_VDEV_GUID or DEV_IDENTIFIER is present in nvl, but then later it and zfs_agent_iter_vdev() assume that DEV_IDENTIFIER is present and thus gs_devid is set. Typically this is not a problem because usually either all vdevs have devid's, or none of them do. Since zfs_agent_iter_vdev() first checks if the vdev has devid before dereferencing gs_devid, the problem isn't typically encountered. However, if some vdevs have devid's and some do not, then the problem is easily reproduced. This can happen if the pool has been moved from a system that has devid's to one that does not. The fix is for zfs_agent_iter_vdev() to only try to match the devid's if both nvl and gsp have devid's present. Reviewed-by: Prashanth Sreenivasa <[email protected]> Reviewed-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-65090 Closes #9054 Closes #9060
* Don't directly cast unsigned long to void*Tomohiro Kusumi2019-09-251-2/+3
| | | | | | | | | Cast to uintptr_t first for portability on integer to/from pointer conversion. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9065
* Fix module_param() type for zfs_read_chunk_sizeTomohiro Kusumi2019-09-251-2/+4
| | | | | | | | zfs_read_chunk_size is unsigned long. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9051
* Move some tests to cli_user/zpool_statusTony Hutter2019-09-2511-10/+81
| | | | | | | | | | | | | | | | | | | | | The tests in tests/functional/cli_root/zpool_status should all require root. However, linux.run has "user =" specified for those tests, which means they run as a normal user. When I removed that line to run them as root, the following tests did not pass: zpool_status_003_pos zpool_status_-c_disable zpool_status_-c_homedir zpool_status_-c_searchpath These tests need to be run as a normal user. To fix this, move these tests to a new tests/functional/cli_user/zpool_status directory. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #9057
* Race condition between spa async threads and exportSerapheim Dimitropoulos2019-09-256-2/+42
| | | | | | | | | | | | | | | | | | | In the past we've seen multiple race conditions that have to do with open-context threads async threads and concurrent calls to spa_export()/spa_destroy() (including the one referenced in issue #9015). This patch ensures that only one thread can execute the main body of spa_export_common() at a time, with subsequent threads returning with a new error code created just for this situation, eliminating this way any race condition bugs introduced by concurrent calls to this function. Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #9015 Closes #9044
* hdr_recl calls zthr_wakeup() on destroyed zthrSerapheim Dimitropoulos2019-09-251-4/+16
| | | | | | | | | | | | | | | | | | | | | | | There exists a race condition were hdr_recl() calls zthr_wakeup() on a destroyed zthr. The timeline is the following: [1] hdr_recl() runs first and goes intro zthr_wakeup() because arc_initialized is set. [2] arc_fini() is called by another thread, zeroes that flag, destroying the zthr, and goes into buf_init(). [3] hdr_recl() tries to enter the destroyed mutex and we blow up. This patch ensures that the ARC's zthrs are not offloaded any new work once arc_initialized is set and then destroys them after all of the ARC state has been deleted. Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #9047
* Fix wrong comment on zcr_blksz_{min,max}Tomohiro Kusumi2019-09-251-5/+6
| | | | | | | | | | | These aren't tunable; illumos has this comment fixed in "3742 zfs comments need cleaner, more consistent style", so sync with that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9052
* Retire unused spl_{mutex,rwlock}_{init_fini}Brian Behlendorf2019-09-256-92/+13
| | | | | | | | | | These functions are unused and can be removed along with the spl-mutex.c and spl-rwlock.c source files. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tomohiro Kusumi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9029
* Linux 5.3 compat: retire rw_tryupgrade()Brian Behlendorf2019-09-252-154/+7
| | | | | | | | | | | | | | | | | | | | | | | The Linux kernel's rwsem's have never provided an interface to allow a reader to be upgraded to a writer. Historically, this functionality has been implemented by a SPL wrapper function. However, this approach depends on internal knowledge of the rw_semaphore and is therefore rather brittle. Since the ZFS code must always be able to fallback to rw_exit() and rw_enter() when an rw_tryupgrade() fails; this functionality isn't critical. Furthermore, the only potentially performance sensitive consumer is dmu_zfetch() and no decrease in performance was observed with this change applied. See the PR comments for additional testing details. Therefore, it is being retired to make the build more robust and to simplify the rwlock implementation. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tomohiro Kusumi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9029
* Linux 5.3 compat: rw_semaphore ownerBrian Behlendorf2019-09-252-66/+5
| | | | | | | | | | | | | | Commit https://github.com/torvalds/linux/commit/94a9717b updated the rwsem's owner field to contain additional flags describing the rwsem's state. Rather then update the wrappers to mask out these bits, the code no longer relies on the owner stored by the kernel. This does increase the size of a krwlock_t but it makes the implementation less sensitive to future kernel changes. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tomohiro Kusumi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9029
* Fix lockdep recursive locking false positive in dbuf_destroyjdike2019-09-253-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep reports a possible recursive lock in dbuf_destroy. It is true that dbuf_destroy is acquiring the dn_dbufs_mtx on one dnode while holding it on another dnode. However, it is impossible for these to be the same dnode because, among other things,dbuf_destroy checks MUTEX_HELD before acquiring the mutex. This fix defines a class NESTED_SINGLE == 1 and changes that lock to call mutex_enter_nested with a subclass of NESTED_SINGLE. In order to make the userspace code compile, include/sys/zfs_context.h now defines mutex_enter_nested and NESTED_SINGLE. This is the lockdep report: [ 122.950921] ============================================ [ 122.950921] WARNING: possible recursive locking detected [ 122.950921] 4.19.29-4.19.0-debug-d69edad5368c1166 #1 Tainted: G O [ 122.950921] -------------------------------------------- [ 122.950921] dbu_evict/1457 is trying to acquire lock: [ 122.950921] 0000000083e9cbcf (&dn->dn_dbufs_mtx){+.+.}, at: dbuf_destroy+0x3c0/0xdb0 [zfs] [ 122.950921] but task is already holding lock: [ 122.950921] 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs] [ 122.950921] other info that might help us debug this: [ 122.950921] Possible unsafe locking scenario: [ 122.950921] CPU0 [ 122.950921] ---- [ 122.950921] lock(&dn->dn_dbufs_mtx); [ 122.950921] lock(&dn->dn_dbufs_mtx); [ 122.950921] *** DEADLOCK *** [ 122.950921] May be due to missing lock nesting notation [ 122.950921] 1 lock held by dbu_evict/1457: [ 122.950921] #0: 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs] [ 122.950921] stack backtrace: [ 122.950921] CPU: 0 PID: 1457 Comm: dbu_evict Tainted: G O 4.19.29-4.19.0-debug-d69edad5368c1166 #1 [ 122.950921] Hardware name: Supermicro H8SSL-I2/H8SSL-I2, BIOS 080011 03/13/2009 [ 122.950921] Call Trace: [ 122.950921] dump_stack+0x91/0xeb [ 122.950921] __lock_acquire+0x2ca7/0x4f10 [ 122.950921] lock_acquire+0x153/0x330 [ 122.950921] dbuf_destroy+0x3c0/0xdb0 [zfs] [ 122.950921] dbuf_evict_one+0x1cc/0x3d0 [zfs] [ 122.950921] dbuf_rele_and_unlock+0xb84/0xd60 [zfs] [ 122.950921] dnode_evict_dbufs+0x3a6/0x740 [zfs] [ 122.950921] dmu_objset_evict+0x7a/0x500 [zfs] [ 122.950921] dsl_dataset_evict_async+0x70/0x480 [zfs] [ 122.950921] taskq_thread+0x979/0x1480 [spl] [ 122.950921] kthread+0x2e7/0x3e0 [ 122.950921] ret_from_fork+0x27/0x50 Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jeff Dike <[email protected]> Closes #8984
* Add missing __GFP_HIGHMEM flag to vmallocMichael Niewöhner2019-09-251-1/+2
| | | | | | | | | | | | Make use of __GFP_HIGHMEM flag in vmem_alloc, which is required for some 32-bit systems to make use of full available memory. While kernel versions >=4.12-rc1 add this flag implicitly, older kernels do not. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #9031
* Use zfsctl_snapshot_hold() wrapperTomohiro Kusumi2019-09-251-3/+3
| | | | | | | | | zfs_refcount_*() are to be wrapped by zfsctl_snapshot_*() in this file. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9039
* Minor style cleanupBrian Behlendorf2019-09-259-42/+57
| | | | | | | | | | | Resolve an assortment of style inconsistencies including use of white space, typos, capitalization, and line wrapping. There is no functional change. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9030
* Fix get_special_prop() build failureBrian Behlendorf2019-09-251-4/+2
| | | | | | | | | | | | | | The cast of the size_t returned by strlcpy() to a uint64_t by the VERIFY3U can result in a build failure when CONFIG_FORTIFY_SOURCE is set. This is due to the additional hardening. Since the token is expected to always fit in strval the VERIFY3U has been removed. If somehow it doesn't, it will still be safely truncated. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #8999 Closes #9020
* systemd encryption key supportAntonio Russo2019-09-253-5/+55
| | | | | | | | | | | | | | | | | | | | | | Modify zfs-mount-generator to produce a dependency on new zfs-import-key-*.service units, dynamically created at boot to call zfs load-key for the encryption root, before attempting to mount any encrypted datasets. These units are created by zfs-mount-generator, and RequiresMountsFor on the keyfile, if present, or call systemd-ask-password if a passphrase is requested. This patch includes suggestions from @Fabian-Gruenbichler, @ryanjaeb and @rlaager, as well an adaptation of @rlaager's script to retry on incorrect password entry. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #8750 Closes #8848
* Drop redundant POSIX ACL check in zpl_init_acl()Tomohiro Kusumi2019-09-251-7/+4
| | | | | | | | | | ZFS_ACLTYPE_POSIXACL has already been tested in zpl_init_acl(), so no need to test again on POSIX ACL access. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9009
* Export dnode symbolsBrian Behlendorf2019-09-251-0/+10
| | | | | | | | | | | | External consumers such as Lustre require access to the dnode interfaces in order to correctly manipulate dnodes. Reviewed-by: James Simmons <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #8994 Closes #9027
* Ensure dsl_destroy_head() decrypts objsetsTom Caputi2019-09-251-3/+4
| | | | | | | | | | | | | This patch corrects a small issue where the dsl_destroy_head() code that runs when the async_destroy feature is disabled would not properly decrypt the dataset before beginning processing. If the dataset is not able to be decrypted, the optimization code now simply does not run and the dataset is completely destroyed in the DSL sync task. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #9021
* Disable unused pathname::pn_path* (unneeded in Linux)Tomohiro Kusumi2019-09-252-4/+13
| | | | | | | | | | | | | | | | | struct pathname is originally from Solaris VFS, and it has been used in ZoL to merely call VOP from Linux VFS interface without API change, therefore pathname::pn_path* are unused and unneeded. Technically, struct pathname is a wrapper for C string in ZoL. Saves stack a bit on lookup and unlink. (#if0'd members instead of removing since comments refer to them.) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #9025