| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
META file and changelog updated.
Signed-off-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a disk is replaced with another on a pool with the resilver_defer
feature present, but not enabled the resilver activity restarts during
each spa_sync. This patch checks to make sure that the resilver_defer
feature is first enabled before requesting a deferred resilver.
This was originally fixed in illumos-joyent as OS-7982.
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed by: Jerry Jelinek <[email protected]>
Signed-off-by: Kody A Kantor <[email protected]>
External-issue: illumos-joyent OS-7982
Closes #9299
Closes #9338
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.
In both cases, if ds1 was already present, then it would be first
replaced with ds2 and then ds would be replaced back with ds1.
Also, both cases did not properly handle a situation where both ds1 and
ds2 are already queued. A duplicate insertion would be attempted and
its failure would result in a panic.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Andriy Gapon <[email protected]>
Closes #9140
Closes #9163
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.
This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.
Unfortunately changing the evelator via usermodehelper requires reading
some userland binaries, most notably modprobe(8) or sh(1), from a zfs
dataset on systems with root-on-zfs. This can deadlock the system if
used during the following call path because it may need, if the data
is not already cached in the ARC, reading directly from disk while
holding the spa config lock as a writer:
zfs_ioc_pool_scan()
-> spa_scan()
-> spa_scan()
-> vdev_reopen()
-> vdev_elevator_switch()
-> call_usermodehelper()
While the usermodehelper waits sh(1), modprobe(8) is blocked in the
ZIO pipeline trying to read from disk:
INFO: task modprobe:2650 blocked for more than 10 seconds.
Tainted: P OE 5.2.14
modprobe D 0 2650 206 0x00000000
Call Trace:
? __schedule+0x244/0x5f0
schedule+0x2f/0xa0
cv_wait_common+0x156/0x290 [spl]
? do_wait_intr_irq+0xb0/0xb0
spa_config_enter+0x13b/0x1e0 [zfs]
zio_vdev_io_start+0x51d/0x590 [zfs]
? tsd_get_by_thread+0x3b/0x80 [spl]
zio_nowait+0x142/0x2f0 [zfs]
arc_read+0xb2d/0x19d0 [zfs]
...
zpl_iter_read+0xfa/0x170 [zfs]
new_sync_read+0x124/0x1b0
vfs_read+0x91/0x140
ksys_read+0x59/0xd0
do_syscall_64+0x4f/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This commit changes how we use the usermodehelper functionality from
synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents
scrubs, and other vdev_elevator_switch() consumers, from triggering the
aforementioned issue.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Issue #8664
Closes #9321
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Fix issue: Kernel BUG with QAT during decompression #9276.
Now it is uninterruptible for a specific given QAT request,
but Ctrl-C interrupt still works in user-space process.
2. Copy the digest result to the buffer only when doing encryption,
and vise-versa for decryption.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chengfei Zhu <[email protected]>
Closes #9276
Closes #9303
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Determine the location of depmod on the system, either /sbin/depmod or
/usr/sbin/depmod. Then use that path when generating the specfile.
Additionally, update the Requires lines to reference the package which
provides depmod rather than the binary itself. For CentOS/RHEL 7+8
and all supported Fedora releases this is the kmod package, and for
CentOS/RHEL 6 it is the module-init-tools package.
Reviewed-by: Minh Diep <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8724
Closes #9310
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify(). A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool. This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer. For example, to perform a
`zpool attach` as shown in the abbreviated stack below.
To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property. The cached
value is then relied upon for subsequent accesses.
Call Trace:
spa_config_enter+0x1e8/0x350 [zfs]
zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
zio_read+0x6c/0x140 [zfs]
...
vfs_read+0xfc/0x1e0
kernel_read+0x50/0x90
...
spa_get_hostid+0x1c/0x38 [zfs]
spa_config_generate+0x1a0/0x610 [zfs]
vdev_label_init+0xa0/0xc80 [zfs]
vdev_create+0x98/0xe0 [zfs]
spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9256
Closes #9285
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Building against RHEL 8 requires libtirpc-devel, as with fedora 28.
Add rhel8 and centos8 options to the test, to account for that.
BuildRequires Originally added for fedora 28 via commit
1a62a305be01972ef1b81469134faa4937836096
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #9289
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both 'detach' and 'online' zpool subcommands, when provided with an
unsupported option, forget to print it in the error message:
# zpool online -t rpool vda3
invalid option ''
usage:
online [-e] <pool> <device> ...
This changes fixes the error message in order to include the actual
option that is not supported.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Debian zfs-dkms package generated by alien doesn't call the prerm script
(rpm's %preun) with an integer as first parameter, which results in the
following warning when the package is uninstalled:
"zfs-dkms.prerm: line 3: [: remove: integer expression expected"
Modify the if-condition to avoid the warning.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9271
|
|
|
|
|
|
|
|
|
|
| |
Partially received zvols won't have links in /dev/zvol.
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pavel Zakharov <[email protected]>
Closes #9260
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zfs-volume-wait.service scans existing zvols and waits for their
links under /dev to be created. Any service that depends on zvol
links to be there should add a dependency on zfs-volumes.target.
By default, this target is not enabled.
Reviewed-by: Fabian Grünbichler <[email protected]>
Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pavel Zakharov <[email protected]>
Closes #8975
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a hole in the situation where the resume state is left from
receiving a new dataset and, so, the state is set on the dataset itself
(as opposed to %recv child).
Additionally, distinguish incremental and resume streams in error
messages.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andriy Gapon <[email protected]>
Closes #9252
|
|
|
|
|
|
|
|
|
| |
This change use the compat code introduced in 9cc1844a.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9268
Closes #9269
|
|
|
|
|
|
|
|
|
| |
Remove the x86_64 warning, it's no longer the case that this is the
only supported architecture.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Georgy Yakovlev <[email protected]>
Closes: #9177
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a typical case of use after free. We would call zfs_close(zhp)
which would free the handle, and then call zfs_iter_children() on that
handle later. This change ensures that the zfs_handle is only closed
when we are ready to return.
Running `zfs inherit -r sharenfs pool` was failing with an error
code without any error messages. After some debugging I've pinpointed
the issue to be memory corruption, which would cause zfs to try to
issue an ioctl to the wrong device and receive ENOTTY.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Pavel Zakharov <[email protected]>
Issue #7967
Closes #9165
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.
We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the permissions were checked on the pool which was obviously
incorrect.
After this change, zfs_check_userprops() only validates the properties
without any permission checks. The permissions are checked individually
for each snapshotted dataset.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Andriy Gapon <[email protected]>
Closes #9179
Closes #9180
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Entering the ZFS encryption passphrase under Plymouth wasn't working
because in the ZFS initrd script, Plymouth was calling zfs via
"--command", which wasn't passing through the filesystem argument to
zfs load-key properly (it was passing through the single quotes around
the filesystem name intended to handle spaces literally,
which zfs load-key couldn't understand).
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Garrett Fields <[email protected]>
Signed-off-by: Richard Allen <[email protected]>
Issue #9193
Closes #9202
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a a part of the suspended fs a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.
This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #9203
|
|
|
|
|
|
|
|
|
|
|
|
| |
The slog tests fail when attempting to create pools using file vdevs
that already exist from previous test runs. Remove these files in the
setup for the test.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9194
|
|
|
|
|
|
|
| |
Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Yuxuan Shui <[email protected]>
Closes #9174
|
|
|
|
|
|
|
|
|
|
| |
Fix some switch() fall-though compiler errors:
abd.c:1504:9: error: this statement may fall through
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #9170
|
|
|
|
|
|
|
|
|
|
|
| |
Uses obj-m instead, due to kernel changes.
See LKML: Masahiro Yamada, Tue, 6 Aug 2019 19:03:23 +0900
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Dominic Pearson <[email protected]>
Closes #9169
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.
We fix this by only calling zil_remove_async when the file is fully
unlinked.
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #8769
Closes #9061
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating hundreds of clones (for example using containers with
LXD) cloning slows down as the number of clones increases over time.
The reason for this is that the fetching of the clone information
using a small zcmd buffer requires two ioctl calls, one to determine
the size and a second to return the data. However, this requires
gathering the data twice, once to determine the size and again to
populate the zcmd buffer to return it to userspace.
These are expensive ioctl() calls, so instead, make the default buffer
size much larger: 256K.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #9084
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In zfs_write() and dmu_tx_hold_sa(), we can use dmu_tx_hold_*_by_dnode()
instead of dmu_tx_hold_*(), since we already have a dbuf from the target
dnode in hand. This eliminates some calls to dnode_hold(), which can be
expensive. This is especially impactful if several threads are
accessing objects that are in the same block of dnodes, because they
will contend for that dbuf's lock.
We are seeing 10-20% performance wins for the sequential_writes tests in
the performance test suite, when doing >=128K writes to files with
recordsize=8K.
This also removes some unnecessary casts that are in the area.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #9081
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long. They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.
Authored-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8992
Closes #9080
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a system boots the zfs-mount.service and the
zfs-share.service can start simultaneously. What may be
unclear is that sharing a filesystem will first mount
the filesystem if it's not already mounted. This means
that both service can race to mount the same fileystem.
This race can result in a SEGFAULT or EBUSY conditions.
This change explicitly defines the start ordering between the
two services such that the zfs-mount.service is solely
responsible for mounting filesystems eliminating the race
between "zfs mount -a" and "zfs share -a" commands.
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes #9083
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't unconditionally return 0 (i.e. retain SUID/SGID).
Test CAP_FSETID capability.
https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t
which expects SUID/SGID to be dropped on write(2) by non-owner fails
without this. Most filesystems make this decision within VFS by using
a generic file write for fops.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9035
Closes #9043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zed core dumps due to a NULL pointer in zfs_agent_iter_vdev(). The
gs_devid is NULL, but the nvl has a "devid" entry.
zfs_agent_post_event() checks that ZFS_EV_VDEV_GUID or DEV_IDENTIFIER is
present in nvl, but then later it and zfs_agent_iter_vdev() assume that
DEV_IDENTIFIER is present and thus gs_devid is set.
Typically this is not a problem because usually either all vdevs have
devid's, or none of them do. Since zfs_agent_iter_vdev() first checks if
the vdev has devid before dereferencing gs_devid, the problem isn't
typically encountered. However, if some vdevs have devid's and some do
not, then the problem is easily reproduced. This can happen if the pool
has been moved from a system that has devid's to one that does not.
The fix is for zfs_agent_iter_vdev() to only try to match the devid's if
both nvl and gsp have devid's present.
Reviewed-by: Prashanth Sreenivasa <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-65090
Closes #9054
Closes #9060
|
|
|
|
|
|
|
|
|
| |
Cast to uintptr_t first for portability on integer to/from pointer
conversion.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9065
|
|
|
|
|
|
|
|
| |
zfs_read_chunk_size is unsigned long.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9051
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tests in tests/functional/cli_root/zpool_status should all require
root. However, linux.run has "user =" specified for those tests, which
means they run as a normal user. When I removed that line to run them
as root, the following tests did not pass:
zpool_status_003_pos
zpool_status_-c_disable
zpool_status_-c_homedir
zpool_status_-c_searchpath
These tests need to be run as a normal user. To fix this, move these
tests to a new tests/functional/cli_user/zpool_status directory.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #9057
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the past we've seen multiple race conditions that have
to do with open-context threads async threads and concurrent
calls to spa_export()/spa_destroy() (including the one
referenced in issue #9015).
This patch ensures that only one thread can execute the
main body of spa_export_common() at a time, with subsequent
threads returning with a new error code created just for
this situation, eliminating this way any race condition
bugs introduced by concurrent calls to this function.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #9015
Closes #9044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There exists a race condition were hdr_recl() calls
zthr_wakeup() on a destroyed zthr. The timeline is the
following:
[1] hdr_recl() runs first and goes intro zthr_wakeup()
because arc_initialized is set.
[2] arc_fini() is called by another thread, zeroes
that flag, destroying the zthr, and goes into
buf_init().
[3] hdr_recl() tries to enter the destroyed mutex
and we blow up.
This patch ensures that the ARC's zthrs are not offloaded
any new work once arc_initialized is set and then destroys
them after all of the ARC state has been deleted.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #9047
|
|
|
|
|
|
|
|
|
|
|
| |
These aren't tunable; illumos has this comment fixed in
"3742 zfs comments need cleaner, more consistent style",
so sync with that.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9052
|
|
|
|
|
|
|
|
|
|
| |
These functions are unused and can be removed along
with the spl-mutex.c and spl-rwlock.c source files.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9029
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Linux kernel's rwsem's have never provided an interface to
allow a reader to be upgraded to a writer. Historically, this
functionality has been implemented by a SPL wrapper function.
However, this approach depends on internal knowledge of the
rw_semaphore and is therefore rather brittle.
Since the ZFS code must always be able to fallback to rw_exit()
and rw_enter() when an rw_tryupgrade() fails; this functionality
isn't critical. Furthermore, the only potentially performance
sensitive consumer is dmu_zfetch() and no decrease in performance
was observed with this change applied. See the PR comments for
additional testing details.
Therefore, it is being retired to make the build more robust and
to simplify the rwlock implementation.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9029
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit https://github.com/torvalds/linux/commit/94a9717b updated the
rwsem's owner field to contain additional flags describing the rwsem's
state. Rather then update the wrappers to mask out these bits, the
code no longer relies on the owner stored by the kernel. This does
increase the size of a krwlock_t but it makes the implementation
less sensitive to future kernel changes.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9029
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lockdep reports a possible recursive lock in dbuf_destroy.
It is true that dbuf_destroy is acquiring the dn_dbufs_mtx
on one dnode while holding it on another dnode. However,
it is impossible for these to be the same dnode because,
among other things,dbuf_destroy checks MUTEX_HELD before
acquiring the mutex.
This fix defines a class NESTED_SINGLE == 1 and changes
that lock to call mutex_enter_nested with a subclass of
NESTED_SINGLE.
In order to make the userspace code compile,
include/sys/zfs_context.h now defines mutex_enter_nested and
NESTED_SINGLE.
This is the lockdep report:
[ 122.950921] ============================================
[ 122.950921] WARNING: possible recursive locking detected
[ 122.950921] 4.19.29-4.19.0-debug-d69edad5368c1166 #1 Tainted: G O
[ 122.950921] --------------------------------------------
[ 122.950921] dbu_evict/1457 is trying to acquire lock:
[ 122.950921] 0000000083e9cbcf (&dn->dn_dbufs_mtx){+.+.}, at: dbuf_destroy+0x3c0/0xdb0 [zfs]
[ 122.950921]
but task is already holding lock:
[ 122.950921] 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs]
[ 122.950921]
other info that might help us debug this:
[ 122.950921] Possible unsafe locking scenario:
[ 122.950921] CPU0
[ 122.950921] ----
[ 122.950921] lock(&dn->dn_dbufs_mtx);
[ 122.950921] lock(&dn->dn_dbufs_mtx);
[ 122.950921]
*** DEADLOCK ***
[ 122.950921] May be due to missing lock nesting notation
[ 122.950921] 1 lock held by dbu_evict/1457:
[ 122.950921] #0: 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs]
[ 122.950921]
stack backtrace:
[ 122.950921] CPU: 0 PID: 1457 Comm: dbu_evict Tainted: G O 4.19.29-4.19.0-debug-d69edad5368c1166 #1
[ 122.950921] Hardware name: Supermicro H8SSL-I2/H8SSL-I2, BIOS 080011 03/13/2009
[ 122.950921] Call Trace:
[ 122.950921] dump_stack+0x91/0xeb
[ 122.950921] __lock_acquire+0x2ca7/0x4f10
[ 122.950921] lock_acquire+0x153/0x330
[ 122.950921] dbuf_destroy+0x3c0/0xdb0 [zfs]
[ 122.950921] dbuf_evict_one+0x1cc/0x3d0 [zfs]
[ 122.950921] dbuf_rele_and_unlock+0xb84/0xd60 [zfs]
[ 122.950921] dnode_evict_dbufs+0x3a6/0x740 [zfs]
[ 122.950921] dmu_objset_evict+0x7a/0x500 [zfs]
[ 122.950921] dsl_dataset_evict_async+0x70/0x480 [zfs]
[ 122.950921] taskq_thread+0x979/0x1480 [spl]
[ 122.950921] kthread+0x2e7/0x3e0
[ 122.950921] ret_from_fork+0x27/0x50
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jeff Dike <[email protected]>
Closes #8984
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make use of __GFP_HIGHMEM flag in vmem_alloc, which is required for
some 32-bit systems to make use of full available memory.
While kernel versions >=4.12-rc1 add this flag implicitly, older
kernels do not.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #9031
|
|
|
|
|
|
|
|
|
| |
zfs_refcount_*() are to be wrapped by zfsctl_snapshot_*() in this file.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9039
|
|
|
|
|
|
|
|
|
|
|
| |
Resolve an assortment of style inconsistencies including
use of white space, typos, capitalization, and line wrapping.
There is no functional change.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9030
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cast of the size_t returned by strlcpy() to a uint64_t by the
VERIFY3U can result in a build failure when CONFIG_FORTIFY_SOURCE
is set. This is due to the additional hardening. Since the token
is expected to always fit in strval the VERIFY3U has been removed.
If somehow it doesn't, it will still be safely truncated.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #8999
Closes #9020
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modify zfs-mount-generator to produce a dependency on new
zfs-import-key-*.service units, dynamically created at boot to call
zfs load-key for the encryption root, before attempting to mount any
encrypted datasets.
These units are created by zfs-mount-generator, and RequiresMountsFor on
the keyfile, if present, or call systemd-ask-password if a passphrase is
requested.
This patch includes suggestions from @Fabian-Gruenbichler, @ryanjaeb and
@rlaager, as well an adaptation of @rlaager's script to retry on
incorrect password entry.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Fabian Grünbichler <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Antonio Russo <[email protected]>
Closes #8750
Closes #8848
|
|
|
|
|
|
|
|
|
|
| |
ZFS_ACLTYPE_POSIXACL has already been tested in zpl_init_acl(),
so no need to test again on POSIX ACL access.
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9009
|
|
|
|
|
|
|
|
|
|
|
|
| |
External consumers such as Lustre require access to the dnode
interfaces in order to correctly manipulate dnodes.
Reviewed-by: James Simmons <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #8994
Closes #9027
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corrects a small issue where the dsl_destroy_head()
code that runs when the async_destroy feature is disabled would
not properly decrypt the dataset before beginning processing.
If the dataset is not able to be decrypted, the optimization
code now simply does not run and the dataset is completely
destroyed in the DSL sync task.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #9021
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
struct pathname is originally from Solaris VFS, and it has been used
in ZoL to merely call VOP from Linux VFS interface without API change,
therefore pathname::pn_path* are unused and unneeded. Technically,
struct pathname is a wrapper for C string in ZoL.
Saves stack a bit on lookup and unlink.
(#if0'd members instead of removing since comments refer to them.)
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9025
|