| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During a receive operation zvol_create_minors_impl() can wait
needlessly for the prefetch thread because both share the same tasks
queue. This results in hung tasks:
<3>INFO: task z_zvol:5541 blocked for more than 120 seconds.
<3> Tainted: P O 3.16.0-4-amd64
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
The first z_zvol:5541 (zvol_task_cb) is waiting for the long running
traverse_prefetch_thread:260
root@linux:~# cat /proc/spl/taskq
taskq act nthr spwn maxt pri mina
spl_system_taskq/0 1 2 0 64 100 1
active: [260]traverse_prefetch_thread [zfs](0xffff88003347ae40)
wait: 5541
spl_delay_taskq/0 0 1 0 4 100 1
delay: spa_deadman [zfs](0xffff880039924000)
z_zvol/1 1 1 0 1 120 1
active: [5541]zvol_task_cb [zfs](0xffff88001fde6400)
pend: zvol_task_cb [zfs](0xffff88001fde6800)
This change adds a dedicated, per-pool, prefetch taskq to prevent the
traverse code from monopolizing the global (and limited) system_taskq by
inappropriately scheduling long running tasks on it.
Reviewed-by: Albert Lee <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #6330
Closes #6890
Closes #7343
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes zfs-mount initscript trying to mount swap volumes
as filesystems or anything that has 'none' as a mountpoint
in /etc/fstab. Additionally, fixes trying to mount swap volumes
as a filesystem on RHEL. RHEL defines mountpoint for swap
as `swap`.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Georgy Yakovlev <[email protected]>
Closes #7346
Closes #7347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andriy Gapon <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Don Brady <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
Porting Notes:
* Re-enabled and tweaked the zpool_upgrade_007_pos test case
to successfully run in under 5 minutes.
OpenZFS-issue: https://www.illumos.org/issues/9164
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0e776dc06a
Closes #6112
Closes #7336
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a devid for nvme devices. This is very similar to how the
other 'bus' (scsi|sata|usb) devids are generated. The devid
resides in a name/value pair in the leaf vdevs in a zpool config.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #7356
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when ZFS wants to accelerate compression with QAT, it
passes a destination buffer of the same size as the source buffer.
Unfortunately, if the data is incompressible, QAT can actually
"compress" the data to be larger than the source buffer. When this
happens, the QAT driver will return a FAILED error code and print
warnings to dmesg. This patch fixes these issues by providing the
QAT driver with an additional buffer to work with so that even
completely incompressible source data will not cause an overflow.
This patch also resolves an error handling issue where
incompressible data attempts compression twice: once by QAT and
once in software. To fix this issue, a new (and fake) error code
CPA_STATUS_INOMPRESSIBLE has been added so that the calling code
can correctly account for the difference between a hardware
failure and data that simply cannot be compressed.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Weigang Li <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7338
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes an issue where dsl_scan_prefetch_cb() might
add more prefetch I/Os to the prefetch queue after prefetching
has been completed. This was happening because that code was
checking scn->scn_suspending instead of scn->scn_prefetch_stop.
This occasionally triggered an ASSERT during ztest runs in
dsl_scan_fini() when the code attempted to destroy an AVL tree
that still had entires in it. This patch also includes a number
of spelling corrections and comment cleanups throughout
dsl_scan.c
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7353
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clear executable bit on zfs-import.in, zfs-mount.in,
zfs-share.in, and zfs-zed.in. These are automake files and
should not be marked executable. This fixes a RPM build error
on Fedora 28.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #7355
Closes #7327
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calling uiomove() in mappedread() under the page lock can result
in a deadlock if the user space page needs to be faulted in.
Resolve the issue by dropping the page lock before the uiomove().
The inode range lock protects against concurrent updates via
zfs_read() and zfs_write().
Reviewed-by: Albert Lee <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7335
Closes #7339
|
|
|
|
|
|
|
|
|
|
| |
RHEL/CentOS 6 supports sys/xattr.h eliminating the need for
libattr-devel as a dependency.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #7344
Closes #7351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the wrong value
Authored by: Allan Jude <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/9321
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/92b05f3a18
Closes #7333
|
|
|
|
|
|
|
|
|
|
|
| |
If you run ./configure --with-config=srpm, it will not trigger
the user m4 scripts to populate the dracut and udev directories.
This causes a build error on Fedora 28. Make the dracut and
udev lines conditional to get around this.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #7326
Closes #7328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, ZFS tracks statistics about calls to arc_read()
via the /proc/spl/kstat/zfs/<pool>/reads file for debugging.
Unfortunately, this file currently counts embedded bps as
disk reads since they are technically processed by the ZIO
layer. This pollutes the log since the ARC will never cache
embedded bps. This patch corrects this issue by preventing
the logging of embedded bp reads.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7334
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When given an empty string as a rootds value, bootcfg -C fails with
the error message 'could not set nextboot: '' is an invalid name'.
This should be allowed because it represents clearing the nextboot
configuration.
Authored by: Paul Dagnelie <[email protected]>
Reviewed by: Chris Williamson <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/9193
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/504645d227
Closes #7230
|
|
|
|
|
|
|
|
|
|
| |
Updated the "Intent Log" section of the "zpool" manual page to
properly reflect the actions that may be performed.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Peter Ashford <[email protected]>
Closes #6938
Closes #7318
|
|
|
|
|
|
|
|
|
|
|
| |
Resolves importing root pool during boot in dracut. This case was
inadvertently broken with the module autoloading change in #7287.
Reviewed-by: Matthew Thode <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Kash Pande <[email protected]>
Closes #7322
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zfs_dbgmsg() should record a message by default. As a general
principal, these messages shouldn't be too verbose. Furthermore, the
amount of memory used is limited to 4MB (by default).
dprintf() should only record a message if this is a debug build, and
ZFS_DEBUG_DPRINTF is set in zfs_flags. This flag is not set by default
(even on debug builds). These messages are extremely verbose, and
sometimes nontrivial to compute.
SET_ERROR() should only record a message if ZFS_DEBUG_SET_ERROR is set
in zfs_flags. This flag is not set by default (even on debug builds).
This brings our behavior in line with illumos. Note that the message
format is unchanged (including file, line, and function, even though
these are not recorded on illumos).
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #7314
|
|
|
|
|
|
|
|
|
|
| |
This patch simply corrects some spelling / grammar errors in
the QAT and encryption code comments. No functional changes
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7319
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This treats /dev/nvme.. devices the same way as /dev/sd... devices. The
motivation behind this is that whole disk detection did not work on nvme
SSDs without that, because it DKC_UNKNOWN was returned for such devices.
Perhaps there should be a separate DKC_ type for this, but I don't know
enough about the code to know the implications of that.
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: timor <[email protected]>
Closes #7304
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds some comments describing the purpose of "portable"
dnode and objset flags so that it is clear when new flags should
be added to the repective flag masks. This patch includes no
functional changes.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7313
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The changes piggyback JSON output support on top of channel programs
(#6558). This way the JSON output support is targeted to scripting
use cases and is easily maintainable since it really only touches
one function (zfs_do_channel_program()).
This patch ports Joyent's JSON nvlist library from illumos to enable
easy JSON printing of channel program output nvlist. To keep the
delta small I also took advantage of the fact that printing in
zfs_do_channel_program() was almost always done before exiting
the program.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #7281
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In vdev_queue_aggregate() the zio_execute() bypass should not be
called under the vdev queue lock. This can result in a deadlock
as shown in the stack traces below.
Drop the vdev queue lock then walk the parents of the aggregate IO
to determine the list of component IOs to be bypassed. This can
be done safely without holding the io_lock since the new aggregate
IO has not yet been returned and its parents cannot change.
--- THREAD 1 ---
arc_read()
zio_nowait()
zio_vdev_io_start()
vdev_queue_io() <--- mutex_enter(vq->vq_lock)
vdev_queue_io_to_issue()
vdev_queue_aggregate()
zio_execute()
zio_vdev_io_assess()
zio_wait_for_children() <- mutex_enter(zio->io_lock)
--- THREAD 2 --- (inverse order)
arc_read()
zio_change_priority() <- mutex_enter(zio->zio_lock)
vdev_queue_change_io_priority() <- mutex_enter(vq->vq_lock)
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7307
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds some handling to the QAT acceleration functions
that allows them to handle buffers that are not aligned with the
page cache. At the moment this never happens since callers only
happen to work with page-aligned buffers, but this code should
prevent headaches if this isn't always true in the future. This
patch also adds some cleanups to align the QAT compression code
with the encryption and checksumming code.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Weigang Li <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7305
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.
Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.
When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.
In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.
In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7296
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables acceleration of SHA256 checksums using Intel
Quick Assist Technology. This patch also fixes up and refactors
some of the code from QAT encryption to make the behavior
consistent.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chengfeix Zhu <[email protected]>
Signed-off-by: Weigang Li <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7295
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS Performance test concurrency should be lowered for better latency
Work by Stephen Blinick.
Nightly performance runs typically consist of two levels of concurrency;
and both are fairly high.
Since the IO runs are to a ZFS filesystem, within a zpool, which is
based on some variable number of vdev's, the amount of IO driven to each
device is variable. Additionally, different device types (HDD vs SSD,
etc) can generally handle a different amount of concurrent IO before
saturating.
Nevertheless, in practice, it appears that most tests are well past the
concurrency saturation point and therefore both perform with the same
throughput, the maximum of the device. Because the queuedepth to the
device(s) is so high however, the latency is much higher than the best
possible at that throughput, and increases linearly with the increase in
concurrency.
This means that changes in code that impact latency during normal
operation (before saturation) may not be apparent when a large component
of the measured latency is from the IO sitting in a queue to be
serviced. Therefore, changing the concurrency settings is recommended
Authored by: Stephen Blinick <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: John Wren Kennedy <[email protected]>
Ported-by: John Wren Kennedy <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/9076
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/562
Upstream bug: DLPX-45477
Closes #7302
|
|
|
|
|
|
|
|
|
| |
In zfs receive, the function receive_spill should account
for spill block endian conversion as a defensive measure.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #7300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
decompression
With compressed ARC (bug 6950) we use up to 25% of our CPU to decompress
indirect blocks, under a workload of random cached reads. To reduce this
decompression cost, we would like to increase the size of the dbuf cache so
that more indirect blocks can be stored uncompressed.
If we are caching entire large files of recordsize=8K, the indirect blocks
use 1/64th as much memory as the data blocks (assuming they have the same
compression ratio). We suggest making the dbuf cache be 1/32nd of all memory,
so that in this scenario we should be able to keep all the indirect blocks
decompressed in the dbuf cache. (We want it to be more than the 1/64th that
the indirect blocks would use because we need to cache other stuff in the dbuf
cache as well.)
In real world workloads, this won't help as dramatically as the example above,
but we think it's still worth it because the risk of decreasing performance is
low. The potential negative performance impact is that we will be slightly
reducing the size of the ARC (by ~3%).
Porting Notes:
* Added modules options to zfs-module-parameters.5 man page.
* Preserved scaling based on target ARC size rather than max ARC size.
Authored by: George Wilson <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Prashanth Sreenivasa <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/9188
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/564
Upstream bug: DLPX-46942
Closes #7273
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Historically a dynamic misc minor number was registered for the
/dev/zfs device in order to prevent minor number collisions. This
was fine but it prevented us from being able to use the kernel
module auto-loaded which requires a known reserved value.
Resolve this issue by adding a configure test to find an available
misc minor number which can then be used in MODULE_ALIAS_MISCDEV at
build time. By adding this alias the zfs kmod is added to the list
of known static-nodes and the systemd-tmpfiles-setup-dev service
will create a /dev/zfs character device at boot time.
This in turn allows us to update the 90-zfs.rules file to make it
aware this is a static node. The upshot of this is that whenever
a process (zpool, zfs, zed) opens the /dev/zfs the kmods will be
automatic loaded. This even works for unprivileged users so there
is no longer a need to manually load the modules at boot time.
As an additional bonus the zed now no longer needs to start after
the zfs-import.service since it will trigger the module load.
In the unlikely event the minor number we selected conflicts with
another out of tree unregistered minor number the code falls back
to dynamically allocating it. In this case the modules again
must be manually loaded.
Note that due to the change in the method of registering the minor
number the zimport.sh test case may incorrectly fail when the
static node for the installed packages is created instead of the
dynamic one. This issue will only transiently impact zimport.sh
for this single commit when we transition and are mixing and
matching methods.
Reviewed-by: Fabian Grünbichler <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
TEST_ZIMPORT_SKIP="yes"
Closes #7287
|
|
|
|
|
|
|
|
|
| |
When it's set, a DTL range will be cleared even if its scan/scrub had
errors. This allows to work around resilver/scrub upon import when the
pool has errors.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #7293
|
|
|
|
|
|
|
|
|
|
| |
Change zfs destroy logic so destroying begins before
the entire list of snapshots is built.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Kash Pande <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #7271
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We hit an illegal memory access in the zrlock trace point. The problem
is that zrl->zr_owner and zrl->zr_caller are assigned locklessly. And if
zrl->zr_owner got assigned a longer string between when __string()
calculate the strlen, and when __assign_str() does strcpy. The copy will
overflow the buffer.
==
For example:
Initial condition:
zrl->zr_owner = A
zrl->zr_caller = "abc"
Thread A Thread B
-------------------------------------------------
if (zrl->zr_owner == A) {
DTRACE_PROBE2() {
__string() {
strlen(zrl->zr_caller) -> 3
allocate buf[4]
}
zrl->zr_owner = B
zrl->zr_caller = "abcd"
__assign_str() {
strcpy(buf, zrl->zr_caller) <- buffer overflow
==
Dereferencing zrl->zr_owner->pid may also be problematic, in that the
zrl->zr_owner got changed to other task, and that task exits, freeing
the task_struct. This should be very unlikely, as the other task need to
zrl_remove and exit between the dereferencing zr->zr_owner and
zr->zr_owner->pid. Nevertheless, we'll deal with it as well.
To fix the zrl->zr_caller issue, instead of copy the string content, we
just copy the pointer, this is safe because it always points to
__func__, which is static. As for the zrl->zr_owner issue, we pass in
curthread instead of using zrl->zr_owner.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #7291
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a single pool contains more vdevs than the CONFIG_HZ for
for the kernel the mmp thread will not delay properly. Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.
This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool. Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7205
Closes #7289
|
|
|
|
|
|
|
|
|
|
|
|
| |
A config lock should be held while vdev_count_leaves() walks the tree,
otherwise the pointers reference may become invalid during the walk.
SCL_VDEV is a minimal lock provided for such uses cases.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7286
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When multihost is disabled on a pool, and the pool is resumed via zpool
clear, within a single cycle of the mmp thread's loop (e.g. while it's
in the cv_timedwait call), both mmp_last_write and mmp_delay should be
updated.
The original code mistakenly treated the two cases as if they could not
occur at the same time.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7286
|
|
|
|
|
|
|
|
|
|
|
|
| |
PR #7208 was a patch to allow non-reserved pool names which begin with
mirror, raidz, spare (but do not equal), however we'd rather document
it in the man page for compatibility with other OpenZFS implementations,
to avoid pool names that may not work on non-Linux platforms.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #7216
|
|
|
|
|
|
|
|
|
|
|
| |
With rpm-software-management/rpm@5e94633 a package version containing
invalid characters (most commonly a double '-') causes the kmod package
generation to terminate with an error. This change takes advantage of
the newly introduced rpm macro "_wrong_version_format_terminate_build"
to allow kmod packages to be built.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7284
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
get_format_prompt_string() and zpool_state_to_name() return
a string literal which is read-only, thus they should return
`const char*`.
zpool_get_prop_string() returns a non-const string after
successful nv-lookup, and returns a string literal otherwise.
Since this function is designed to be used for read-only purpose,
the return type should also be `const char*`.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #7285
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for acceleration of AES-GCM encryption
with Intel Quick Assist Technology.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chengfeix Zhu <[email protected]>
Signed-off-by: Weigang Li <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7282
|
|
|
|
|
|
|
|
|
|
| |
Due to zpool create auto-partioning in Linux (i.e. sdb1),
certain utilities need to use the parition (sdb1) while
others use the whole disk name (sdb).
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #6939
Closes #7261
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change file related checks to use user namespaces and make
sure involved uids/gids are mappable in the current
namespace.
Note that checks without file ownership information will
still not take user namespaces into account, as some of
these should be handled via 'zfs allow' (otherwise root in a
user namespace could issue commands such as `zpool export`).
This also adds an initial user namespace regression test
for the setgid bit loss, with a user_ns_exec helper usable
in further tests.
Additionally, configure checks for the required user
namespace related features are added for:
* ns_capable
* kuid/kgid_has_mapping()
* user_ns in cred_t
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Wolfgang Bumiller <[email protected]>
Closes #6800
Closes #7270
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test could fail when attempting to write to a newly created
volume which was missing its device node. Resolve the issue by
calling block_device_wait() which blocks until udev creates the
needed entry.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7276
Closes #7277
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement a new helper within_tolerance to test if a value
is within range of a target.
Because the dbufstats and dbufs kstat file are being read
at slightly different times, it is possible for stats to be
slightly off. Use within_tolerance to determine if the value
is "close enough" to the target.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Giuseppe Di Natale <[email protected]>
Closes #7239
Closes #7266
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some usage patterns like send/recv of replication streams can
produce a large number of events. In such a case, the current
all-syslog.sh zedlet will hold up to its name, and flood the
logs with mostly redundant information. Two mitigate this
situation, this changeset introduces to new variables
ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE
to zed.rc that give more control over which event classes end
up in the syslog.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Daniel Kobras <[email protected]>
Closes #6886
Closes #7260
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once per pass through the MMP thread's loop, the vdev tree is walked to
find a suitable leaf to write the next MMP block to. If no such leaf is
found, the thread sleeps for a while and resumes at the top of the loop.
Add an entry to multihost_history when no leaf can be found, and record
the reason in the error column. The error code for such entries is a
bitfield, displayed in hex:
0x1 At least one vdev (interior or leaf) was not writeable.
0x2 At least one writeable leaf vdev was found, but it had a pending
MMP write.
timestamp = the time in seconds since the epoch when no leaf could be
found originally.
duration = the time (in ns) during which no MMP block was written for
this reason. This does not include the preceeding inter-write period
nor the following inter-write period.
vdev_guid = the number of sequential cycles of the MMP thread looop when
this occurred.
Sample output, truncated to fit:
For records of skipped MMP writes the right-most column, vdev_path, is
reported as "-".
id txg timestamp error duration mmp_delay vdev_guid ...
936 11 1520036441 0 146264 891422313 1740883117838 ...
937 11 1520036441 0 163956 888356657 7320395061548 ...
938 11 1520036442 0 130690 885314969 7320395061548 ...
939 11 1520036442 0 2001068577 882296582 1740883117838 ...
940 11 1520036443 0 161806 882296582 7320395061548 ...
941 11 1520036443 0x2 0 998020546 1 ...
942 11 1520036444 0 136585 998020546 7320395061548 ...
943 11 1520036444 0x2 0 998020257 1 ...
944 11 1520036445 5 2002662964 994160219 1740883117838 ...
945 11 1520036445 0x2 998073118 994160219 3 ...
946 11 1520036447 0 247136 994160219 7320395061548 ...
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7212
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If something holds the config lock as a writer for too long, MMP will
fail to issue MMP writes in a timely manner. This will result either in
the pool being suspended, or in an extreme case, in the pool not being
protected.
If the time to acquire the config lock exceeds 1/10 of the minimum
zfs_multihost_interval, report it in the zfs debug log.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7212
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Datasets can be busy when calling zfs destroy. Introduce
a helper function to destroy datasets and use it to destroy
datasets in zfs_allow_004_pos, zfs_promote_008_pos, and
zfs_destroy_002_pos.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Giuseppe Di Natale <[email protected]>
Closes #7224
Closes #7246
Closes #7249
Closes #7267
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) The Coverity Scan reports some issues for the project
quota patch, including:
1.1) zfs_prop_get_userquota() directly uses the const quota
type value as the condition check by wrong.
1.2) dmu_objset_userquota_get_ids() may cause dnode::dn_newgid
to be overwritten by dnode::dn->dn_oldprojid.
2) This patch fixes related issues. It also enhances the logic
for zfs_project_item_alloc() to avoid buffer overflow.
3) Skip project quota ability check if does not change project
quota related things (id or flag). Otherwise, it will cause
chattr (for other non project quota flags) operation failed
if project quota disabled.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fan Yong <[email protected]>
Closes #7251
Closes #7265
|
|
|
|
|
|
|
|
|
|
| |
As of https://github.com/torvalds/linux/commit/fb6d47a, get_disk()
is now get_disk_and_module(). Add a configure check to determine
if we need to use get_disk_and_module().
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Giuseppe Di Natale <[email protected]>
Closes #7264
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec.
This allows zed to actually trigger if a bunch of these events arrive in
a short period of time (zed has a threshold of 10 events in 10 sec).
Previously, if you had, say, 100 checksum errors in 1 sec, it would get
ratelimited to 5/sec which wouldn't trigger zed to fault the drive.
Also, convert the checksum and IO delay thresholds to module params for
easy testing.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #7252
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In zil_lwb_commit() with TX_WRITE, we copy the log write record (lrw)
into the log write block (lwb) and send it off using zil_lwb_add_txg().
If we also have WR_NEED_COPY, we additionally copy the lwr's data into
the lwb to be sent off. If the lwr + data doesn't fit into the lwb, we
send the lrw and as much data as will fit (dnow bytes), then go back
and do the same with the remaining data.
Each time through this loop we're sending dnow data bytes. I.e.
zil_itx_needcopy_bytes should be incremented by dnow.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Dunlop <[email protected]>
Closes #6988
Closes #7176
|