| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update many return and assignment statements to follow the convention
of using the SET_ERROR macro when returning a hard-coded non-zero
value from a function. This aids debugging by recording the error
codes in the debug log.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #6441
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While investigating https://github.com/zfsonlinux/zfs/issues/6425 I
noticed that ioctl ZIOs were not setting zio->io_delay correctly. They
would set the start time in zio_vdev_io_start(), but never set the end
time in zio_vdev_io_done(), since ioctls skip it and go straight to
zio_done(). This was causing spurious "delayed IO" events to appear,
which would eventually get rate-limited and displayed as
"Missed events" messages in zed.
To get around the problem, this patch only sets zio->io_delay for read
and write ZIOs, since that's all we care about anyway.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #6425
Closes #6440
|
|
|
|
|
|
|
|
|
|
|
| |
At import time spa_import() calls zvol_create_minors() directly: with
the current implementation we have no way to avoid device node
creation when volmode=none.
Fix this by enforcing volmode=none directly in zvol_alloc().
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #6426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we are in the middle of an incremental 'zfs receive', the child
.../%recv will exist. If we run 'zfs promote' .../%recv, it will "work",
but then zfs gets confused about the status of the new dataset.
Attempting to do this promote should be an error.
Similarly renaming .../%recv datasets should not be allowed.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #4843
Closes #6339
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andriy Gapon <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Approved by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7915
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/836a00c
Closes #6375
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andriy Gapon <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8373
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7f04961
Closes #6403
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Justin Hibbits <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8508
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/15fc257
Closes #6404
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Redefine the SET_ERROR macro in terms of __dprintf() so the error
return codes get logged as both tracepoint events (if tracepoints are
enabled) and as ZFS debug log entries. This also allows us to use
the same definition of SET_ERROR() in kernel and user space.
Define a new debug flag ZFS_DEBUG_SET_ERROR=512 that may be bitwise
or'd into zfs_flags. Setting this flag enables both dprintf() and
SET_ERROR() messages in the debug log. That is, setting
ZFS_DEBUG_SET_ERROR and ZFS_DEBUG_DPRINTF|ZFS_DEBUG_SET_ERROR are
equivalent (this was done for sake of simplicity). Leaving
ZFS_DEBUG_SET_ERROR unset suppresses the SET_ERROR() messages which
helps avoid cluttering up the logs.
To enable SET_ERROR() logging, run:
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
echo 512 > /sys/module/zfs/parameters/zfs_flags
Remove the zfs_set_error_class tracepoints event class since
SET_ERROR() now uses __dprintf(). This sacrifices a bit of
granularity when selecting individual tracepoint events to enable but
it makes the code simpler.
Include file, function, and line number information in debug log
entries. The information is now added to the message buffer in
__dprintf() and as a result the zfs_dprintf_class tracepoints event
class was changed from a 4 parameter interface to a single parameter.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #6400
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check in the DMU whether an object record in a send stream being
received contains an unsupported dnode slot count, and return an
error if it does. Failure to catch an unsupported dnode slot count
would result in a panic when the SPA attempts to increment the
reference count for the large_dnode feature and the pool has the
feature disabled. This is not normally an issue for a well-formed
send stream which would have the DMU_BACKUP_FEATURE_LARGE_DNODE flag
set if it contains large dnodes, so it will be rejected as
unsupported if the required feature is disabled. This change adds a
missing object record field validation.
Add missing stream feature flag checks in
dmu_recv_resume_begin_check().
Consolidate repetitive comment blocks in dmu_recv_begin_check().
Update zstreamdump to print the dnode slot count (dn_slots) for an
object record when running in verbose mode.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #6396
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'zpool clear' should be able to resume I/O on suspended, but otherwise
healthy, pools.
4a283c7 accidentally introduced a new code path where we call
txg_wait_synced() on the suspended pool before we had the chance to
resume I/O via zio_resume(): this results in the 'zpool clear'
command hanging indefinitely, waiting for a TXG that cannot be synced.
Fix this by avoiding the call to txg_wait_synced().
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6399
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no need to perform the activity check before detecting that the
user must set the system hostid, because the pool's multihost property
is on, but spa_get_hostid() returned 0. The initial call to
vdev_uberblock_load() provided the information required.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #6388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a callback to wake all running mmp threads when
zfs_multihost_interval is changed.
This is necessary when the interval is changed from a very large value
to a significantly lower one, while pools are imported that have the
multihost property enabled.
Without this commit, the mmp thread does not wake up and detect the new
interval until after it has waited the old multihost interval time. A
user monitoring mmp writes via the provided kstat would be led to
believe that the changed setting did not work.
Added a test in the ZTS under mmp to verify the new functionality is
working.
Added a test to ztest which starts and stops mmp threads, and calls into
the code to signal sleeping mmp threads, to test for deadlocks or
similar locking issues.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #6387
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The config lock must be held for the duration of the MMP write.
Since the I/Os are executed via map_nowait(), the done function
is the only place where we know the write has completed.
Since SCL_STATE is taken as reader, overlapping I/Os do not
create a deadlock. The refcount is simply increased when new
I/Os are queued and decreased when I/Os complete.
Test case added which exercises the probe IO call path to
verify the fix and prevent a regression.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #6394
|
|
|
|
|
|
|
|
|
| |
This reverts commit cc9c6bc, which has been causing intermittent
test failures on buildbot. A correct fix for this locking issue
has been applied in a separate patch.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 379ca9c Multi-modifier protection (MMP) used HZ to convert
nanoseconds to ticks for use with cv_timedwait() and ddi_get_lbolt().
The correct macro is hz, which is defined within the SPL for kernel
space, and within zfs_context.h for user space.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #6357
Closes #6360
|
|
|
|
|
|
|
|
|
| |
CID 165755: Division or modulo by zero (DIVIDE_BY_ZERO)
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Giuseppe Di Natale <[email protected]>
Closes #6352
|
|
|
|
|
|
|
|
|
| |
Commit torvalds/linux@4e4cbee9. The bio->bi_error field was
replaced with bio->bi_status which is an enum that describes
all possible error types.
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an IO fails then zio_vdev_io_done() can call vdev_probe()
to determine the health of the vdev. This is safe as long as
the original zio was submitted with zio_wait() and holds the
SCL_STATE_ALL lock over the operation.
If zio_no_wait() was used then the done callback will submit
the probe IO outside the SCL_STATE_ALL lock and hit this
ASSERT in zio_create()
ASSERT(!vd || spa_config_held(spa, SCL_STATE_ALL, RW_READER));
Resolve the issue by only allowing vdev_probe() to be called
when there's a waiter indicating the caller is using zio_wait().
This assumes that caller is still holding SCL_STATE_ALL.
This issue isn't MMP specific but was surfaced when testing.
Without this patch it can be reproduced by running:
zpool set multihost on <pool>
zinject -d <vdev> -e io -T write -f 50 <pool> -L uber
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #745
Closes #6279
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add multihost=on|off pool property to control MMP. When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp. Property defaults to off.
During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock. Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".
Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval. The period is specified in milliseconds. The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path. Abbreviated
output below.
$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg timestamp mmp_delay vdev_guid vdev_label vdev_path
20468 261337 250274925 68396651780 3 /dev/sda
20468 261339 252023374 6267402363293 1 /dev/sdc
20468 261340 252000858 6698080955233 1 /dev/sdx
20468 261341 251980635 783892869810 2 /dev/sdy
20468 261342 253385953 8923255792467 3 /dev/sdd
20468 261344 253336622 042125143176 0 /dev/sdab
20468 261345 253310522 1200778101278 2 /dev/sde
20468 261346 253286429 0950576198362 2 /dev/sdt
20468 261347 253261545 96209817917 3 /dev/sds
20468 261349 253238188 8555725937673 3 /dev/sdb
Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.
When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test. For example, the
pool is exported to run zdb and then imported again. Add a new ztest
function, "-M", to alter ztest behavior to prevent this.
Add new tests to verify the new functionality. Tests provided by
Giuseppe Di Natale.
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Ned Bass <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #745
Closes #6279
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Dave Eddy <[email protected]>
Reviewed by: Patrick Mooney <[email protected]>
Reviewed by: Joshua M. Clulow <[email protected]>
Reviewed by: Josh Wilsdon <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Reviewed by: Alan Somers <[email protected]>
Reviewed by: Andrew Stormont <[email protected]>
Approved by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/6939
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ce1577b
Closes #6328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The volmode property may be set to control the visibility of ZVOL
block devices.
This allow switching ZVOL between three modes:
full - existing fully functional behaviour (default)
dev - hide partitions on ZVOL block devices
none - not exposing volumes outside ZFS
Additionally the new zvol_volmode module parameter can be used to
control the default behaviour.
This functionality can be used, for instance, on "backup" pools to
avoid cluttering /dev with unneeded zd* devices.
Original-patch-by: mav <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: loli10K <[email protected]>
Signed-off-by: loli10K <[email protected]>
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/dd28e6bb
Closes #1796
Closes #3438
Closes #6233
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Yuri Pankov <[email protected]>
Reviewed by: Robert Mustacchi <[email protected]>
Approved by: Joshua M. Clulow <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
Porting Notes:
* All hunks unrelated to ZFS were dropped.
OpenZFS-issue: https://www.illumos.org/issues/5428
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4585130
Closes #6326
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sync thread is concurrently modifying dn_phys->dn_nlevels
while dbuf_dirty() is trying to assert something about it, without
holding the necessary lock. We need to move this assertion further down
in the function, after we have acquired the dn_struct_rwlock.
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8126
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0ef125d
Closes #6314
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Yuri Pankov <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8067
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8173085
Closes #6319
|
|
|
|
|
|
|
|
|
|
| |
Illumos 4080 inadvertently allows 'zpool clear' on readonly pools: fix
this by reintroducing a check (POOL_CHECK_READONLY) in zfs_ioc_clear
registration code.
Signed-off-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #6306
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, there is no way to pause a scrub. Pausing may
be useful when the pool is busy with other I/O to preserve
bandwidth.
This patch adds the ability to pause and resume scrubbing.
This is achieved by maintaining a persistent on-disk scrub state.
While the state is 'paused' we do not scrub any more blocks.
We do however perform regular scan housekeeping such as
freeing async destroyed and deadlist blocks while paused.
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Thomas Caputi <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #6167
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the single core machine the system may hang when the
spa_namespare_lock acquisition fails in the zvol_first_open
function. It returns -ERESTARTSYS error what causes the
endless loop in __blkdev_get function.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes #6283
Closes #6312
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
The problem is that zfs_get_data() supplies a stale zgd_bp to
dmu_sync(), which we then nopwrite against.
zfs_get_data() doesn't hold any DMU-related locks, so after it
copies db_blkptr to zgd_bp, dbuf_write_ready() could change
db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale BP and that the dbuf it not dirty,
so it is eligible for nop-writing.
The fix is for dmu_sync() to copy db_blkptr to zgd_bp after
acquiring the db_mtx. We could still see a stale db_blkptr,
but if it is stale then the dirty record will still exist and
thus we won't attempt to nopwrite.
OpenZFS-issue: https://www.illumos.org/issues/8378
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3127742
Closes #6293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andriy Gapon <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
The existing kernel-side code only provides a method to rollback to a
latest snapshot, whatever it happens to be at the time when the rollback
is actually done. That could be unsafe or confusing in environments
where concurrent DSL changes are possible as the resulting state could
correspond to a newer or older snapshot than the originally requested
one.
This change allows to amend that method such that the rollback is
performed only when the latest snapshot has a specific name. That is,
if a new snapshot is concurrently created or the target snapshot is
destroyed, then no rollback is done and EXDEV error is returned.
New libzfs_core function lzc_rollback_to() is provided for the new
functionality. libzfs is changed to use lzc_rollback_to() to implement
zfs rollback command.
Perhaps we should return different errors to distinguish the case where
the desired snapshot exists but it's not the latest snapshot and the
case where the desired snapshot does not exist.
OpenZFS-issue: https://www.illumos.org/issues/7600
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3d645eb
Closes #6292
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andriy Gapon <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7910
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb6af4b
Closes #6291
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
The problem is that when dsl_bookmark_destroy_check() is
executed from open context (the pre-check), it fills in
dbda_success based on the existence of the bookmark. But
the bookmark (or containing filesystem as in this case)
can be destroyed before we get to syncing context. When
we re-run dsl_bookmark_destroy_check() in syncing context,
it will not add the deleted bookmark to dbda_success,
intending for dsl_bookmark_destroy_sync() to not process
it. But because the bookmark is still in dbda_success from
the open-context call, we do try to destroy it.
The fix is that dsl_bookmark_destroy_check() should not
modify dbda_success when called from open context.
OpenZFS-issue: https://www.illumos.org/issues/8377
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b0b6fe3
Closes #6286
|
|
|
|
|
|
|
|
|
|
|
|
| |
Resolves issues discovered when porting to OpenZFS.
* Lint warnings.
* Made dnode_move_impl() large dnode aware. This
functionality is currently unused on Linux.
Reviewed-by: Ned Bass <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #6262
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make zfs_arc_meta_limit_percent and zfs_arc_dnode_limit_percent behave
as you would expect from zfs-module-parameters.5.
- recalculate arc_meta_limit if zfs_arc_meta_limit_percent changes
- recalculate arc_dnode_limit if zfs_arc_dnode_limit_percent changes
- correctly set arc_meta_limit and arc_dnode_limit if zfs_arc_max or
zfs_arc_meta_min changes
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Dunlop <[email protected]>
Closes #6269
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 7.1 with will warn when we're not checking the snprintf()
return code in cases where the buffer could be truncated. This
patch either checks the snprintf return code (where applicable),
or simply disables the warnings (ztest.c).
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #6253
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 8542ef8 allowed optional IOs to be aggregated beyond
the specified aggregation limit. Since the aggregation limit
was also used to enforce the maximum block size, setting
`zfs_vdev_aggregation_limit=16777216` could result in an
attempt to allocate an ABD larger than 16M.
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6259
Closes #6270
|
|
|
|
|
|
|
|
|
|
| |
Use zv_state_lock to protect all members of zvol_state structure, add
relevant ASSERT()s. Take zv_suspend_lock before zv_state_lock, do not
hold zv_state_lock across suspend/resume.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Boris Protopopov <[email protected]>
Closes #6226
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andriy Gapon <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Dan McDonald <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/5220
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/403a8da
Closes #6260
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Andrew Stormont <[email protected]>
Reviewed by: Andriy Gapon <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Dan McDonald <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8264
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a4b8c9a
Closes #6254
|
|
|
|
|
|
|
|
| |
In bqueue_dequeue(), call cv_signal() with bq_lock held.
Re-enable rsend_009_pos to test the fix.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Boris Protopopov <[email protected]>
Closes #5887
|
|
|
|
|
|
|
|
|
|
|
| |
In zfs/dmu_object and icp/core/kcf_sched, the CPU_SEQID macro
should be surrounded by `kpreempt_disable` and `kpreempt_enable`
calls to avoid a Linux kernel BUG warning. These code paths use
the cpuid to minimize lock contention and is is safe to reschedule
the process to a different processor at any time.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Morgan Jones <[email protected]>
Closes #6239
|
|
|
|
|
|
|
|
|
|
|
| |
In the original form of device error injection, it was an all or nothing
situation. To help simulate intermittent error conditions, you can now
specify a real number percentage value. This is also very useful for our
ZFS fault diagnosis testing and for injecting intermittent errors during
load testing.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #6227
|
|
|
|
|
|
|
|
| |
Use queue_flag_set_unlocked() in zvol_alloc().
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Boris Protopopov <[email protected]>
Issue #6226
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5559ba0 added zv_state_lock to protect zvol_state_t internal data:
this, however, doesn't guard zv->zv_open_count and
zv->zv_disk->private_data in zvol_remove_minors_impl().
Fix this by taking zv->zv_state_lock before we check its zv_open_count.
P1 (z_zvol) P2 (systemd-udevd)
--- ---
zvol_remove_minors_impl()
: zv->zv_open_count==0
zvol_open()
->mutex_enter(zv_state_lock)
: zv->zv_open_count++
->mutex_exit(zv_state_lock)
->mutex_enter(zv->zv_state_lock)
->zvol_remove(zv)
->mutex_exit(zv->zv_state_lock)
: zv->zv_disk->private_data = NULL
->zvol_free()
-->ASSERT(zv->zv_open_count==0) *
zvol_release()
: zv = disk->private_data
->ASSERT(zv && zv->zv_open_count>0) *
--- ---
* ASSERT() fails
Reviewed by: Boris Protopopov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]
Signed-off-by: loli10K <[email protected]>
Closes #6213
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This continues what was started in
0eef1bde31d67091d3deed23fe2394f5a8bf2276 by fully converting zvols
to avoid unnecessary dnode_hold() calls. This saves a small amount
of CPU time and slightly improves latencies of operations on zvols.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #6058
|
|
|
|
|
|
|
|
| |
Buildbots and zfs-tests regularly see 7 kilobytes of stack
usage with this function. Convert self-calls to iterations
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #6219
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Paul Dagnelie <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Kash Pande <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
The send size estimate for a zvol can be too low, if the size of the
record headers (dmu_replay_record_t's) is a significant portion of the
size. This is typically the case when the data is highly compressible,
especially with embedded blocks.
The problem is that dmu_adjust_send_estimate_for_indirects() assumes
that blocks are the size of the "recordsize" property (128KB). However,
for zvols, the blocks are the size of the "volblocksize" property (8KB).
Therefore, we estimate that there will be 16x less record headers than
there really will be.
The fix is to check the type of the object set (whether it is a zvol or
not) and pick the appropriate property. In addition, while we are at it,
we also add the size of the BEGIN and END records to the estimate.
OpenZFS-issue: https://www.illumos.org/issues/8056
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/faf09cd
Closes #6205
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should
do the eviction itself (because the evict thread is not able to keep up).
This can result in massive lock contention. It isn't necessary to hold
the lock, because if we make the wrong choice occasionally, nothing bad
will happen. This commit results in a ~60% performance improvement for
ARC-cached sequential reads.
OpenZFS-issue: https://www.illumos.org/issues/8156
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f73e5d9
Closes #6204
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dmu_object_alloc() is single-threaded, so when multiple threads are
creating files in a single filesystem, they spend a lot of time waiting
for the os_obj_lock. To improve performance of multi-threaded file
creation, we must make dmu_object_alloc() typically not grab any
filesystem-wide locks.
The solution is to have a "next object to allocate" for each CPU. Each
of these "next object"s is in a different block of the dnode object, so
that concurrent allocation holds dnodes in different dbufs. When a
thread's "next object" reaches the end of a chunk of objects (by default
4 blocks worth -- 128 dnodes), it will be reset to the per-objset
os_obj_next, which will be increased by a chunk of objects (128). Only
when manipulating the os_obj_next will we need to grab the os_obj_lock.
This decreases lock contention dramatically, because each thread only
needs to grab the os_obj_lock briefly, once per 128 allocations.
This results in a 70% performance improvement to multi-threaded object
creation (where each thread is creating objects in its own directory),
from 67,000/sec to 115,000/sec, with 8 CPUs.
Work sponsored by Intel Corp.
Authored by: Matthew Ahrens <[email protected]>
Reviewed-by: Ned Bass <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Matthew Ahrens <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8199
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/374
Closes #4703
Closes #6117
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- After some ZIL changes 6 years ago zil_slog_limit got partially broken
due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
Actually because of other changes about that time zl_itx_list_sz is not
really required to implement the functionality, so this patch removes
some unneeded broken code and variables.
- Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
single heavy logger, that increased latency for other (more latency critical)
loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
huge latency increase for heavy writers, this implementation caused double
write of all data, since the log records were explicitly prepared for SLOG.
Since we now have I/O scheduler, I've found it can be much more efficient
to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
- Existing ZIL implementation had problem with space efficiency when it
has to write large chunks of data into log blocks of limited size. In some
cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
spinning rust, that also reduced log write speed in half, since head had to
uselessly fly over allocated but not written areas. This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows real situation of log blocks allocation and
can split large requests into pieces much more efficiently. Also as side
effect it removes one of two data copy operations done by ZIL code WR_COPIED
case.
- While there, untangle and unify code of z*_log_write() functions.
Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
block boundary, that may also improve efficiency if ZPL is made to do that.
Sponsored by: iXsystems, Inc.
Authored by: Alexander Motin <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Andriy Gapon <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Brad Lewis <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Ported-by: Giuseppe Di Natale <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7578
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aeb13ac
Closes #6191
|