| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add -h switch to zfs send command to send dataset holds. If
holds are present in the stream, zfs receive will create them
on the target dataset, unless the zfs receive -h option is used
to skip receive of holds.
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #7513
|
|
|
|
|
|
|
|
|
|
|
| |
The 4.20 kernel changed the meaning of the rw_semaphore.owner bits,
causing an assertion when loading the module under the 4.20 kernel.
This patch fixes the issue.
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8360
Closes #8389
|
|
|
|
|
|
|
|
| |
5d43cc9a59 renamed it to rangelock_enter().
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8408
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When building on illumos-based platforms, the build can fail
because label_t has been redefined.
To reduce build issues on other platforms, rename
label_t to zdb_label_t.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Igor Kozhukhov <[email protected]>
Closes #8397
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deletion throttle currently does not account for holes in a file.
This means that it can activate when it shouldn't.
To fix it we switch the throttle to be based on the number of
L1 blocks we will have to dirty when freeing.
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #7725
Closes #7888
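The idea of counting L1 blocks instead of raw file length can be illustrated with a small Python model. This is only a sketch: the function name, the `BLKPTRS_PER_L1` value, and the representation of a file as a set of allocated block IDs are assumptions made for illustration and are not taken from the ZFS source.

```python
# Hypothetical model: a sparse file is the set of level-0 block IDs that
# actually hold data; every other block ID in the file is a hole.
BLKPTRS_PER_L1 = 1024  # assumed number of block pointers per L1 indirect block


def l1_blocks_to_dirty(allocated_blkids, start_blkid, end_blkid):
    """Count the distinct L1 blocks that cover allocated blocks in
    [start_blkid, end_blkid). Holes contribute nothing to the count."""
    l1_ids = {
        blkid // BLKPTRS_PER_L1
        for blkid in allocated_blkids
        if start_blkid <= blkid < end_blkid
    }
    return len(l1_ids)


# A file spanning one million block IDs with only three blocks written.
sparse_file = {0, 5, 900_000}
print(l1_blocks_to_dirty(sparse_file, 0, 1_000_000))  # 2
```

In this model a throttle keyed on file length would see a million blocks, while one keyed on L1 blocks sees only two units of work.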
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is an async implementation of the existing sync
zfs_unlinked_drain() function. This function is called at mount time and
is responsible for freeing znodes that we did not get around to freeing earlier.
We don't have to hold mounting of the dataset until the unlinked list is
fully drained as is done now. Since we can process the unlinked set
asynchronously this results in a better user experience when mounting a
dataset with entries in the unlinked set.
Reviewed by: Jorgen Lundman <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #8142
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Initially, metaslabs and space maps used to be the same thing
in ZFS. Later, we started differentiating them by referring
to the space map as the on-disk state of the metaslab, making
the metaslab a higher-level concept that is metadata that deals
with space accounting. Today we've managed to split that code
further, with the space map being its own on-disk data
structure used in areas of ZFS besides metaslabs (e.g. the
vdev-wide space maps used for zpool checkpoint or vdev removal
features).
This patch refactors the space map code to further split the
space map code from the metaslab code. It does so by getting
rid of the idea that the space map can have a different in-core
and on-disk length (sm_length vs smp_length) which is something
that is only used for the metaslab code, and other consumers
of space maps just have to deal with. Instead, this patch
introduces changes that move the old in-core length of the
metaslab's space map to the metaslab structure itself (see
ms_synced_length field) while making the space map code only
care about the actual space map's length on-disk.
The result of this is that space map consumers no longer have
to deal with syncing two different lengths for the same
structure (e.g. space_map_update() goes away) while metaslab
specific behavior stays within the metaslab code. Specifically,
the ms_synced_length field keeps track of the amount of data
metaslab_load() can read from the metaslab's space map while
working concurrently with metaslab_sync() that may be
appending to that same space map.
As a side note, the patch also adds a few comments around
the metaslab code documenting some assumptions and expected
behavior.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zfs create, receive and rename can bypass this hierarchy rule. Update
both userland and kernel module to prevent this issue and use pyzfs
unit tests to exercise the ioctls directly.
Note: this commit slightly changes the zfs_ioc_create() ABI. This allows
us to differentiate a generic error (EINVAL) from the specific case where
we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT).
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: loli10K <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to an off-by-one condition in spa_preferred_class() we are picking
the "normal" allocation class instead of the "special" one for file
blocks with size equal to the special_small_blocks property value.
This change fixes the small code issue and updates the ZFS Test Suite
and the zfs(8) man page.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8351
Closes #8361
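The off-by-one can be pictured with a tiny Python model. This is a sketch under assumptions: `preferred_class` and its `inclusive` flag are hypothetical names, and the model ignores anything else the real spa_preferred_class() takes into account.

```python
def preferred_class(block_size, special_small_blocks, inclusive=True):
    """Return "special" or "normal" for a file block of the given size.

    inclusive=False models the off-by-one comparison described above;
    inclusive=True models the intended behavior.
    """
    if inclusive:
        small = block_size <= special_small_blocks
    else:
        small = block_size < special_small_blocks
    return "special" if small else "normal"


limit = 32 * 1024  # example special_small_blocks value
print(preferred_class(limit, limit, inclusive=False))  # normal
print(preferred_class(limit, limit))                   # special
```

Only blocks whose size equals the property value change class between the two comparisons; every other size is treated identically.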
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-factor arc_read() to better account for embedded data blkptrs.
Previously, reading the payload from an embedded blkptr would cause
arcstats such as demand_metadata_misses to be bumped when there was
actually no cache "miss" because the data are already available in
the blkptr.
The following test procedure was used to demonstrate the problem:
    zpool create tank ...
    zfs create -o compression=lz4 tank/fs
    echo blah > /tank/fs/blah
    stat /tank/fs/blah
    grep 'meta.*mis' /proc/spl/kstat/zfs/arcstats
and repeating the last two steps to watch the metadata miss counter
increment. This can also be demonstrated via the zfs_arc_miss DTRACE4
probe in arc_read().
Reviewed-by: loli10K <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #8319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change makes additions to the ZFS test suite that allow the
performance tests to run over NFS. The test is run and performance data
collected from the server side, while IO is generated on the NFS client.
This has been tested with Linux and illumos NFS clients.
Authored by: Ahmed Ghanem <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Reviewed by: Kevin Greene <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: John Kennedy <[email protected]>
Signed-off-by: John Kennedy <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/9185
Closes #8367
|
|
|
|
|
|
|
|
|
|
|
| |
This change simply documents the missing -d (dump contents) option in
zstreamdump(8).
Reviewed-by: bunder2015 <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8369
|
|
|
|
|
|
|
|
|
|
| |
note: which is non-standard. Use builtin 'command -v' instead. [SC2230]
note: Use -n instead of ! -z. [SC2236]
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #8367
|
|
|
|
|
|
|
|
|
|
| |
F632 use ==/!= to compare str, bytes, and int literals
Reviewed-by: Håkan Johansson <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #8368
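For readers unfamiliar with F632, the following sketch shows the pattern the check asks for. The function name and the pool name are illustrative assumptions, not code from the repository.

```python
def is_default_pool(name):
    """Compare by value.

    Writing `name is "tank"` instead would be flagged as F632, because
    `is` tests object identity rather than equality.
    """
    return name == "tank"


# Build the string at runtime so it need not be the same object as the literal.
runtime_name = "".join(["ta", "nk"])
print(is_default_pool(runtime_name))  # True
```

Whether two equal strings happen to be the same object is an implementation detail, which is why comparing against a literal with `is` is unreliable.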
|
|
|
|
|
|
|
|
|
|
|
| |
The zpool iostat latency histograms (-w) have column names
'sync_queue' and 'async_queue', which do not match the man page, nor
the equivalent columns in average latency. Change the column
names to be 'syncq_wait' and 'asyncq_wait' to be consistent.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8338
|
|
|
|
|
|
|
|
|
|
| |
Get rid of the majority of the metaslab metadata when removing log vdevs
in spa_vdev_remove_log() with a call to metaslab_fini() instead
of duplicating a lot of that in vdev_remove_empty_log().
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current L2 ARC device code consistently uses psize to
increment vs_alloc but varies between psize and lsize when
decrementing it. The result of this behavior is that
vs_alloc can be decremented more than it is incremented
and underflow. This patch changes the code so asize is
used everywhere.
In addition, it ensures that vs_alloc gets incremented by
the L2 ARC device code as buffers are written and not at
the end of the l2arc_write_buffers() routine. The latter
(and old) way would temporarily underflow vs_alloc as
buffers that were just written, would be destroyed while
l2arc_write_buffers() was still looping.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8298
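The accounting rule described above can be sketched as a toy Python model. The class and method names are hypothetical; only `vs_alloc` and `asize` come from the message.

```python
class VdevSpaceModel:
    """Toy stand-in for a vdev's allocated-space counter (vs_alloc)."""

    def __init__(self):
        self.vs_alloc = 0

    def buffer_written(self, asize):
        # Account for each buffer as soon as it is written, not after
        # the whole write loop has finished.
        self.vs_alloc += asize

    def buffer_destroyed(self, asize):
        # Decrement by the same quantity that was added on write.
        assert self.vs_alloc >= asize, "vs_alloc would underflow"
        self.vs_alloc -= asize


vdev = VdevSpaceModel()
for asize in (4096, 8192):
    vdev.buffer_written(asize)
    # A buffer may be destroyed while the write loop is still running.
    vdev.buffer_destroyed(asize)
print(vdev.vs_alloc)  # 0
```

If the increment were deferred to the end of the loop, the first `buffer_destroyed()` call in this model would trip the underflow assertion.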
|
|
|
|
|
|
|
|
|
|
|
|
| |
Address a deadlock caused by simultaneous wakeup and cancel on a zthr
by removing the hold of zthr_request_lock from zthr_wakeup. This
allows zthr_wakeup to not block a thread that is in the process of
being cancelled.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Sara Hartse <[email protected]>
Closes #8333
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the point of -L option in zdb is to disable leak
tracing and the loading of space maps because they are expensive,
yet still do leak detection in terms of space. Unfortunately,
there is a scenario where this is a lie. If we are using zdb -L
on a pool where a vdev is being removed, zdb_claim_removing()
will open the metaslab space maps of that device.
This patch makes it so zdb -L skips leak detection altogether
and ensures that no space maps are loaded.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8335
|
|
|
|
|
|
|
|
|
| |
Exclude test-runner.py from the rpmbuild shebang check to allow it to
run under Python 2 and 3.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8331
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 9.0 is complaining because we're trying to print strings that
are defined like this:
.zo_pool = { 'z', 't', 'e', 's', 't', '\0' },
Fix them by making them actual strings.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8330
|
|
|
|
|
|
|
|
|
|
|
| |
The Linux 5.0 kernel updated the bio_set_dev() macro so it calls the
GPL-only bio_associate_blkg() symbol thus inadvertently converting
the entire macro. Provide a minimal version which always assigns the
request queue's root_blkg to the bio.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8287
|
|
|
|
|
|
|
|
|
|
| |
The 5.0 kernel no longer exports the functions we need to do vector
(SSE/SSE2/SSE3/AVX...) instructions. Disable vector-based checksum
algorithms when building against those kernels.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8259
|
|
|
|
|
|
|
|
|
| |
SUBDIRs has been deprecated for a long time, and was finally removed in
the 5.0 kernel. Use "M=" instead.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8257
|
|
|
|
|
|
|
|
|
|
|
| |
In the 5.0 kernel, only the mount namespace code should use the MS_*
macros. Filesystems should use the SB_* ones.
https://patchwork.kernel.org/patch/10552493/
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8264
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
totalram_pages was converted to an atomic variable in 5.0:
https://patchwork.kernel.org/patch/10652795/
Its value should now be read through the totalram_pages() helper
function.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8263
|
|
|
|
|
|
|
|
| |
access_ok no longer needs a 'type' parameter in the 5.0 kernel.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8261
|
|
|
|
|
|
|
|
|
| |
Newer kernels remove current_kernel_time64(). Use
ktime_get_coarse_real_ts64() in its place.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #8258
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
= Old behavior
For vdev sizes 100GB to 50TB we keep ~200 metaslabs per
vdev and the metaslab size grows from 512MB to 256GB.
For vdevs bigger than that we start increasing the
number of metaslabs until we hit the 128K limit.
= New behavior
For vdev sizes 100GB to 3TB we keep ~200 metaslabs per
vdev and the metaslab size grows from 512MB to 16GB.
For vdevs bigger than that we start increasing the
number of metaslabs until we hit the 128K limit.
= Reasoning
The old behavior makes metaslabs grow in size when
the vdev range is between 3TB (ms_size 16GB) and
32PB (ms_size 256GB). Even though keeping the number
of metaslabs is good in terms of potential number of
I/Os per TXG, these bigger metaslabs take longer
to be loaded and after they are loaded they can
take up a lot of memory because of their range trees.
This change tries to put a boundary in memory and
loading time for the specific range of vdev sizes.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8324
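The arithmetic behind the old and new ranges can be checked with a short Python sketch. It is a simplified model under stated assumptions: the function and constant names are hypothetical, it covers only vdevs of 100GB or more, it does not model the 128K metaslab limit, and the real implementation may round sizes differently (for example to powers of two).

```python
MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40

TARGET_COUNT = 200           # "~200 metaslabs per vdev"
MIN_MS_SIZE = 512 * MiB      # smallest metaslab size quoted above
NEW_MAX_MS_SIZE = 16 * GiB   # cap under the new behavior
OLD_MAX_MS_SIZE = 256 * GiB  # cap under the old behavior


def metaslab_layout(vdev_size, max_ms_size=NEW_MAX_MS_SIZE):
    """Return (metaslab_size, metaslab_count) for a vdev of 100GB or more."""
    ms_size = vdev_size // TARGET_COUNT
    ms_size = max(MIN_MS_SIZE, min(ms_size, max_ms_size))
    return ms_size, vdev_size // ms_size


print(metaslab_layout(100 * GiB))                  # (536870912, 200)
print(metaslab_layout(10 * TiB))                   # (17179869184, 640)
print(metaslab_layout(10 * TiB, OLD_MAX_MS_SIZE))  # (54975581388, 200)
```

For a 10TB vdev the model yields 640 metaslabs of 16GB under the new cap, compared with 200 metaslabs of roughly 51GB under the old one, which is the trade-off the reasoning section describes.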
|
|
|
|
|
|
|
|
|
|
|
|
| |
The range_tree_verify function looks for a segment in a
range tree and panics if the segment is present on the
tree. This patch gives the function a more descriptive
name.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8327
|
|
|
|
|
|
|
|
|
| |
This allows the spa config refcounts to use tracking in debug builds
without triggering the "No such hold %p on refcount" panic.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #8326
|
|
|
|
|
|
|
|
|
|
|
| |
If you try to get the userspace, groupspace or projectspace on a ZVOL,
the generated error results in passing EINVAL to
zfs_standard_error_fmt() when we should return a specific error to
inform the user that those properties aren't available on volumes.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8279
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When `zpool iostat` fills the terminal the headers should be
printed again. `zpool iostat -n` can be used to suppress this.
If the command is not attached to a tty, headers will not be
printed so as to not break existing scripts.
Reviewed-by: Joshua M. Clulow <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Damian Wojsław <[email protected]>
Closes #8235
Closes #8262
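The header behavior can be sketched as a pure function in Python. The names below are hypothetical, `suppress` stands in for the -n option, and the sketch assumes the first header is always printed and that the header occupies one line.

```python
def render_rows(rows, header, term_height, is_tty=True, suppress=False):
    """Return the output lines, repeating the header each time a screen fills."""
    lines = [header]
    repeat = is_tty and not suppress and term_height > 1
    per_screen = term_height - 1  # rows that fit below one header line
    for i, row in enumerate(rows):
        if repeat and i > 0 and i % per_screen == 0:
            lines.append(header)
        lines.append(row)
    return lines


rows = ["r0", "r1", "r2", "r3"]
print(render_rows(rows, "H", term_height=3))
# ['H', 'r0', 'r1', 'H', 'r2', 'r3']
print(render_rows(rows, "H", term_height=3, is_tty=False))
# ['H', 'r0', 'r1', 'r2', 'r3']
```

A real program would obtain `is_tty` and `term_height` from the environment, for instance with `sys.stdout.isatty()` and `shutil.get_terminal_size().lines`.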
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, zvol_rename_minors_impl() calls kmem_asprintf()
to allocate and initialize a string. This function is a thin
wrapper around the kernel's kvasprintf() and does not call
into the SPL's kmem tracking code when it is enabled. However,
this function frees the string with the tracked kmem_free()
instead of the untracked strfree(), which causes the SPL
kmem tracking code to believe that the function is attempting
to free memory it never allocated, triggering an ASSERT. This
patch simply corrects this issue.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8307
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since d8fdfc2 was integrated, dsl_pool_create() does not call
dmu_objset_create_impl() for the root dataset when running in
userland (ztest): this creates a pool with a partially initialized
root dataset. Trying to import and use this pool results in both
zpool and zfs executables dumping core.
Fix this by adopting an alternative change suggested in OpenZFS 8607
code review.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Original-patch-by: Robert Mustacchi <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8277
|
|
|
|
|
|
|
|
|
|
|
|
| |
This check provides no real additional protection and unnecessarily
introduces a dependency on the "oops_in_progress" kernel symbol.
Remove the check. If there are special circumstances on other
platforms which make this a requirement, it can be reintroduced
for all relevant call paths in a more portable, comprehensive manner.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8297
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most callers that need to operate on a loaded metaslab always
call metaslab_load_wait() before loading the metaslab just in
case someone else is already doing the work.
Factoring metaslab_load_wait() within metaslab_load() makes the
latter more robust, as callers won't have to do the load-wait
check explicitly every time they need to load a metaslab.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8290
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when a DRR_OBJECT record is read into memory in
receive_read_record(), memory is allocated for the bonus buffer.
However, if the object doesn't have a bonus buffer the code will
still "allocate" the zero bytes, but the memory will not be passed
to the processing thread for cleanup later. This causes the spl
kmem tracking code to report a leak. This patch simply changes the
code so that it only allocates this memory if it has a non-zero
length.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8266
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8299
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some Linux distributions, the kernel module build will not
default to building with debuginfo symbols, which can make it
difficult for debugging and testing.
For this case, we provide a flag to override the build to force
debuginfo to be produced for the kernel module build.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Neal Gompa <[email protected]>
Co-authored-by: Simon Watson <[email protected]>
Signed-off-by: Neal Gompa <[email protected]>
Signed-off-by: Simon Watson <[email protected]>
Closes #8304
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trying to set user properties with their length 1 byte shorter than the
maximum size triggers an assertion failure in zap_leaf_array_create():
panic[cpu0]/thread=ffffff000a092c40:
assertion failed: num_integers * integer_size < (8<<10) (0x2000 < 0x2000), file: ../../common/fs/zfs/zap_leaf.c, line: 233
ffffff000a092500 genunix:process_type+167c35 ()
ffffff000a0925a0 zfs:zap_leaf_array_create+1d2 ()
ffffff000a092650 zfs:zap_entry_create+1be ()
ffffff000a092720 zfs:fzap_update+ed ()
ffffff000a0927d0 zfs:zap_update+1a5 ()
ffffff000a0928d0 zfs:dsl_prop_set_sync_impl+5c6 ()
ffffff000a092970 zfs:dsl_props_set_sync_impl+fc ()
ffffff000a0929b0 zfs:dsl_props_set_sync+79 ()
ffffff000a0929f0 zfs:dsl_sync_task_sync+10a ()
ffffff000a092a80 zfs:dsl_pool_sync+3a3 ()
ffffff000a092b50 zfs:spa_sync+4e6 ()
ffffff000a092c20 zfs:txg_sync_thread+297 ()
ffffff000a092c30 unix:thread_start+8 ()
This patch simply corrects the assertion.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8278
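The failing comparison in the panic message, `0x2000 < 0x2000`, can be reproduced with a few lines of Python. The function names are hypothetical, and the inclusive bound in `corrected_check` is an assumption about the fix; the message itself only says the assertion was corrected.

```python
ZAP_MAX_BYTES = 8 << 10  # 0x2000, the bound shown in the panic message


def old_check(num_integers, integer_size):
    """The assertion as it appears in the panic output."""
    return num_integers * integer_size < ZAP_MAX_BYTES


def corrected_check(num_integers, integer_size):
    """Assumed correction: allow values that exactly reach the bound."""
    return num_integers * integer_size <= ZAP_MAX_BYTES


print(old_check(0x2000, 1))        # False
print(corrected_check(0x2000, 1))  # True
```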
|
|
|
|
|
|
|
|
|
|
|
|
| |
The point of this refactoring is to break the high-level conceptual
steps of spa_sync() to their own helper functions. In general large
functions can enhance readability if structured well, but in this
case the number of conceptual steps taken could use the help of
helper functions.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By design ztest will never inject non-repairable damage into the
pool. Update the ztest_scrub() test case such that it waits for
the scrub to complete and verifies the pool is always repairable.
After enabling scrub verification two scenarios were encountered
which are the result of how ztest manages failure injection.
The first case is straightforward and pertains to detaching a
mirror vdev. In this case, the pool must always be scrubbed prior
to the detach. Failure to do so can potentially lock in previously
repairable data corruption by removing all good copies of a block
leaving only damaged ones.
The second is a little more subtle. The child/offset selection
logic in ztest_fault_inject() depends on the calculated number of
leaves always remaining constant between injection passes. This
is true within a single execution of ztest, but when using zloop.sh
random values are selected for each restart. Therefore, when ztest
imports an existing pool it must be scrubbed before failure injection
can be safely enabled. Otherwise it is possible that it will inject
non-repairable damage.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8269
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the functions dbuf_prefetch_indirect_done() and
dmu_assign_arcbuf_by_dnode() assume that dbuf_hold_level() cannot
fail. In the event of an error the former will cause a NULL pointer
dereference and the latter will trigger a VERIFY. This patch adds
error handling to these functions and their callers where necessary.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8291
|
|
|
|
|
|
|
|
|
| |
The following fields from the vdev_t struct are not used anywhere.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8285
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ztest_ddt_repair() test is designed to inflict damage to the
ddt which is repairable by a scrub. Unfortunately, this
repair logic was broken at some point and it went undetected.
This issue is not specific to ztest, but thankfully this extra
redundancy is rarely enabled and even more rarely needed.
The root cause was identified to be the ddt_bp_create()
function called by dsl_scan_ddt_entry() which did not set the
dedup bit of the generated block pointer.
The consequence of this was that the ZIO_DDT_READ_PIPELINE was
never enabled for the block pointer during the scrub, and the
dedup ditto repair logic was never run. Note that for demand
reads which don't rely on ddt_bp_create() the required pipeline
stages would be enabled and the repair performed.
This was resolved by unconditionally setting the dedup bit in
ddt_bp_create(). This way all code paths which may need to
perform a repair from a block pointer generated from the ddt
entry will be able to. The only exception is that the dedup
bit is cleared in ddt_phys_free() which is required to avoid
leaking space.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the new spacemap encoding was ported to ZoL that's no longer
a limitation. This patch updates vdev_is_spacemap_addressable()
that was performing that check.
It also updates the appropriate test to ensure that the same
functionality is tested. The test does so by creating pools that
don't have the new spacemap encoding enabled - just the checkpoint
feature. This patch also reorganizes that same test in order to
cut in half its memory consumption.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8286
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increase the default allowed number of reconstruction attempts.
There's not an exact right number for this setting. It needs
to be set large enough to cover any realistic failure scenarios
and small enough to avoid stalling the IO pipeline and invoking
the dead man detection.
The current value of 256 was empirically determined to be too
low based on multi-day runs of ztest. The fault injection code
would inject more damage than could be reconstructed given the
relatively small number of attempts. However, in all observed
cases the block could be reconstructed using a slightly higher
limit.
Based on local testing increasing the default value to 4096 was
determined to strike the best balance. Checking all combinations
takes less than 10s in the worst case, and has so far eliminated
the vast majority of false positives detected by ztest. This
delay is roughly on par with how long retries may be performed
to a misbehaving HDD and was deemed to be reasonable. Better to
err on the side of a brief delay rather than fail to reconstruct
the data.
Lastly, the -Y flag has been added to zdb to make it easy to try all
possible combinations when performing split block reconstruction.
For badly damaged blocks with 18 splits, they can be fully enumerated
within a few minutes. This has been done to ensure permanent errors
are never incorrectly reported when ztest verifies the pool with zdb.
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8271
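To see why a limit matters, the size of the search space can be estimated with a short Python sketch. It assumes each split has some number of candidate copies and that the number of combinations is their product; the function names and the choice of two copies per split are illustrative assumptions.

```python
import math

DEFAULT_MAX_ATTEMPTS = 4096  # new default quoted in the message


def total_combinations(copies_per_split):
    """Number of ways to pick one candidate copy for every split."""
    return math.prod(copies_per_split)


def can_enumerate_all(copies_per_split, max_attempts=DEFAULT_MAX_ATTEMPTS):
    """True when every combination fits within the attempt limit."""
    return total_combinations(copies_per_split) <= max_attempts


print(total_combinations([2] * 18))  # 262144
print(can_enumerate_all([2] * 18))   # False
print(can_enumerate_all([2] * 12))   # True
```

Under these assumptions a block with 18 splits exceeds the default limit by a wide margin, which is the case where trying all combinations, as zdb -Y does, takes noticeably longer.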
|
|
|
|
|
|
|
|
|
|
| |
This patch exports and re-imports the pool when these tests are
analyzed with zdb to get consistent results.
Reviewed by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8292
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation of 'zfs remap' has proven to be problematic since
it modifies the objset (but not its logical contents) by dirtying
metadata without owning it. The consequence of which is that
dmu_objset_remap_indirects() is vulnerable to certain races.
For example, if we are in the middle of receiving into the filesystem
while it is being remapped, it is possible we could evict the
objset when the receive completes (see dsl_dataset_clone_swap_sync_impl,
or dmu_recv_end_sync), but dmu_objset_remap_indirects() may be still
using the objset. The result of which would be a panic.
Extended runs of ztest(8) have exposed other possible races which
can occur when using 'zfs remap'. Several of these have been fixed
but there may be others which have not yet been encountered and
diagnosed.
Furthermore, the ability to manually remap a filesystem is no longer
particularly useful now that the removal code can map large chunks.
In addition, explaining what this command does and why
it may be useful requires a detailed understanding of the internals
of device removal. These are details users should not be bothered
with.
Therefore, the 'zfs remap' command is being disabled but not entirely
removed. It may be removed in the future or potentially reworked
to address the issues described above. Since 'zfs remap' has never
been part of a tagged release its removal is expected to have
minimal impact.
The ZTS tests have been updated to continue to exercise the command
to prevent atrophy, but it has been removed entirely from ztest(8).
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8238
|