| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do
have differing semantics in some ways - in particular,
one requires KEEP_SIZE, and the other does not.
Also added a zero-range test to catch this, corrected a flaw
that made the punch-hole test succeed vacuously, and a typo
in file_write.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #13329
Closes #13338
|
|
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Upstream-commit: 344bbc82e7054f61d5e7b3610b119820285fd2cb
Closes #12829
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When unlinking multiple files from a pool at 100% capacity, it was
possible for ENOSPC to be returned after the first unlink. e.g.
rm -f /mnt/fs/test1.0.0 /mnt/fs/test1.1.0 /mnt/fs/test1.2.0
rm: cannot remove '/mnt/fs/test1.1.0': No space left on device
rm: cannot remove '/mnt/fs/test1.2.0': No space left on device
After waiting for the pending deferred frees from the first unlink to
be processed the remaining files can then be unlinked. This is caused
by the quota limit in dsl_dir_tempreserve_impl() being temporarily
decreased to the allocatable pool capacity less any deferred free
space.
This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13172
|
|
|
|
|
|
|
|
|
|
|
| |
In the CI environment it's possible for events to be slightly
delayed resulting in 4, instead of 5, events appearing in the
log file. This isn't a problem and should be considered a
success to avoid false positive test results.
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12625
|
|
|
|
|
|
|
|
|
|
|
| |
Related to commit 90b77a036. Retry the `zpool export` if the pool
is "busy" indicating there is a process accessing the mount point.
This can happen after an import, allowing it to be retried will
avoid spurious test failures.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13169
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As explained by the disclaimer in the test case,
"This test can fail since nothing guarantees that old
MOS blocks aren't overwritten."
This behavior is expected and correct, but results in a
flaky test case which is problematic for the CI. The best
we can do to resolve this is to retry the sub-test which
failed when the MOS blocks have clearly been overwritten.
When testing failures were rare enough that a single retry
should normally be sufficient. However, we allow up to
five for good measure.
Reviewed by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13119
|
|
|
|
|
|
|
|
|
|
|
|
| |
As previously noted in #12272 the receive-o-x_props_override.ksh test
reliably fails on FreeBSD. Since we don't expect this test to pass
move the exception from the "maybe" to "known" section. This way we
don't retry the FAILED test when it is not expected to pass.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13167
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On FreeBSD pools are not allowed to be created using vdevs which are
backed by ZFS volumes. This configuration is not recommended for any
supported platform, nevertheless the largest_pool_001_pos.ksh test
case makes use of it as a convenience. This causes the test case to
fail reliably on FreeBSD. The layout is still tolerated on Linux
so only perform this test on Linux.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed by: George Melikov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results to failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one does not fail.
This happens because zio_compress_write() falsely checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT which is also set in
encrypted raw send streams. In this case it rounds up the psize and if
not equal to the zio->io_size it modifies the block by zeroing out
the extra bytes. Because this happens in a SA attr. registration object
(type=46), the decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.
Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #13067
Closes #13074
|
|
|
|
|
|
|
|
|
|
| |
Related to commit 90b77a036. Retry the `zpool export` if the pool is
"busy" indicating there is a process accessing the mount point. This
can happen after an import and allowing it to be retried will avoid
spurious test failures.
Reviewed by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13092
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dRAID section of the zpool_expand_001_pos test would reliably fail
because the calculated expansion size assumed the dRAID top-level vdev
was created with a distributed spare. Create the vdev as expected to
resolve the test failure.
This test case flaw was accidentally caused by changing the default
number of dRAID distributed spares from one to zero while dRAID was
being developed.
Additionally, remove zpool_expand_005_pos from the list of possible
faulty tests. It appears to be passing consistently in my testing.
Reviewed by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13091
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changing volmode may need to remove minors, which could be open, so
call udev_wait() before we "zfs set volmode=<value>". This ensures
no udev process has the zvol open (i.e. blkid) and the kernel
zvol_remove_minor_impl() function won't skip removing the in use
device.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13075
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.
Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Co-authored-by: George Amanakis <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #13033
Closes #13076
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach,
so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.
Please note that this differs from previous behavior in that it
won't handle exceptions created by XSAVE an XRSTOR. This is sensible
for three reasons.
- Exceptions during XSAVE and XRSTOR can only occur if the feature
is not supported or enabled or the memory operand isn't aligned
on a 64 byte boundary. If this happens something else went
terribly wrong, and it may be better to stop execution.
- Previously we just printed a warning and didn't handle the fault,
this is arguable for the above reason.
- All other *SAVE instruction also don't handle exceptions, so this
at least aligns behavior.
Finally add a test to catch such a regression in the future.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #13042
Closes #13059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Basenames that remain, in cmd/zed/zed.d/statechange-led.sh:
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev=$(basename "$ZEVENT_VDEV_PATH")
I don't wanna interfere with #11988
scripts/zfs-tests.sh:
SINGLETESTFILE=$(basename "$SINGLETEST")
tests/zfs-tests/tests/functional/cli_user/zfs_list/zfs_list.kshlib:
ACTUAL=$(basename $dataset)
ACTUAL=$(basename $dataset)
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
zpool_iostat_-c_homedir.ksh:
typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
zpool_iostat_-c_searchpath.ksh:
typeset CMD_1=$(basename "$SCRIPT_1")
typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
zpool_status_-c_homedir.ksh:
typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
zpool_status_-c_searchpath.ksh
typeset CMD_1=$(basename "$SCRIPT_1")
typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/migration/migration.cfg:
export BNAME=`basename $TESTFILE`
tests/zfs-tests/tests/perf/perf.shlib:
typeset logbase="$(get_perf_output_dir)/$(basename \
tests/zfs-tests/tests/perf/perf.shlib:
typeset logbase="$(get_perf_output_dir)/$(basename \
These are potentially Of Directories, where basename is actually
useful
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12652
|
|
|
|
|
|
|
|
|
|
|
| |
The on-disk cost of creating a snapshot or bookmark is sufficiently low
that it is difficult to make it reliably fail even when the pool is
"full". In order to avoid false positives remove these two checks from
the test case.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13060
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
POSIX requires that set-uid and set-gid bits to be removed when an
unprivileged user writes to a file and ZFS does that during normal
operation.
The problem arrises when the write is stored in the ZIL and replayed.
During replay we have no access to original credentials of the process
doing the write, so zfs_write() will be performed with the root
credentials. When root is doing the write set-uid and set-gid bits
are not removed from the file.
To correct that, log a separate TX_SETATTR entry that removed those bits
on first write to such file.
Idea from: Christian Schwarz
Add test for ZIL replay of setuid/setgid clearing.
Improve various edge cases when clearing setid bits:
- The setid bits can be readded during a single write, so make sure to check
for them on every chunk write.
- Log TX_SETATTR record at most once per transaction group (if the setid bits
are keep coming back).
- Move zfs_log_setattr() outside of zp->z_acl_lock.
Reviewed-by: Dan McDonald <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Christian Schwarz <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes #13027
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds enumerated names to disambiguate between the
different vdevs. Previously only 'zpool status' showed enumerated
vdev names, now 'zpool list -v' and 'zpool iostat -v' also shows
the enumerated vdev names.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Dipak Ghosh <[email protected]>
Signed-off-by: Akash B <[email protected]>
Closes #12510
Closes #13031
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In files created/modified before 4254acb there may be a corruption of
xattrs which is not reported during scrub and normal send/receive. It
manifests only as an error when raw sending/receiving. This happens
because currently only the raw receive path checks for discrepancies
between the dnode bonus length and the spill pointer flag.
In case we encounter a dnode whose bonus length is greater than the
predicted one, we should report an error. Modify in this regard
dnode_sync() with an assertion at the end, dump_dnode() to error out,
dsl_scan_recurse() to report errors during a scrub, and zstream to
report a warning when dumping. Also added a test to verify spill blocks
are sent correctly in a raw send.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #12720
Closes #13014
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow up commit for e03a41a60 which aimed to resolve
this same test failure. The core "problem" here is that it takes
very little space to perform a clone/snapshot/bookmark, which
means if we want these commands to reliably fail the pool must
truely have exhausted all free space.
This commit increases the number of fill iterations to try and
consume every block which we can. This still can't guarantee
the clone/snapshot/bookmark will fail, but it significantly
improves the odds. The exception was kept since it's still
not a sure thing.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12903
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under Linux when rolling back a mounted filesystem negative dentries
may not be dropped from the cache. This can result in an ENOENT
being incorrectly returned on first access. Issuing a `df` before
the unmount results in the negative dentries being invalidated and
side steps the issue.
This is solely a workaround for the test case on Linux and not
correct behavior. The core issue of invalidating negative dentries
needs to be handled with a kernel side change. This is being
tracked as issue #6143.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12898
Issue #6143
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following test cases may still occasionally fail and are being
added to the "maybe" list for Linux until they can be updated to be
entirely reliable.
cli_root/zfs_rename/zfs_rename_002_pos.ksh
cli_root/zpool_reopen/zpool_reopen_003_pos.ksh
refreserv/refreserv_raidz
These 6 tests consistently fail only on Fedora 31+, the failures
are related to the kernel rescanning the partition table on loopback
devices which is no longer reliable unless partprobe is used. In
order to enable the Fedora bot by default they are also being added
to the list until the tests can be updated. Any significant regression
in functionality covered by these tests will still be detected by the
FreeBSD builders.
alloc_class/alloc_class_009_pos
alloc_class/alloc_class_010_pos
cli_root/zpool_expand/zpool_expand_001_pos
cli_root/zpool_expand/zpool_expand_005_pos
rsend/rsend_007_pos
rsend/rsend_010_pos
rsend/rsend_011_pos
snapshot/rollback_003_pos
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10489
|
|
|
|
|
|
|
|
|
|
|
|
| |
It keeps failing, on changes which aren't related at all.
So until someone runs down why, I'd like it to stop being the
sole reason for CI failures.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12733
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the following test failures to the exception list for FreeBSD
to ensure we notice new unexpected failures.
pool_checkpoint/checkpoint_big_rewind
pool_checkpoint/checkpoint_indirect
And the following for Linux.
zvol/zvol_misc/zvol_misc_snapdev
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #12621
Issue #12622
Issue #12623
Closes #12624
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zvol_misc tests, in particular zvol_misc_volmode, make use of a
common udev_wait function to wait for zvol devices in /dev to quiesce
on Linux. On other platforms this function currently only sleeps for
one second before returning. This is insufficient, and
zvol_misc_volmode has been flaky on FreeBSD as a result.
Replace udev_wait with block_device_wait, passing through the optional
device parameter where possible. Rearrange a few checks to strengthen
the verifications we are making and avoid unnecessarily sleeping. We
must keep udev_wait in a couple places to pass in Github CI workflows.
Remove zvol_misc_volmode from the maybe failing tests on FreeBSD in
zts-report.py.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12583
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ka Ho Ng <[email protected]>
Sponsored-by: The FreeBSD Foundation
Closes #12458
|
|
|
|
|
|
|
|
|
|
| |
The rerefreserv_raidz test was failing on Linux because the sync being
issued doesn't guarantee a pool sync. Switch to using the sync_pool
function and remove the ZTS exception for Linux.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12897
|
|
|
|
|
|
|
|
| |
The build on musl needs linux/fs.h for SEEK_DATA and friends,
and sys/sysmacros.h for P2ROUNDUP. Add the needed headers.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Georgy Yakovlev <[email protected]>
Closes #12891
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With some minor tweaks several of rsend tests can be sped up
considerably without significantly reducing test coverage.
* send-c_verify_ratio: ~120s -> ~60s
* send_realloc_*_files: ~330s -> ~65s
For the send_realloc* tests this also has the advantage of removing
(most of) the linux/freebsd conditional logic. Note that for this
test more passes, and thus more incremental send/recvs, are preferable
to a larger number of files.
Total run time of the rsend test group was reduced from roughly 20 to
11 minutes in an environment similar to what's used by the CI.
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12876
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rsend_007_pos test reliably fails on Linux in the cleanup
function. This is caused by an unmount error when attempting to
recursively destroy the newly received datasets. Invoking `df`
prior to the `zfs destroy` interestingly avoids the unmont error.
Why this should matter is unclear and should be investigated.
However, this minor tweak may allow us to remove the ZTS rsend
exceptions. The subsequent rsend_010_pos and rsend_011_pos
failures were a result of this initial failure. The other
"maybe" failures I was unable to reproduce and have not been
recently observed in the master branch.
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5665
Closes #6086
Closes #6087
Closes #6446
Closes #12876
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for http and https to the keylocation properly to
allow encryption keys to be fetched from the specified URL.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Issue #9543
Closes #9947
Closes #11956
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Issue: #11956
Closes #11976
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The alloc_class_* tests may fail on Linux with an EBUSY error if
`zfs destroy` is run before the `dd` process has had a chance to
terminate. Wait on the pid after the `kill -9` to make sure.
When testing I didn't observe any failures for the alloc_class
tests. Remove them from the exceptions list, the CI was used to
verify the tests pass on all platforms.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12873
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, #11445 means while we fail gracefully now, we still
fail, unless people want to implement a complex workaround just to
support /dev/null.
So let's just use the cheap workaround in a test for now.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12872
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zpool_reopen_[1-5] tests are failing Fedora 35 with:
zpool_reopen_001_pos.ksh[64]: log_must[67]: log_pos[270]:
wait_for_resilver_end[98]: wait_for_action: line 71: func: is read only
Renaming 'func' -> 'funct' fixes the issue.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #12871
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.
One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.
Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #12981
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Verify that all empty sectors are zero filled before using them to
calculate parity. Failure to do so can result in incorrect parity
columns being generated and written to disk if the contents of an
empty sector are non-zero. This was possible because the checksum
only protects the data portions of the buffer, not the empty sector
padding.
This issue has been addressed by updating raidz_parity_verify() to
check that all dRAID empty sectors are zero filled. Any sectors
which are non-zero will be fixed, repair IO issued, and a checksum
error logged. They can then be safely used to verify the parity.
This specific type of damage is unlikely to occur since it requires
a disk to have silently returned bad data, for an empty sector, while
performing a scrub. However, if a pool were to have been damaged
in this way, scrubbing the pool with this change applied will repair
both the empty sector and parity columns as long as the data checksum
is valid. Checksum errors will be reported in the `zpool status`
output for any repairs which are made.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12857
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten. The MOS data is
not protected by the snapshot so occasional failures were always
expected. However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe". Convert the test to a "known" failure until the root cause
is identified and resolved.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12821
|
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12187
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.
For Linux 5.12 an older kernel this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reaquire it, then call the open callback
again. However, as of the 5.13 kernel this behavior was removed.
Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open(). This should always succeed except in the case where
a pool's vdev are layed on zvols, in which case it may fail. To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #12301
Closes #12759
|
|
|
|
|
|
|
|
|
|
|
| |
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: John Kennedy <[email protected]>
Closes #12814
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #12740
|
|
|
|
|
|
|
|
|
|
|
| |
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.
Reviewed-by: Rich Ercolani <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #12301
Closes #12738
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored. This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.
Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing. This ensures that a clean dnode can't be dirtied before
the data/hole is located. The range lock is now also taken to
ensure the call cannot race with zfs_write().
Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #11900
Closes #12724
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When cleaning up a test case standardize on using the convention:
datasetexists $ds && destroy_dataset $ds <flags>
By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.
Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes. This was done to
clearly establish the expected convention.
Reviewed-by: Rich Ercolani <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12663
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12553
|
|
|
|
|
|
|
|
|
|
| |
Issue #11854 has been resolved, so we can remove the exceptions for it.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12527
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink. This didn't cause any failures but when a device path
was specified the function would wait longer than needed.
Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability. The
udev behavior on Fedora in particular can result in frequent false
positives.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #12515
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12432
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.
Use $TEST_BASE_DIR instead.
Clean up the stream file after the test.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12455
|