| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.
We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
|
|
|
|
|
|
|
|
|
|
|
| |
Defining a special constant to make an infinite loop is excessive,
especially when the name clashes with symbols commonly defined on
some platforms (ie FreeBSD).
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9219
|
|
|
|
|
|
|
|
|
|
|
|
| |
The double-colon looked like a typo, but it's actually an obscure
feature. Rules with :: may appear multiple times and are run
independently of one another in the order they appear. The use of ::
for distclean-local was conventional, not accidental.
Add comments to indicate the intentional use of double-colon rules.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9210
|
|
|
|
|
|
|
|
|
|
| |
Other than this test, zpool list -p is not well tested by any of the
automated tests. Add a test for zpool list -p.
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #9134
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit a887d653 updated the dbufstats such that escalated privileges
are required. Since all tests under cli_user are run with normal
privileges move this test case to a location where it will be run
required privileges.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Michael Niewöhner <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9118
Closes #9196
|
|
|
|
|
|
|
|
|
|
|
| |
The slog tests fail when attempting to create pools using file vdevs
that already exist from previous test runs. Remove these files in the
setup for the test.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9194
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mdb_set_uint32 function requires that the values passed in be
decimal. This was overlooked initially because the matching Linux
function accepts both decimal and hexadecimal values.
Reviewed-by: John Kennedy <[email protected]>
Reviewed by: Sara Hartse <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Igor Kozhukhov <[email protected]>
Closes #9125
Closes #9195
|
|
|
|
|
|
|
|
|
|
| |
Reuse enum value ZFS_IOC_BASE for `('Z' << 8)`.
This is helpful on FreeBSD where ZFS_IOC_BASE has a different value and
`('Z' << 8)` is wrong.
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9188
|
|
|
|
|
|
|
|
|
| |
When checking ZFS_IOC_* numbers, print which numbers are wrong rather
than silently failing.
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9187
|
|
|
|
|
|
|
|
|
|
| |
The ancient version of blkid (v2.17.2) used in CentOS 6 will not
detect the newly created pool unless it has been written to.
Force a pool sync so `zpool import` will detect the newly created
pool.
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9199
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It used to be possible for zfs receive (and other operations related
to clone swap) to bypass refquotas. This can cause a number of issues,
and there should be an automated test for it.
Added tests for rollback and receive not overriding refquota.
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #9139
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.
We fix this by only calling zil_remove_async when the file is fully
unlinked.
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #8769
Closes #9061
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consumers of ZFS Channel Programs can now list bookmarks,
and get holds from datasets. A minor-refactoring was also
applied to distinguish between user and system properties
in ZCP.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Ported-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Dan Kimmel <[email protected]>
OpenZFS-issue: https://illumos.org/issues/8862
Closes #7902
|
|
|
|
|
|
|
|
|
| |
This patch adds a new test that sanity checks cancelling a removal.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #9101
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 693c1fc478cc8118dd0168c4815c0ae3be41c9c3. This
change resulted in a kmem leak being observed in existing code which
needs to be identified and addressed.
Reviewed-by: Paul Zuchowski <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #8978
Closes #9090
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide zfstest coverage for these two issues which
were a panic accessing extended attributes and
a problem comparing 64 bit and 32 bit generation
numbers.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Issue #5866
Issue #8858
Closes #8978
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't unconditionally return 0 (i.e. retain SUID/SGID).
Test CAP_FSETID capability.
https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t
which expects SUID/SGID to be dropped on write(2) by non-owner fails
without this. Most filesystems make this decision within VFS by using
a generic file write for fops.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #9035
Closes #9043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deleting a clone requires finding blocks are clone-only, not shared
with the snapshot. This was done by traversing the entire block tree
which results in a large performance penalty for sparsely
written clones.
This is new method keeps track of clone blocks when they are
modified in a "Livelist" so that, when it’s time to delete,
the clone-specific blocks are already at hand.
We see performance improvements because now deletion work is
proportional to the number of clone-modified blocks, not the size
of the original dataset.
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Sara Hartse <[email protected]>
Closes #8416
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tests in tests/functional/cli_root/zpool_status should all require
root. However, linux.run has "user =" specified for those tests, which
means they run as a normal user. When I removed that line to run them
as root, the following tests did not pass:
zpool_status_003_pos
zpool_status_-c_disable
zpool_status_-c_homedir
zpool_status_-c_searchpath
These tests need to be run as a normal user. To fix this, move these
tests to a new tests/functional/cli_user/zpool_status directory.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #9057
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds the ability to sanity check zfs create arguments and to see the
value of any additional properties that will local to the dataset. For
example, automation that may need to adjust quota on a parent filesystem
before creating a volume may call `zfs create -nP -V <size> <volume>` to
obtain the value of refreservation. This adds the following options to
zfs create:
- -n dry-run (no-op)
- -v verbose
- -P parseable (implies verbose)
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Jerry Jelinek <[email protected]>
Signed-off-by: Mike Gerdts <[email protected]>
Closes #8974
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
= Motivation
At Delphix we've seen a lot of customer systems where fragmentation
is over 75% and random writes take a performance hit because a lot
of time is spend on I/Os that update on-disk space accounting metadata.
Specifically, we seen cases where 20% to 40% of sync time is spend
after sync pass 1 and ~30% of the I/Os on the system is spent updating
spacemaps.
The problem is that these pools have existed long enough that we've
touched almost every metaslab at least once, and random writes
scatter frees across all metaslabs every TXG, thus appending to
their spacemaps and resulting in many I/Os. To give an example,
assuming that every VDEV has 200 metaslabs and our writes fit within
a single spacemap block (generally 4K) we have 200 I/Os. Then if we
assume 2 levels of indirection, we need 400 additional I/Os and
since we are talking about metadata for which we keep 2 extra copies
for redundancy we need to triple that number, leading to a total of
1800 I/Os per VDEV every TXG.
We could try and decrease the number of metaslabs so we have less
I/Os per TXG but then each metaslab would cover a wider range on
disk and thus would take more time to be loaded in memory from disk.
In addition, after it's loaded, it's range tree would consume more
memory.
Another idea would be to just increase the spacemap block size
which would allow us to fit more entries within an I/O block
resulting in fewer I/Os per metaslab and a speedup in loading time.
The problem is still that we don't deal with the number of I/Os
going up as the number of metaslabs is increasing and the fact
is that we generally write a lot to a few metaslabs and a little
to the rest of them. Thus, just increasing the block size would
actually waste bandwidth because we won't be utilizing our bigger
block size.
= About this patch
This patch introduces the Log Spacemap project which provides the
solution to the above problem while taking into account all the
aforementioned tradeoffs. The details on how it achieves that can
be found in the references sections below and in the code (see
Big Theory Statement in spa_log_spacemap.c).
Even though the change is fairly constraint within the metaslab
and lower-level SPA codepaths, there is a side-change that is
user-facing. The change is that VDEV IDs from VDEV holes will no
longer be reused. To give some background and reasoning for this,
when a log device is removed and its VDEV structure was replaced
with a hole (or was compacted; if at the end of the vdev array),
its vdev_id could be reused by devices added after that. Now
with the pool-wide space maps recording the vdev ID, this behavior
can cause problems (e.g. is this entry referring to a segment in
the new vdev or the removed log?). Thus, to simplify things the
ID reuse behavior is gone and now vdev IDs for top-level vdevs
are truly unique within a pool.
= Testing
The illumos implementation of this feature has been used internally
for a year and has been in production for ~6 months. For this patch
specifically there don't seem to be any regressions introduced to
ZTS and I have been running zloop for a week without any related
problems.
= Performance Analysis (Linux Specific)
All performance results and analysis for illumos can be found in
the links of the references. Redoing the same experiments in Linux
gave similar results. Below are the specifics of the Linux run.
After the pool reached stable state the percentage of the time
spent in pass 1 per TXG was 64% on average for the stock bits
while the log spacemap bits stayed at 95% during the experiment
(graph: sdimitro.github.io/img/linux-lsm/PercOfSyncInPassOne.png).
Sync times per TXG were 37.6 seconds on average for the stock
bits and 22.7 seconds for the log spacemap bits (related graph:
sdimitro.github.io/img/linux-lsm/SyncTimePerTXG.png). As a result
the log spacemap bits were able to push more TXGs, which is also
the reason why all graphs quantified per TXG have more entries for
the log spacemap bits.
Another interesting aspect in terms of txg syncs is that the stock
bits had 22% of their TXGs reach sync pass 7, 55% reach sync pass 8,
and 20% reach 9. The log space map bits reached sync pass 4 in 79%
of their TXGs, sync pass 7 in 19%, and sync pass 8 at 1%. This
emphasizes the fact that not only we spend less time on metadata
but we also iterate less times to convergence in spa_sync() dirtying
objects.
[related graphs:
stock- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGStock.png
lsm- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGLSM.png]
Finally, the improvement in IOPs that the userland gains from the
change is approximately 40%. There is a consistent win in IOPS as
you can see from the graphs below but the absolute amount of
improvement that the log spacemap gives varies within each minute
interval.
sdimitro.github.io/img/linux-lsm/StockVsLog3Days.png
sdimitro.github.io/img/linux-lsm/StockVsLog10Hours.png
= Porting to Other Platforms
For people that want to port this commit to other platforms below
is a list of ZoL commits that this patch depends on:
Make zdb results for checkpoint tests consistent
db587941c5ff6dea01932bb78f70db63cf7f38ba
Update vdev_is_spacemap_addressable() for new spacemap encoding
419ba5914552c6185afbe1dd17b3ed4b0d526547
Simplify spa_sync by breaking it up to smaller functions
8dc2197b7b1e4d7ebc1420ea30e51c6541f1d834
Factor metaslab_load_wait() in metaslab_load()
b194fab0fb6caad18711abccaff3c69ad8b3f6d3
Rename range_tree_verify to range_tree_verify_not_present
df72b8bebe0ebac0b20e0750984bad182cb6564a
Change target size of metaslabs from 256GB to 16GB
c853f382db731e15a87512f4ef1101d14d778a55
zdb -L should skip leak detection altogether
21e7cf5da89f55ce98ec1115726b150e19eefe89
vs_alloc can underflow in L2ARC vdevs
7558997d2f808368867ca7e5234e5793446e8f3f
Simplify log vdev removal code
6c926f426a26ffb6d7d8e563e33fc176164175cb
Get rid of space_map_update() for ms_synced_length
425d3237ee88abc53d8522a7139c926d278b4b7f
Introduce auxiliary metaslab histograms
928e8ad47d3478a3d5d01f0dd6ae74a9371af65e
Error path in metaslab_load_impl() forgets to drop ms_sync_lock
8eef997679ba54547f7d361553d21b3291f41ae7
= References
Background, Motivation, and Internals of the Feature
- OpenZFS 2017 Presentation:
youtu.be/jj2IxRkl5bQ
- Slides:
slideshare.net/SerapheimNikolaosDim/zfs-log-spacemaps-project
Flushing Algorithm Internals & Performance Results
(Illumos Specific)
- Blogpost:
sdimitro.github.io/post/zfs-lsm-flushing/
- OpenZFS 2018 Presentation:
youtu.be/x6D2dHRjkxw
- Slides:
slideshare.net/SerapheimNikolaosDim/zfs-log-spacemap-flushing-algorithm
Upstream Delphix Issues:
DLPX-51539, DLPX-59659, DLPX-57783, DLPX-61438, DLPX-41227, DLPX-59320
DLPX-63385
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8442
|
|
|
|
|
|
|
|
|
|
| |
log_neg_expect was using the wrong exit status to detect if a process
got killed by SIGSEGV or SIGBUS, resulting in false positives.
Reviewed-by: loli10K <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #9003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Strategy of parallel mount is as follows.
1) Initial thread dispatching is to select sets of mount points that
don't have dependencies on other sets, hence threads can/should run
lock-less and shouldn't race with other threads for other sets. Each
thread dispatched corresponds to top level directory which may or may
not have datasets to be mounted on sub directories.
2) Subsequent recursive thread dispatching for each thread from 1)
is to mount datasets for each set of mount points. The mount points
within each set have dependencies (i.e. child directories), so child
directories are processed only after parent directory completes.
The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL for ZoL having different
timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a` which expects proper mount order can't unmount if the
mounts were reordered by the race condition.
There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.
1) #8833 case where input is `/a /a /a/b` after sorting.
The problem is that libzfs_path_contains() can't correctly handle an
input list with two same top level directories.
There is a race between two POSIX threads A and B,
* ThreadA for "/a" for test1 and "/a/b"
* ThreadB for "/a" for test0/a
and in case of #8833, ThreadA won the race. Two threads were created
because "/a" wasn't considered as `"/a" contains "/a"`.
2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
The problem is that libzfs_path_contains() can't correctly handle an
input list containing "/".
There is a race between two POSIX threads A and B,
* ThreadA for "/" and "/var/data/test"
* ThreadB for "/var/data"
and in case of #8450, ThreadA won the race. Two threads were created
because "/var/data" wasn't considered as `"/" contains "/var/data"`.
In other words, if there is (at least one) "/" in the input list,
the initial thread dispatching must be single-threaded since every
directory is a child of "/", meaning they all directly or indirectly
depend on "/".
In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider above two.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8450
Closes #8833
Closes #8878
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a volume is created in a pool with raidz vdevs and
volblocksize != 128k, the volume can reference more space than is
reserved with the automatically calculated refreservation. There
are two deficiencies in vol_volsize_to_reservation that contribute
to this:
1) Skip blocks may be added to keep each allocation a multiple
of parity + 1. This is the dominating factor when volblocksize
is close to 2^ashift.
2) raidz deflation for 128 KB blocks is different for most other
block sizes.
See "The theory of raidz space accounting" comment in
libzfs_dataset.c for a full explanation.
Authored by: Mike Gerdts <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Reviewed by: Sanjay Nadkarni <[email protected]>
Reviewed by: Jerry Jelinek <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Kody Kantor <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Approved by: Dan McDonald <[email protected]>
Ported-by: Mike Gerdts <[email protected]>
Porting Notes:
* ZTS: wait for zvols to exist before writing
* ZTS: use log_must_busy with {zpool|zfs} destroy
OpenZFS-issue: https://www.illumos.org/issues/9318
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0
Closes #8973
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After device removal, performing nopwrites on a dmu_sync-ed block
will result in a panic. This panic can show up in two ways:
1. an attempt to issue an IOCTL in vdev_indirect_io_start()
2. a failed comparison of zio->io_bp and zio->io_bp_orig in
zio_done()
To resolve both of these panics, nopwrites of blocks on indirect
vdevs should be ignored and new allocations should be performed on
concrete vdevs.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes #8957
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the ability for the user to unload keys for
datasets as they are being unmounted. This is analogous to
'zfs mount -l'.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes: #8917
Closes: #8952
|
|
|
|
|
|
|
|
| |
The -Y option was added for ztest to test split block reconstruction.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: Igor Kozhukhov <[email protected]>
Closes #8926
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "zfs remap" command was disabled by
6e91a72fe3ff8bb282490773bd687632f3e8c79d, because it has little utility
and introduced some tricky bugs. This commit removes the code for it,
the associated ZFS_IOC_REMAP ioctl, and tests.
Note that the ioctl and property will remain, but have no functionality.
This allows older software to fail gracefully if it attempts to use
these, and avoids a backwards incompatibility that would be introduced if
we renumbered the later ioctls/props.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8944
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Resolve the incorrect use of srcdir and builddir references for
various files in the build system. These have crept in over time
and went unnoticed because when building in the top level directory
srcdir and builddir are identical.
With this change it's again possible to build in a subdirectory.
$ mkdir obj
$ cd obj
$ ../configure
$ make
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8921
Closes #8943
|
|
|
|
|
|
|
|
|
| |
30af21b025 needed to ignore get_diff executable.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8950
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem Statement
=================
ZFS Channel program scripts currently require a timeout, so that hung or
long-running scripts return a timeout error instead of causing ZFS to get
wedged. This limit can currently be set up to 100 million Lua instructions.
Even with a limit in place, it would be desirable to have a sys admin
(support engineer) be able to cancel a script that is taking a long time.
Proposed Solution
=================
Make it possible to abort a channel program by sending an interrupt signal.In
the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to
catch the signal. Once a signal is encountered, the dsl_sync_task function can
install a Lua hook that will get called before the Lua interpreter executes a
new line of code. The dsl_sync_task can resume with a standard txg_wait_sync
call and wait for the txg to complete. Meanwhile, the hook will abort the
script and indicate that the channel program was canceled. The kernel returns
a EINTR to indicate that the channel program run was canceled.
Porting notes: Added missing return value from cv_wait_sig()
Authored by: Don Brady <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Sara Hartse <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: Don Brady <[email protected]>
Signed-off-by: Don Brady <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/9425
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/d0cb1fb926
Closes #8904
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Count the bytes of payload for each replication record type
Count the bytes of overhead (replication records themselves)
Include these counters in the output summary at the end of the run.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Sponsored-By: Klara Systems and Catalogic
Closes #8432
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
30af21b025 broke build on Fedora. gcc can detect potential overflow
on compile-time. Consider strlen of already copied string.
Also change strn to strl variants per suggestion from @behlendorf
and @ofaaland.
--
libzfs_input_check.c: In function 'test_redact':
libzfs_input_check.c:711:2: error: 'strncat' specified bound 288 equals
destination size [-Werror=stringop-overflow=]
strncat(bookmark, "#testbookmark", sizeof (bookmark));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8939
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When encryption was first added to ZFS, we made a decision to
prevent users from creating unencrypted children of encrypted
datasets. The idea was to prevent users from inadvertently
leaving some of their data unencrypted. However, since the
release of 0.8.0, some legitimate reasons have been brought up
for this behavior to be allowed. This patch simply removes this
limitation from all code paths that had checks for it and updates
the tests accordingly.
Reviewed-by: Jason King <[email protected]>
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8737
Closes #8870
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dedup is in use, the `dedupditto` property can be set, causing ZFS to
keep an extra copy of data that is referenced many times (>100x). The
idea was that this data is more important than other data and thus we
want to be really sure that it is not lost if the disk experiences a
small amount of random corruption.
ZFS (and system administrators) rely on the pool-level redundancy to
protect their data (e.g. mirroring or RAIDZ). Since the user/sysadmin
doesn't have control over what data will be offered extra redundancy by
dedupditto, this extra redundancy is not very useful. The bulk of the
data is still vulnerable to loss based on the pool-level redundancy.
For example, if particle strikes corrupt 0.1% of blocks, you will either
be saved by mirror/raidz, or you will be sad. This is true even if
dedupditto saved another 0.01% of blocks from being corrupted.
Therefore, the dedupditto functionality is rarely enabled (i.e. the
property is rarely set), and it fulfills its promise of increased
redundancy even more rarely.
Additionally, this feature does not work as advertised (on existing
releases), because scrub/resilver did not repair the extra (dedupditto)
copy (see https://github.com/zfsonlinux/zfs/pull/8270).
In summary, this seldom-used feature doesn't work, and even if it did it
wouldn't provide useful data protection. It has a non-trivial
maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270).
We should remove the dedupditto functionality. For backwards
compatibility with the existing CLI, "zpool set dedupditto" will still
"succeed" (exit code zero), but won't have any effect. For backwards
compatibility with existing pools that had dedupditto enabled at some
point, the code will still be able to understand dedupditto blocks and
free them when appropriate. However, ZFS won't write any new dedupditto
blocks.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Issue #8270
Closes #8310
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mmp_interval test case was failing on Fedora 30 due to the built-in
'echo' command terminating the script when it was unable to write to
the sysfs module parameter. This change in behavior was observed with
ksh-2020.0.0-alpha1. Resolve the issue by using the external cat
command which fails gracefully as expected.
Additionally, remove some incorrect quotes around the $? return values.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8906
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Prashanth Sreenivasa <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Chris Williamson <[email protected]>
Reviewed-by: Pavel Zhakarov <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #7958
|
|
|
|
|
|
|
|
|
|
|
| |
This change restricts filesystem creation if the given name
contains either '.' or '..'
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: TulsiJain <[email protected]>
Closes #8842
Closes #8564
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The udevadm settle timeout can be 120 or 180 seconds by default
for some distributions. If a long delay is experienced, it could
be due to some strangeness in a malfunctioning device that isn't
related to the devices under test. To help debug this condition,
a notice is given if settle takes too long.
Arguments can now be passed to block_device_wait. The expected
arguments are block device pathnames.
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8839
|
|
|
|
|
|
|
|
|
| |
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8839
|
|
|
|
|
|
|
|
|
| |
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8839
|
|
|
|
|
|
|
|
|
| |
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8839
|
|
|
|
|
|
|
|
|
|
|
| |
The build for test binary hkdf_test was linking both against libicp
and libzpool. This results in two instances of libicp inside the
binary but the call to icp_init() only initializes one of them!
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #8850
|
|
|
|
|
|
|
|
|
|
| |
Add tests for
97aa3ba44("Fix link count of root inode when snapdir is visible")
as suggested in #8727.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8732
|
|
|
|
|
|
|
|
|
|
| |
This commits fixes a double-free in zfs_ioc_pool_create() triggered by
specifying an unsupported combination of properties when creating a pool
with encryption enabled.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
files in dist_*_SCRIPTS get installed with 0755, those in dist_*_DATA
with 0644. This commit moves all .kshlib, .shlib and .cfg files in the
testsuite to dist_pkgdata_DATA, and removes the shebang from
zpool_import.kshlib.
This ensures that the files are installed with appropriate permissions
and silences some warnings from lintian
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Signed-off-by: Stoiko Ivanov <[email protected]>
Closes #8803
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 6e72a5b9b61066146deafda39ab8158c559f5f15 python scripts which
work with python2 and python3 changed the shebang from /usr/bin/python
to /usr/bin/python3. This gets adapted by the build-system on systems
which don't provide python3.
This commit changes test-runner.py to also use /usr/bin/python3,
enabling the change during buildtime and fixing a minor lintian issue
for those Debian packages, which depend on a specific python version
(python3/python2).
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Signed-off-by: Stoiko Ivanov <[email protected]>
Closes #8803
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change prevents the following warning when packaging some zfs-tests
files:
*** WARNING: ./usr/src/zfs-0.8.0/tests/zfs-tests/include/zpool_script.shlib
is executable but has empty or no shebang, removing executable bit
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8787
|
|
|
|
|
|
|
|
|
|
| |
This commit updates the ZFS Test Suite to detect incorrect wrapping of
both zfs(8) and zpool(8) help message
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8785
|
|
|
|
|
|
|
|
|
|
|
| |
The test in zfs-tests/tests/perf/regression/random_readwrite_fixed.ksh
is the only file to use /usr/bin/ksh in the shebang.
Change it to /bin/ksh for consistency.
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Stoiko Ivanov <[email protected]>
Closes #8779
|