| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some issues with the way the seq_file interface is implemented
for kstats backed by linked lists (zfs_dbgmsgs and certain per-pool
debugging info):
* We don't account for the fact that seq_file sometimes visits a node
multiple times, which results in missing messages when read through
procfs.
* We don't keep separate state for each reader of a file, so concurrent
readers will receive incorrect results.
* We don't account for the fact that entries may have been removed from
the list between read syscalls, so reading from these files in procfs
can cause the system to crash.
This change fixes these issues and adds procfs_list, a wrapper around a
linked list which abstracts away the details of implementing the
seq_file interface for a list and exposing the contents of the list
through procfs.
Reviewed by: Don Brady <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed by: Brad Lewis <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: John Gallagher <[email protected]>
External-issue: LX-1211
Closes #7819
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through the 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result, fixed:
- 'maxint' -> 'maxsize' that 2to3 missed.
- 'cmp=' parameter for a 'sorted()' with a 'key=' version.
- try/except wrapping of configparser import as there are still
python 2.7 systems that lack a compatibility shim
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Gregor Kopka <[email protected]>
Closes #7925
Closes #7952
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
torvalds/linux@59b57717f ("blkcg: delay blkg destruction until
after writeback has finished") added a refcount_t to the blkcg
structure. Due to the refcount_t compatibility code, zfs_refcount_t
was used by mistake.
Resolve this by removing the compatibility code and replacing the
occurrences of refcount_t with zfs_refcount_t.
Reviewed-by: Franz Pletz <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Schumacher <[email protected]>
Closes #7885
Closes #7932
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When zfs_kobj_init() is called with an attr_cnt of 0 only the
kobj->zko_default_attrs is allocated. It subsequently won't
get freed in zfs_kobj_release since the free is wrapped in
a kobj->zko_attr_count != 0 conditional.
Split the block in zfs_kobj_release() to make sure the
kobj->zko_default_attrs are freed in this case.
Additionally, fix a minor spelling mistake and typo in
zfs_kobj_init() which could also cause a leak but in practice
is almost certain not to fail.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7957
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bandwidth and iops are average per second while *_wait are averages
per request for latency or, for queue depths, an instantaneous
measurement at the end of an interval (according to man zpool).
When calculating the first two it makes sense to do
x/interval_duration (x being the increase in total bytes or number of
requests over the duration of the interval, interval_duration in
seconds) to 'scale' from amount/interval_duration to amount/second.
But applying the same math for the latter (*_wait latencies/queue) is
wrong as there is no interval_duration component in the values (these
are time/requests to get to average_time/request or already an
absulute number).
This bug leads to the only correct continuous *_wait figures for both
latencies and queue depths from 'zpool iostat -l/q' being with
duration=1 as then the wrong math cancels itself (x/1 is a nop).
This removes temporal scaling from latency and queue depth figures.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Gregor Kopka <[email protected]>
Closes #7945
Closes #7694
|
|
|
|
|
|
|
|
| |
This reverts commit b8fd4310c54444eecb66140d99a6156f4353b29b which
accidentally introduced a regression for some versions of python.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #7929
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.
This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.
Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #7927
Closes #7122
Closes #7937
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change simplify the test case removing part of the logic which was
introducing a race condition and thus causing spurious failures: we use
attempt_during_removal() from removal.kshlib instead which has been
observed to be more stable.
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7894
Closes #7913
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through the 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result, fixed:
- 'maxint' -> 'maxsize' that 2to3 missed.
- 'cmp=' parameter for a 'sorted()' with a 'key=' version.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: John Wren Kennedy <[email protected]>
Signed-off-by: Gregor Kopka <[email protected]>
Closes #7925
Closes #7929
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently vdev_disk_error() prepends its messages sent to the internal
ZFS debug log with KERN_WARNING, which is currently defined as follows:
#define KERN_SOH "\001"
#define KERN_WARNING KERN_SOH "4"
Since "\001" (ASCII Start Of Header) is not printable this results in
weird characters displayed when inspecting the debug log. This commit
simply removes this superfluous prefix passed to zfs_dbgmsg().
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7936
|
|
|
|
|
|
|
|
| |
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #7938
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds limits to the possible spa_slop_shift values set via
the sysfs interface. Accepted values are from a minimum of 1 to a
maximum of 31 (inclusive): these limits are based on the following
values observed on a 128PB file-vdev test pool:
spa_slop_shift=1, spa_get_slop_space=63.5PiB
spa_slop_shift=2, spa_get_slop_space=31.8PiB
spa_slop_shift=3, spa_get_slop_space=15.9PiB
spa_slop_shift=4, spa_get_slop_space=7.9PiB
spa_slop_shift=5, spa_get_slop_space=4PiB
spa_slop_shift=6, spa_get_slop_space=2PiB
...
spa_slop_shift=25, spa_get_slop_space=4GiB
spa_slop_shift=26, spa_get_slop_space=2GiB
spa_slop_shift=27, spa_get_slop_space=1016MiB
spa_slop_shift=28, spa_get_slop_space=508MiB
spa_slop_shift=29, spa_get_slop_space=254MiB
spa_slop_shift=30, spa_get_slop_space=128MiB
spa_slop_shift=31, spa_get_slop_space=128MiB
spa_slop_shift=32, spa_get_slop_space=128MiB
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7876
Closes #7900
|
|
|
|
|
|
|
|
|
|
|
|
| |
I received a request for a NEWS file. That needs to be handled by Tony
and Brian, but for now, we can at least provide one that provides a link
to github so that users who expect NEWS files from their packages will
know where to go for release information.
Reviewed-by: Neal Gompa <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #7918
|
|
|
|
|
|
|
|
| |
This change prevents zstreamdump from crashing when trying to print
invalid nvlist data (DRR_BEGIN record) from a truncated send stream.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7917
|
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #7920
|
|
|
|
|
|
|
|
|
|
|
| |
A new wiki page has been added for users who may be new to Git or
GitHub. Adding link to CONTRIBUTING.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Gregor Kopka <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #7923
Closes #7915
|
|
|
|
|
|
|
|
|
|
| |
The man pages for zpool and zfs (get command)
listed the pool/dataset parameter as required,
but these are optional. Fixed that.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Gregor Kopka <[email protected]>
Closes #7916
|
|
|
|
|
|
|
|
|
|
|
| |
Update zpool(8) to clarify what type of vdevs may be safely
removed and that the existence of any top-level raidz device
which is part of the primary pool will prevent device removal.
Reviewed by: Matthew Ahrens <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7880
Closes #7893
|
|
|
|
|
|
|
|
|
|
|
| |
This change improve the handling of invalid filesystem properties when
specified at pool creation: this is useful when 'zpool create -n'
(dry run) is executed to detect invalid fs-level options (-O) before
the actual command is run.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7620
Closes #7878
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the removal_resume_export test case to the possible failure
section of the zts-report.py and reference the Github issue. In
the CI environment this test has proven to be unreliable due to
the way it detects the removal thread. This is a flaw in the test
and not device removal so update the result summary accordingly.
Additionally, increase the allowed timeout in an effort to reduce
the observed rate of false positves.
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7895
Issue #7894
|
|
|
|
|
|
|
|
|
|
|
| |
Added vdev_resilver_needed() check to verify VDEVs are fully
synced, so that after split the new pool will not be corrupted.
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Roman Strashkin <[email protected]>
Closes #7865
Closes #7881
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Major new features:
- Native encryption
- Device removal
- Allocation classes
- Pool checkpoints
- Sequential scrub and resilver
- Project quota
- Channel programs
- Direct IO
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The recent sysfs zfs properties feature breaks the in-kernel
builds of zfs (sans module). When not built as a module add
the sysfs entries under /sys/fs/zfs/.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #7868
Closes #7872
|
|
|
|
|
|
|
|
| |
The ZTS zfs_sysfs_live test fails occasionally due to an uninitialized
string on an error path.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #7869
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When no permission set is defined for a dataset the create time
permissions are incorrectly shown as if they were a permission set.
This change simply correct how allow permissions are displayed.
This commit also fixes a small manpage formatting issue and adds the
"zfs_allow_003_pos" test case to the ZFS Test Suite.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7519
Closes #7860
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocation Classes add the ability to have allocation classes in a
pool that are dedicated to serving specific block categories, such
as DDT data, metadata, and small file blocks. A pool can opt-in to
this feature by adding a 'special' or 'dedup' top-level VDEV.
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: HÃ¥kan Johansson <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: DHE <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Gregor Kopka <[email protected]>
Reviewed-by: Kash Pande <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #5182
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Siebenmann <[email protected]>
Closes #7764
Closes #7864
|
|
|
|
|
|
|
|
|
|
|
| |
Re-adds a recalculation step for the ARC stats after the MRU
eviction so that we don't pathologically attempt to evict the MFU.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Authored-by: Mark Johnston <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #7855
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a6214a0ae9e78d3cac0e495e2fcf7af0858a872f.
Disabling zfs_admin_snapshot by default results in multiple ZTS
tests failing which depend on this functionality. Revert this
change until the relevant test cases can be updated.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #7838
|
|
|
|
|
|
|
|
|
|
| |
It's disabled by default, update code to reflect
the documentation.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Gregor Kopka <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes #7835
Closes #7838
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Relax allocation throttling for ditto blocks. Due to random imbalances
in allocation it tends to push block copies to one vdev, that looks
slightly better at the moment. Slightly less strict policy allows both
improve data security and surprisingly write performance, since we don't
need to touch extra metaslabs on each vdev to respect the min distance.
Sponsored by: iXsystems, Inc.
Authored by: mav <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://illumos.org/issues/9751
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/8253837ac3
Closes #7857
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use METASLAB_WEIGHT_CLAIM weight to allocate tertiary blocks.
Previous use of METASLAB_WEIGHT_SECONDARY for that caused errors
later on metaslab_activate_allocator() call, leading to massive
load of unneeded metaslabs and write freezes.
Authored by: mav <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://illumos.org/issues/9738
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/63e7138
Closes #7858
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #7780
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends our sysfs '/sys/module/zfs' entry to include feature
and property attributes. The primary consumer of this information
is user processes, like the zfs CLI, that need to know what the
current loaded ZFS module supports. The libzfs binary will consult
this information when instantiating the zfs and zpool property
tables and the pool features table.
This introduces 4 kernel objects (dirs) into '/sys/module/zfs'
with corresponding attributes (files):
features.runtime
features.pool
properties.dataset
properties.pool
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #7706
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's possible for an unrelated process, like blkid, to have the
volume open when 'zfs destroy' is run. Switch the cleanup functions
to the destroy_dataset() helper which handles this case by retrying
the destroy when the dataset is busy. This was done not only for
volumes but also for file systems for consistency.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7854
|
|
|
|
|
|
|
|
|
|
|
| |
The checkpoint space map object may not be accessible from the
vdev's ZAP when it has been damaged. This may be the case when
performing an extreme rewind when importing the pool.
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7809
Closes #7853
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can simplify the dbuf_hold code by allocating dbuf_hold_arg_t's on
demand, rather than allocating a big array of them up front. While this
can occasionally increase the number of allocations, typically only one
allocation is needed since the indirect block is already cached.
The performance test suite gets the same results with this change.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #7841
|
|
|
|
|
|
|
|
|
| |
Removing hardcoded paths in pool_checkpoint.kshlib
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #7840
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #7848
|
|
|
|
|
|
|
|
|
| |
Removing hardcoded paths in zvol_swap_003
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #7839
|
|
|
|
|
|
|
|
|
| |
Removing hardcoded paths in many scripts.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bernie1995 <[email protected]>
Issue #7507
Closes #7843
|
|
|
|
|
|
|
|
|
|
| |
It's possible for an unrelated process, like blkid, to have the
volume open when 'zfs destroy' is run. Switch the cleanup function
to the destroy_dataset() helper which handles this case by retrying
the destroy when the dataset is busy.
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7847
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assertion failed in arc_buf_destroy() when concurrently reading
block with checksum error.
Porting notes:
* The ability to zinject decompression errors has been added, but
this only works at the zio_decompress() level, where we have all
of the info we need to match against the user's zinject options.
* The decompress_fault test has been added to test the new zinject
functionality
* We attempted to set zio_decompress_fail_fraction to (1 << 18) in
ztest for further test coverage. Although this did uncover a few
low priority issues, this unfortuantely also causes ztest to
ASSERT in many locations where the code is working correctly since
it is designed to fail on IO errors. Developers can manually set
this variable with the '-o' option to find and debug issues.
Authored by: Matt Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Matt Ahrens <[email protected]>
Ported-by: Tom Caputi <[email protected]>
OpenZFS-issue: https://illumos.org/issues/9403
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9
Closes #7822
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when unmounting a filesystem, ZFS will only wait for
a txg sync if the dataset is dirty and not readonly. However, this
can be problematic in cases where a dataset is remounted readonly
immediately before being unmounted, which often happens when the
system is being shut down. Since encrypted datasets require that
all I/O is completed before the dataset is disowned, this issue
causes problems when write I/Os leak into the txgs after the
dataset is disowned, which can happen when sync=disabled.
While looking into fixes for this issue, it was discovered that
dsl_dataset_is_dirty() does not return B_TRUE when the dataset has
been removed from the txg dirty datasets list, but has not actually
been processed yet. Furthermore, the implementation is comletely
different from dmu_objset_is_dirty(), adding to the confusion.
Rather than relying on this function, this patch forces the umount
code path (and the remount readonly code path) to always perform a
txg sync on read-write datasets and removes the function altogether.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7753
Closes #7795
|
|
|
|
|
|
|
|
|
|
| |
This patch simply adds some missing locking to the txg_list
functions and refactors txg_verify() so that it is only compiled
in for debug builds.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7795
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Direct IO via the O_DIRECT flag was originally introduced in XFS by
IRIX for database workloads. Its purpose was to allow the database
to bypass the page and buffer caches to prevent unnecessary IO
operations (e.g. readahead) while preventing contention for system
memory between the database and kernel caches.
On Illumos, there is a library function called directio(3C) that
allows user space to provide a hint to the file system that Direct IO
is useful, but the file system is free to ignore it. The semantics
are also entirely a file system decision. Those that do not
implement it return ENOTTY.
Since the semantics were never defined in any standard, O_DIRECT is
implemented such that it conforms to the behavior described in the
Linux open(2) man page as follows.
1. Minimize cache effects of the I/O.
By design the ARC is already scan-resistant which helps mitigate
the need for special O_DIRECT handling. Data which is only
accessed once will be the first to be evicted from the cache.
This behavior is in consistent with Illumos and FreeBSD.
Future performance work may wish to investigate the benefits of
immediately evicting data from the cache which has been read or
written with the O_DIRECT flag. Functionally this behavior is
very similar to applying the 'primarycache=metadata' property
per open file.
2. O_DIRECT _MAY_ impose restrictions on IO alignment and length.
No additional alignment or length restrictions are imposed.
3. O_DIRECT _MAY_ perform unbuffered IO operations directly
between user memory and block device.
No unbuffered IO operations are currently supported. In order
to support features such as transparent compression, encryption,
and checksumming a copy must be made to transform the data.
4. O_DIRECT _MAY_ imply O_DSYNC (XFS).
O_DIRECT does not imply O_DSYNC for ZFS. Callers must provide
O_DSYNC to request synchronous semantics.
5. O_DIRECT _MAY_ disable file locking that serializes IO
operations. Applications should avoid mixing O_DIRECT
and normal IO or mmap(2) IO to the same file. This is
particularly true for overlapping regions.
All I/O in ZFS is locked for correctness and this locking is not
disabled by O_DIRECT. However, concurrently mixing O_DIRECT,
mmap(2), and normal I/O on the same file is not recommended.
This change is implemented by layering the aops->direct_IO operations
on the existing AIO operations. Code already existed in ZFS on Linux
for bypassing the page cache when O_DIRECT is specified.
References:
* http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html
* https://blogs.oracle.com/roch/entry/zfs_and_directio
* https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics
* https://illumos.org/man/3c/directio
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #224
Closes #7823
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the %changelog section from the spec files since it does
not get updated in the master branch. Not only does this mean
the information is stale, but it can result in 'make deb' failing
to build packages, issue #7825. This section should be updated
for tagged releases.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7825
Closes #7827
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a bunch of truncation compiler warnings that show up
on Fedora 28 (GCC 8.0.1).
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Issue #7368
Closes #7826
Closes #7830
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BuildRequires tags for "-devel" packages in the RPM spec file do not
work when building on Debian-based distributions.
Fix this issue by making this requirement conditional to RPM-based
distributions.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #7829
Closes #7831
|
|
|
|
|
|
|
|
|
|
| |
The zfs-test package needs a build requirement on the libaio-devel
package. Without it ./configure will correctly determine that
mmap_libaio cannot be built and it will be skipped.
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7821
Closes #7824
|