| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
On openSUSE the initrd has systemctl in /usr/bin, check this path as
well.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Lorenz Hüdepohl <[email protected]>
Closes #11487
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zgenhostid(8) is used to modify or create /etc/hostid. This
administrative tool is currently installed to bindir. System utilities
are typically placed in sbin.
Modify the installation directory for zgenhostid. Additionally, track
this change in its use in dracut and the rpm installation.
Authored-by: наб <[email protected]>
Authored-by: Antonio Russo <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Antonio Russo <[email protected]>
Closes #11485
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a
preallocated buffer in userland. Userland must guess how large the
buffer should be. If it undersizes it, it must reallocate and try
again. That can cost a lot of time for large pools.
OpenZFS commit 28b40c8a6e3 set the guess at "zc.zc_nvlist_conf_size * 4"
without explanation. On my system, that is too small. From experiment,
x 32 is a better multiplier. But I don't know how to calculate it
theoretically.
Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes #11469
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Read all labels in parallel instead of sequentially.
Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8
Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes #11467
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zpool_read_label doesn't need the full labels including uberblocks. It
only needs the vdev_phys_t. This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.
Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc
Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes #11467
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface. Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter. The result
being uiomove_iov() may hang waiting for the page.
This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11463
Closes #11484
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.
Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #11438
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `abd_get_offset_*()` routines create an abd_t that references
another abd_t, and doesn't allocate any pages/buffers of its own. In
some workloads, these routines may be called frequently, to create many
abd_t's representing small pieces of a single large abd_t. In
particular, the upcoming RAIDZ Expansion project makes heavy use of
these routines.
This commit adds the ability for the caller to allocate and provide the
abd_t struct to a variant of `abd_get_offset_*()`. This eliminates the
cost of allocating the abd_t and performing the accounting associated
with it (`abdstat_struct_size`). The RAIDZ/DRAID code uses this for
the `rc_abd`, which references the zio's abd. The upcoming RAIDZ
Expansion project will leverage this infrastructure to increase
performance of reads post-expansion by around 50%.
Additionally, some of the interfaces around creating and destroying
abd_t's are cleaned up. Most significantly, the distinction between
`abd_put()` and `abd_free()` is eliminated; all types of abd_t's are
now disposed of with `abd_free()`.
Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Issue #8853
Closes #11439
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.
Eventually, we should be able to drop this workaround.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Sterling Jensen <[email protected]>
Closes #11295
Closes #11462
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices. In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).
Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.
For /dev/zero input, simply use a pipe: `cat </dev/zero |` .
However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution. In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Attila Fülöp <[email protected]>
Signed-off-by: Antonio Russo <[email protected]>
Closes #11478
|
|
|
|
|
|
|
|
| |
Use verified variants of nvlist/nvpair functions where applicable.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11460
|
|
|
|
|
|
|
| |
Try to use more appropriate ASSERT and VERIFY variants in ztest.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11454
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly. Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.
Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Antonio Russo <[email protected]>
Closes #11451
|
|
|
|
|
|
|
|
| |
Simplify ztest by using fnvlist functions to verify success.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11441
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each zfs ioctl that changes on-disk state (e.g. set property, create
snapshot, destroy filesystem) is recorded in the zpool history, and is
printed by `zpool history -i`.
For performance diagnostic purposes, it would be useful to know how long
each of these ioctls took to run. This commit adds that functionality,
with a new `ZPOOL_HIST_ELAPSED_NS` member of the history nvlist.
Additionally, the time recorded in this history log is currently the
time that the history record is written to disk. But in many cases (CLI
args logging and ioctl logging), this happens asynchronously,
potentially many seconds after the operation completed. This commit
changes the timestamp to reflect when the history event was created,
rather than when it was written to disk.
Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups. In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.
The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached. However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.
This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11285
Closes #11397
|
|
|
|
|
|
|
|
|
| |
zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #11437
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several m4 macros have been retired in autoconf 2.70. Update the
the build system to use the new macros provided to replace them.
* Replaced AC_HELP_STRING with AS_HELP_STRING.
* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.
* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET
* Replaced AC_PROG_LIBTOOL with LT_INIT
* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
directly used like this. Replace it with an $AWK command
to extract the kernel source version.
Reviewed-by: Eli Schwartz <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #11413
Closes #11419
|
|
|
|
|
|
|
|
|
| |
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Toomas Soome <[email protected]>
Closes #11417
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:
* no HAVE_VFS_RW_ITERATE
* HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET
=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.
https://bugs.openvz.org/browse/OVZ-7243
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Konstantin Khorenko <[email protected]>
Closes #11410
Closes #11411
|
|
|
|
|
|
|
|
| |
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11396
|
|
|
|
|
|
|
|
| |
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11396
|
|
|
|
|
|
|
|
| |
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11396
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `zpool_find_config()`, the `pools` nvlist is leaked. Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.
Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.
The leaks were detected by ASAN (`configure --enable-asan`).
This commit resolves the leaks.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11396
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Build error on illumos with gcc 10 did reveal:
In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
857 | dsl_dataset_disown(ds, decrypt, tag);
| ^~~~~~~
cc1: all warnings being treated as errors
libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
754 | err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
| ^~~~~~~~~~~
cc1: all warnings being treated as errors
The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Toomas Soome <[email protected]>
Closes #11406
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.
Reviewed-by: Rafael Kitover <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11387
Closes #11390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().
Reviewed-by: Rafael Kitover <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11387
Closes #11390
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper. For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.
Reviewed-by: Rafael Kitover <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11387
Closes #11390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface. These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.
This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface. It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.
Reviewed-by: Rafael Kitover <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11387
Closes #11390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely. The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.
Reviewed-by: Rafael Kitover <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11387
Closes #11390
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license. As of the 5.11 kernel not setting a license has been
converted from a warning to an error.
Reviewed-by: Rafael Kitover <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11387
Closes #11390
|
|
|
|
|
|
|
| |
Replace "is" with "==" and "is not" with "!=".
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11394
|
|
|
|
|
|
|
|
| |
Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11391
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Individual transactions may not be larger than DMU_MAX_ACCESS.
This is enforced by the assertions in dmu_tx_hold_write() and
dmu_tx_hold_write_by_dnode(). There's an additional check in
dmu_tx_count_write() however it has no effect and only sets a
local err variable. We could enable this check, however since
it's already enforced by ASSERTs elsewhere I opted to remove it
instead.
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3731
Closes #11384
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules. Correcting this allows
dracut to do the right thing, even without
# /etc/dracut.conf
add_drivers+=" zfs "
Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #11381
|
|
|
|
|
|
|
|
| |
Instead of creating issues with type "question"
Forward to the GitHub Discussion system.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Kjeld Schouten-Lebbing <[email protected]>
Closes #11383
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:
assertion failed: rc->rc_count == number, file: .../refcount.c
and the unexpected hold has this stack:
dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f
The simplest reproducer for this in illumos is
zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool
which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andy Fiddaman <[email protected]>
Closes #11368
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.
Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces. Even though they
appear to still be available for weak module binding.
To accommodate this a configure check is added for the new _rh()
variant of the function and used if available. If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine. OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11374
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized. This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.
zpl_file.c: In function ‘zpl_iter_write’:
zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11373
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems. Now that the iter functions
also handle pipes that's no longer the case. When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11375
Closes #11378
|
|
|
|
|
|
|
|
| |
Based on a conversation with Matt on the OpenZFS Slack.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #11370
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function. However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk(). This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Michael D Labriola <[email protected]>
Closes #11358
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check for the history_event type instead.
The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9d9958de77422f5d9d25b47e486537d.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: InsanePrawn <[email protected]>
Closes #11164
Closes #11347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of the 5.10 kernel the generic splice compatibility code has been
removed. All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.
The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes. However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.
This commit changes that by allowing full iov_iter structures to be
attached to uios. Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible. In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.
Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified. When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter. Until then we need to
maintain all of the existing types for older kernels.
Some additional refactoring and cleanup was included in this change:
- Added checks to configure to detect available iov_iter interfaces.
Some are available all the way back to the 3.10 kernel and are used
when available. In particular, uio_prefaultpages() now always uses
iov_iter_fault_in_readable() which is available for all supported
kernels.
- The unused UIO_USERISPACE type has been removed. It is no longer
needed now that the uio_seg enum is platform specific.
- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
platform code for the zfs.ko module. This gets it out of libzfs
where it was never needed and keeps this Linux specific code out
of the common sources.
- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
is redundant and O_APPEND is already handled in zfs_write();
Reviewed-by: Colin Ian King <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #11351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab. This indicates that at least
part of the free space was initialized. Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.
Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.
While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.
Co-authored-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11365
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property). Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es). And therefore, there is
always enough free space to remove a special device.
However, the checks for free space when removing special devices did not
take this into account. This commit corrects that.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11329
|
|
|
|
|
|
|
|
|
|
|
| |
After e357046 it should not be necessary to periodically update ARC
kstats and tunables. Tunable updates are applied when modified, and
kstats are updated on demand.
Update kstats in `arc_evict_cb_check()` for `ZFS_DEBUG` builds only.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11237
|
|
|
|
|
|
|
|
| |
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Sterling Jensen <[email protected]>
Closes #11359
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a system with very high fragmentation, we may need to do lots of gang
allocations (e.g. most indirect block allocations (~50KB) may need to
gang). Before failing a "normal" allocation and resorting to ganging, we
try every metaslab. This has the impact of loading every metaslab (not
a huge deal since we now typically keep all metaslabs loaded), and also
iterating over every metaslab for every failing allocation. If there are
many metaslabs (more than the typical ~200, e.g. due to vdev expansion
or very large vdevs), the CPU cost of this iteration can be very
impactful. This iteration is done with the mg_lock held, creating long
hold times and high lock contention for concurrent allocations,
ultimately causing long txg sync times and poor application performance.
To address this, this commit changes the behavior of "normal" (not
try_hard, not ZIL) allocations. These will now only examine the 100
best metaslabs (as determined by their ms_weight). If none of these
have a large enough free segment, then the allocation will fail and
we'll fall back on ganging.
To accomplish this, we will now (normally) gang before doing a
`try_hard` allocation. Non-try_hard allocations will only examine the
100 best metaslabs of each vdev. In summary, we will first try normal
allocation. If that fails then we will do a gang allocation. If that
fails then we will do a "try hard" gang allocation. If that fails then
we will have a multi-layer gang block.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #11327
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Metaslab rotor and aliquot are used to distribute workload between
vdevs while keeping some locality for logically adjacent blocks. Once
multiple allocators were introduced to separate allocation of different
objects it does not make much sense for different allocators to write
into different metaslabs of the same metaslab group (vdev) same time,
competing for its resources. This change makes each allocator choose
metaslab group independently, colliding with others only sporadically.
Test including simultaneous write into 4 files with recordsize of 4KB
on a striped pool of 30 disks on a system with 40 logical cores show
reduction of vdev queue lock contention from 54 to 27% due to better
load distribution. Unfortunately it won't help much ZVOLs yet since
only one dataset/ZVOL is synced at a time, and so for the most part
only one allocator is used, but it may improve later.
While there, to reduce the number of pointer dereferences change
per-allocator storage for metaslab classes and groups from several
separate malloc()'s to variable length arrays at the ends of the
original class and group structures.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #11288
|