| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The normal lock order is that the dp_config_rwlock must be held before
the ds_opening_lock. For example, dmu_objset_hold() does this.
However, dmu_objset_open_impl() is called with the ds_opening_lock held,
and if the dp_config_rwlock is not already held, it will attempt to
acquire it. This may lead to deadlock, since the lock order is
reversed.
Looking at all the callers of dmu_objset_open_impl() (which is
principally the callers of dmu_objset_from_ds()), almost all callers
already have the dp_config_rwlock. However, there are a few places in
the send and receive code paths that do not. For example:
dsl_crypto_populate_key_nvlist, send_cb, dmu_recv_stream,
receive_write_byref, redact_traverse_thread.
This commit resolves the problem by requiring all callers ot
dmu_objset_from_ds() to hold the dp_config_rwlock. In most cases, the
code has been restructured such that we call dmu_objset_from_ds()
earlier on in the send and receive processes, when we already have the
dp_config_rwlock, and save the objset_t until we need it in the middle
of the send or receive (similar to what we already do with the
dsl_dataset_t). Thus we do not need to acquire the dp_config_rwlock in
many new places.
I also cleaned up code in dmu_redact_snap() and send_traverse_thread().
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Zuchowski <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #9662
Closes #10115
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__zio_execute() calls zio_taskq_member() to determine if we are running
in a zio interrupt taskq, in which case we may need to switch to
processing this zio in a zio issue taskq. The call to
zio_taskq_member() can become a performance bottleneck when we are
processing a high rate of zio's.
zio_taskq_member() calls taskq_member() on each of the zio interrupt
taskqs, of which there are 21. This is slow because each call to
taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively
slow.
This commit improves the performance of zio_taskq_member() by having it
cache the value of tsd_get(taskq_tsd), reducing the number of those
calls to 1/21th of the current behavior.
In a test case running `zfs send -c >/dev/null` of a filesystem with
small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of
one CPU, and with this change it is reduced to 1.3%. Overall time to
perform the `zfs send` reduced by 10% (~150,000 block/sec to ~165,000
blocks/sec).
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10070
|
|
|
|
|
|
|
|
|
|
|
| |
This function should only return "linux" on Linux.
Move the kernel part of the function out of common code.
Fix the tests for FreeBSD.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10079
|
|
|
|
|
|
|
|
|
|
| |
By adding a zfs_file_private accessor to the common
interfaces and some extensions to FreeBSD platform
code it is now possible to share the implementations
for the aforementioned functions.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10073
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally
converted the static inline function blkg_tryget() to GPL-only for
kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y.
Resolve the build issue by providing our own equivalent functionality
when needed which uses rcu_read_lock_sched() internally as before.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9745
Closes #10072
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the Linux kernel's y2038 changes the time_t type has been
fully retired. Callers are now required to use the time64_t type.
Rather than move to the new type, I've removed the few remaining
places where a time_t is used in the kernel code. They've been
replaced with a uint64_t which is already how ZFS internally
handled these values.
Going forward we should work towards updating the remaining user
space time_t consumers to the 64-bit interfaces.
Reviewed-by: Matthew Macy <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10052
Closes #10064
|
|
|
|
|
|
|
|
|
|
|
| |
* Add dedicated donde_set_dirtyctx routine.
* Add empty dirty record on destroy assertion.
* Make much more extensive use of the SET_ERROR macro.
Reviewed-by: Will Andrews <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9924
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `convoff` function is called only in one code path in `zfs_space`.
Each caller of `zfs_space` is called with a `flock64_t` that has
`l_whence` set to `SEEK_SET`. This means that `convoff` always results
in a no-op as the `bfp` parameter has `l_whence` set to `SEEK_SET` and
`int whence` is `SEEK_SET` as well.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Closes #10006
|
|
|
|
|
|
|
|
|
|
|
| |
Unlinked files don't respect synchronous flush commands, but when they get relinked
their state is unknown. Previously we force flushed all such files even when
sync=disabled. Correct this case.
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #10005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow()
always does taskq_dispatch(spl_cache_grow_work), and then waits for the
KMC_BIT_GROWING to be cleared by the taskq thread.
The taskq thread (spl_cache_grow_work()) does:
1. allocate new slab and add to list
2. wake_up_all(skc_waitq)
3. clear_bit(KMC_BIT_GROWING)
Therefore, the waiting thread can wake up before GROWING has been
cleared. It will see that the growing has not yet completed, and go
back to sleep until it hits the 100ms timeout.
This can have an extreme performance impact on workloads that alloc/free
more than fits in the (statically-sized) magazines. These workloads
allocate and free slabs with high frequency.
The problem can be observed with `funclatency spl_cache_grow`, which on
some workloads shows that 99.5% of the time it takes <64us to allocate
slabs, but we spend ~70% of our time in outliers, waiting for the 100ms
timeout.
The fix is to do `clear_bit(KMC_BIT_GROWING)` before
`wake_up_all(skc_waitq)`.
A future investigation should evaluate if we still actually need to
taskq_dispatch() at all, and if so on which kernel versions.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #9989
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to do the same thing to update all spas on any OS for these
tunables, so let's share the code.
While here let's match the types of the literals initializing the
variables with the type of the variable.
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9964
|
|
|
|
|
|
|
|
|
|
|
|
| |
The timestamp_truncate() function was added, it replaces the existing
timespec64_trunc() function. This change renames our wrapper function
to be consistent with the upstream name and updates the compatibility
code for older kernels accordingly.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9956
Closes #9961
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The proc_ops structure was introduced to replace the use of of the
file_operations structure when registering proc handlers. This
change creates a new kstat_proc_op_t typedef for compatibility
which can be used to pass around the correct structure.
This change additionally adds the 'const' keyword to all of the
existing proc operations structures.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9961
|
|
|
|
|
|
|
| |
This solves the issue of loading the spl module on RISC-V.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #9942
|
|
|
|
|
|
|
|
|
|
|
|
| |
It violated sequence described in kstat.h, and at least on FreeBSD
kstat_install() uses provided names to create the sysctls. If the
names are not available at the time, it ends up bad.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes #9933
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE). This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...). These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.
After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream). This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:
1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset. This
causes the 2nd receive to fail.
2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl). This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.
3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() trys to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.
To address these problems, this change synchronously creates the minor
node. To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing contex as was
originally done.
Implementation notes:
We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys. The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.
Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set. We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.
There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems. E.g. changing the volmode or snapdev
properties. These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, there
is still potential for interference with subsequent operations. In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.
The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem. However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-65948
Closes #7863
Closes #9885
|
|
|
|
|
|
|
|
| |
With recent SPL changes there is no longer any need for a per
platform version.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9860
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Over the years several slightly different approaches were used
in the Makefiles to determine the target architecture. This
change updates both the build system and Makefile to handle
this in a consistent fashion.
TARGET_CPU is set to i386, x86_64, powerpc, aarch6 or sparc64
and made available in the Makefiles to be used as appropriate.
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9848
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On kernels with KASAN enabled the following failure can be observed as
soon as the zfs module is loaded:
VERIFY(IS_P2ALIGNED(ptr, PAGE_SIZE)) failed
PANIC at spl-kmem-cache.c:228:kv_alloc()
The problem is kmalloc() has never guaranteed aligned allocations; this
requirement resulted in zfsonlinux/spl@8b45dda which removed all
kmalloc() usage in kv_alloc().
Until a GFP_ALIGNED flag (or equivalent functionality) is provided by
the kernel this commit partially reverts 66955885 and 6d948c35 to
prevent k(v)malloc() allocations in kv_alloc().
Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Michael Niewöhner <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9813
|
|
|
|
|
|
|
|
|
|
|
| |
Update the project website links contained in to repository to
reference the secure https://zfsonlinux.org address.
Reviewed-By: Richard Laager <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Garrett Fields <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9837
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When qat_compress() fails to allocate the required contiguous memory
it mistakenly returns success. This prevents the fallback software
compression from taking over and (un)compressing the block.
Resolve the issue by correctly setting the local 'status' variable
on all exit paths. Furthermore, initialize it to CPA_STATUS_FAIL
to ensure qat_compress() always fails safe to guard against any
similar bugs in the future.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9784
Closes #9788
|
|
|
|
|
|
|
|
|
| |
As of cppcheck 1.82 surpress the warning regarding shifting too many
bits for __divdi3() implemention. The algorithm used here is correct.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9732
|
|
|
|
|
|
|
|
|
|
| |
As of cppcheck 1.82 warnings are issued when using the list_for_each_*
functions with an uninitialized variable. Functionally, this is fine
but to resolve the warning initialize these variables.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9732
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apply umask to `mode` which will eventually be applied to inode.
This is needed since VFS doesn't apply umask for O_TMPFILE files.
(Note that zpl_init_acl() applies `ip->i_mode &= ~current_umask();`
only when POSIX ACL is used.)
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8997
Closes #8998
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FreeBSD's vfs currently doesn't permit file systems
to do their own locking. To avoid having to have
duplicate zfs functions with and without locking add
locking here. With luck these changes can be removed
in the future.
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9715
|
|
|
|
|
|
|
|
|
|
|
| |
The quota functions are common to all implementations and can be
moved to common code. As a simplification they were moved to the
Linux platform code in the initial refactoring.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9710
|
|
|
|
|
|
|
|
|
|
| |
Change many of the znops routines to take a znode rather
than an inode so that zfs_replay code can be largely shared
and in the future the much of the znops code may be shared.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9708
|
|
|
|
|
|
|
|
| |
The zfsvfs->z_sb field is Linux specified and should be abstracted.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9697
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the vdev_disk_open() retry logic to use a specified number
of milliseconds to be more robust. Additionally, on failure log
both the time waited and requested timeout to the internal log.
The default maximum allowed open retry time has been increased
from 500ms to 1000ms.
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9680
|
|
|
|
|
|
|
|
|
| |
Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL
wrapper, and split the common and platform specific portions.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9670
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the ASSERTV macro and handle suppressing unused
compiler warnings for variables only in ASSERTs using the
__attribute__((unused)) compiler annotation. The annotation
is understood by both gcc and clang.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9671
|
|
|
|
|
|
|
|
|
|
|
| |
FreeBSD needs to cope with multiple version of the zfs_cmd_t
structure. Allowing the platform code to pre and post
process the cmd structure makes it possible to work with
legacy tooling.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9624
|
|
|
|
|
|
|
|
|
| |
fallocate(2) is a Linux-specific system call which in unavailable
on other platforms.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9633
|
|
|
|
|
|
|
|
|
|
| |
FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls
and the struct file in the device structure is unused.
Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9625
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in commit f81d5ef6 the zfs_vdev_elevator module
option is being removed. Users who require this functionality
should update their systems to set the disk scheduler using a
udev rule.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #8664
Closes #9417
Closes #9609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The changes in commit 41e1aa2a / PR #9583 introduced a regression on
tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started
to fail with errno ENODATA:
openat(AT_FDCWD, "/test", O_RDWR|O_TMPFILE, 0666) = 3
<...>
fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA
The originally proposed change on PR #9583 is not susceptible to it,
so just move the code/if-checks around back in that way, to fix it.
Reviewed-by: Pavel Snajdr <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Original-patch-by: Heitor Alves de Siqueira <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Closes #9602
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide a common zfs_file_* interface which can be implemented on all
platforms to perform normal file access from either the kernel module
or the libzpool library.
This allows all non-portable vnode_t usage in the common code to be
replaced by the new portable zfs_file_t. The associated vnode and
kobj compatibility functions, types, and macros have been removed
from the SPL. Moving forward, vnodes should only be used in platform
specific code when provided by the native operating system.
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9556
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reinstate the zpl_revalidate() functionality to resolve a regression
where dentries for open files during a rollback are not invalidated.
The unrelated functionality for automatically unmounting .zfs/snapshots
was not reverted. Nor was the addition of shrink_dcache_sb() to the
zfs_resume_fs() function.
This issue was not immediately caught by the CI because the test case
intended to catch it was included in the list of ZTS tests which may
occasionally fail for unrelated reasons. Remove all of the rollback
tests from this list to help identify the frequency of any spurious
failures.
The rollback_003_pos.ksh test case exposes a real issue with the
long standing code which needs to be investigated. Regardless,
it has been enable with a small workaround in the test case itself.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9587
Closes #9592
|
|
|
|
|
|
|
|
|
|
|
| |
If zp->z_unlinked is set, we're working with a znode that has been
marked for deletion. If that's the case, we can skip the "goto again"
loop and return ENOENT, as the znode should not be discovered.
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Heitor Alves de Siqueira <[email protected]>
Closes #9583
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted. To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9546
Closes #9577
|
|
|
|
|
|
|
|
|
|
|
| |
Check for __GFP_RECLAIM instead of GFP_KERNEL because zfs modifies
IO and FS flags which breaks the check for GFP_KERNEL.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Sebastian Gottschall <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #9034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new KMC_KVMEM flag was added to enforce use of the
kvmalloc allocator in kmem_cache_create even for large blocks, which
may also increase performance in some specific cases (e.g. zstd), too.
Default to KVMEM instead of VMEM in spl_kmem_cache_create.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #9034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements use of kvmalloc for GFP_KERNEL allocations, which
may increase performance if the allocator is able to allocate physical
memory, if kvmalloc is available as a public kernel interface (since
v4.12). Otherwise it will simply fall back to virtual memory (vmalloc).
Also fix vmem_alloc implementation which can lead to slow allocations
since the first attempt with kmalloc does not make use of the noretry
flag but tells the linux kernel to retry several times before it fails.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #9034
|
|
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #9034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increase the minimum supported kernel version from 2.6.32 to 3.10.
This removes support for the following Linux enterprise distributions.
Distribution | Kernel | End of Life
---------------- | ------ | -------------
Ubuntu 12.04 LTS | 3.2 | Apr 28, 2017
SLES 11 | 3.0 | Mar 32, 2019
RHEL / CentOS 6 | 2.6.32 | Nov 30, 2020
The following changes were made as part of removing support.
* Updated `configure` to enforce a minimum kernel version as
specified in the META file (Linux-Minimum: 3.10).
configure: error:
*** Cannot build against kernel version 2.6.32.
*** The minimum supported kernel version is 3.10.
* Removed all `configure` kABI checks and matching C code for
interfaces which solely predate the Linux 3.10 kernel.
* Updated all `configure` kABI checks to fail when an interface is
missing which was in the 3.10 kernel up to the latest 5.1 kernel.
Removed the HAVE_* preprocessor defines for these checks and
updated the code to unconditionally use the verified interface.
* Inverted the detection logic in several kABI checks to match
the new interface as it appears in 3.10 and newer and not the
legacy interface.
* Consolidated the following checks in to individual files. Due
the large number of changes in the checks it made sense to handle
this now. It would be desirable to group other related checks in
the same fashion, but this as left as future work.
- config/kernel-blkdev.m4 - Block device kABI checks
- config/kernel-blk-queue.m4 - Block queue kABI checks
- config/kernel-bio.m4 - Bio interface kABI checks
* Removed the kABI checks for sops->nr_cached_objects() and
sops->free_cached_objects(). These interfaces are currently unused.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the need for zpl_revalidate altogether.
There were 3 main reasons why we used d_revalidate:
1. periodic automounted snapshots umount deferral
2. negative dentries created before snapshot rollback
3. stale inodes referenced by dentry cache after snapshot rollback
Periodic snapshots deferral solution introduces zfs_exit_fs function,
which is called as a part of ZFS_EXIT(zfsvfs_t) macro.
Negative dentries and stale inodes are solved by flushing the dcache
for the particular dataset on zfs_resume_fs call.
This patch also removes now unused HAVE_S_D_OP configure test.
Reviewed-by: Aleksa Sarai <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pavel Snajdr <[email protected]>
Closes #8774
Closes #9549
|
|
|
|
|
|
|
|
|
| |
Some of the znode fields are different and functions
consuming an inode don't exist on FreeBSD.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9536
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds some new DTRACE_PROBE* endpoints so that we can observe taskq
latencies on a system. Additionally, a new "taskqlatency.bt" script is
added to do this observation via "bpftrace". Lastly, a "zfs-trace.sh"
script is added to wrap "bpftrace" with the proper options required to
run and use "taskqlatency.bt".
For example, with these changes in place, a user can run the following:
$ cd ./contrib/bpftrace
$ sudo ./zfs-trace.sh taskqlatency.bt
Attaching 6 probes...
^C
Here's some example output, showing latency information for time spent
executing the taskq entry's function:
@exec_lat_us[dp_sync_taskq, userquota_updates_task]:
[2, 4) 5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8) 0 | |
[8, 16) 1 |@@@@@@@@@@ |
[16, 32) 2 |@@@@@@@@@@@@@@@@@@@@ |
@exec_lat_us[z_wr_int_h, zio_execute]:
[8, 16) 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32) 2 |@@@@@@ |
@exec_lat_us[z_wr_iss_h, zio_execute]:
[16, 32) 4 |@@@@@@@@@@@@@@@@ |
[32, 64) 13 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128) 1 |@@@@ |
@exec_lat_us[z_ioctl_int, zio_execute]:
[2, 4) 1 |@@@@ |
[4, 8) 11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16) 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
@exec_lat_us[dp_sync_taskq, sync_dnodes_task]:
[2, 4) 1 |@@@@@@ |
[4, 8) 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32) 2 |@@@@@@@@@@@@@ |
[32, 64) 4 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 1 |@@@@@@ |
[128, 256) 0 | |
[256, 512) 1 |@@@@@@
Here's some example output, showing latency information for time spent
waiting on the taskq, prior to starting execution of entry's function:
@queue_lat_us[dp_sync_taskq]:
[2, 4) 1 |@@@@ |
[4, 8) 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 2 |@@@@@@@@ |
[16, 32) 3 |@@@@@@@@@@@@@ |
[32, 64) 12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128) 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[128, 256) 0 | |
[256, 512) 1 |@@@@ |
@queue_lat_us[z_wr_iss]:
[4, 8) 4 |@@@@ |
[8, 16) 13 |@@@@@@@@@@@@@@@ |
[16, 32) 6 |@@@@@@@ |
[32, 64) 2 |@@ |
[64, 128) 12 |@@@@@@@@@@@@@@ |
[128, 256) 15 |@@@@@@@@@@@@@@@@@@ |
[256, 512) 33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[512, 1K) 27 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 7 |@@@@@@@@ |
[2K, 4K) 14 |@@@@@@@@@@@@@@@@ |
[4K, 8K) 14 |@@@@@@@@@@@@@@@@ |
[8K, 16K) 23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16K, 32K) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@queue_lat_us[z_wr_int]:
[2, 4) 10 |@@@@@ |
[4, 8) 71 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 88 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32) 50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32, 64) 65 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@ |
[128, 256) 19 |@@@@@@@@@@@ |
[256, 512) 3 |@ |
[512, 1K) 1 | |
Reviewed by: Brad Lewis <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Closes #9525
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change modifies some of the infrastructure for enabling the use of
the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module.
Currently, when the DTRACE_PROBE* macros are used, they get expanded to
create new functions, and these dynamically generated functions become
part of the "zfs" module.
Since the "spl" module does not depend on the "zfs" module, the use of
DTRACE_PROBE* in the "spl" module would result in undefined symbols
being used in the "spl" module. Specifically, DTRACE_PROBE* would turn
into a function call, and the function being called would be a symbol
only contained in the "zfs" module; which results in a linker and/or
runtime error.
Thus, this change adds the necessary logic to the "spl" module, to
mirror the tracing functionality available to the "zfs" module. After
this change, we'll have a "trace_zfs.h" header file which defines the
probes available only to the "zfs" module, and a "trace_spl.h" header
file which defines the probes available only to the "spl" module.
Reviewed by: Brad Lewis <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Closes #9525
|
|
|
|
|
|
|
|
|
|
| |
MODULE_VERSION is already defined on FreeBSD. Wrap all of the
used MODULE_* macros for the sake of consistency and portability.
Add a user space noop version to reduce the need for _KERNEL ifdefs.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9542
|