| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The steps to reproduce the problem:
mdconfig -a -t swap -s 3g -u 0
gpart create -s GPT md0
gpart add -t freebsd-zfs -s 1g md0
zpool create -o autoexpand=on foo md0p1
gpart resize -i 1 -s 2g md0
Authored by: pjd <[email protected]>
FreeBSD-commit: freebsd/freebsd@bccd2db598ede073d6d06781a5fd3b119c08aa81
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Ported-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sync up with the following changes from FreeBSD:
ZFS: add emulation of atomic_swap_64 and atomic_load_64
Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all. We emulate those operations
for those platforms using a mutex. That is not entirely correct and
it's very efficient. Besides, the loads are plain loads, so torn
values are possible.
Nevertheless, the emulation seems to work for some definition of work.
This change adds atomic_swap_64, which is already used in ZFS code,
and atomic_load_64 that can be used to prevent torn reads.
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@3458e5d1e6354123ec2b0953d29f98126aa442e
cleanup of illumos compatibility atomics
atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have
it. The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new
platforms. Also, the operations are done inline now as opposed to
function calls before.
atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.
casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as
they have no users.
atomic_mtx that is used to emulate 64-bit atomics on platforms that
lack them is defined only on those platforms.
As a result, platform specific opensolaris_atomic.S files have lost
most of their code. The only exception is i386 where the
compat+contrib code provides 64-bit atomics for userland use. That
code assumes availability of cmpxchg8b instruction. FreeBSD does not
have that assumption for i386 userland and does not provide 64-bit
atomics. Hopefully, this can and will be fixed.
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@e9642c209b4413f6afb41d3b2607c51d80a1a34
emulate illumos membar_producer with atomic_thread_fence_rel
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it
was inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite
TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if
this is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Discussed with: kib
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@50cdda62fced8d21e45858e01dc375a10f1749e
fix up r353340, don't assume that fcmpset has strong semantics
fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value. Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed. That's a so
called "sporadic" failure.
I originally implemented atomic_cas expecting strong semantics, but
many platforms actually have weak one.
Reported by: pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@238787c74e737e271f17330fbad900acc35651c
[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.
This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.
The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.
Submitted by: jhibbits (original patch), kevans (mips bits)
Reviewed by: jhibbits, jeff, kevans
Authored by: bdragon <[email protected]>
Differential Revision: https://reviews.freebsd.org/D22976
FreeBSD-commit: freebsd/freebsd@db39dab3a896b3d98e588736e9a2b4ddaeb31f1
Remove sparc64 kernel support
Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs
Authored by: imp <[email protected]>
FreeBSD-commit: freebsd/freebsd@48b94864c51253da92e4444f0074eec36ef391f
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Ported-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10250
|
|
|
|
|
|
|
|
|
|
|
| |
The verify_pool function should detect checksum errors on any vdev, but
it was only checking at the root of the pool.
Accumulate the errors for all vdevs to obtain the correct count.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10271
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Running zdb -l $disk shows a warning that zfs_arc_max is being ignored.
zdb sets zfs_arc_max below zfs_arc_min, which causes the value to be
ignored by arc_tuning_update().
Set zfs_arc_min to the bare minimum in zdb, which is below zfs_arc_max.
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10269
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reflect delete permissions for ACLs
Authored by: Kevin Crowe <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Yuri Pankov <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
Porting Notes:
* Only comments are updated
OpenZFS-issue: https://www.illumos.org/issues/6765
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/da412744bc
Closes #10266
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- and some additional considerations
Authored by: Kevin Crowe <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Yuri Pankov <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/6762
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1eb4e906ec
Closes #10266
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: Dominik Hassler <[email protected]>
Reviewed by: Sam Zaydel <[email protected]>
Reviewed by: Paul B. Henson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Matthew Ahrens <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8984
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e9bacc6d1a
Closes #10266
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with aclmode=passthrough
Authored by: Albert Lee <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Yuri Pankov <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/6764
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de0f1ddb59
Closes #10266
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored-by: Paul B. Henson <[email protected]>
Reviewed by: Albert Lee <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/3254
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c
Closes #10266
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
masking "deny" ACL entries OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics
Porting notes:
* Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter
rather than 'zfsvfs_t *zfsvfs'
* zfs man pages changes mixed between zfs and new zfsprops man pages
Reviewed by: Aram Hvrneanu <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Robert Gordon <[email protected]>
Reviewed by: [email protected]
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/742
OpenZFS-issue: https://www.illumos.org/issues/664
OpenZFS-issue: https://www.illumos.org/issues/279
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110
Closes #10266
|
|
|
|
|
|
|
|
|
| |
The 'zfs load-key' command was broken for 'keyformat=passphrase'.
Use the correct output vars when stdin is an interactive terminal.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: adam moss <[email protected]>
Closes #10264
Closes #10265
|
|
|
|
|
|
|
|
|
|
|
| |
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.
Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7408
Closes #9957
Closes #9967
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every platform has their own preferred methods for implementing URI
schemes beyond the currently supported file scheme (e.g. 'https' on
FreeBSD would likely use libfetch, while Linux distros and illumos
would probably use libcurl, etc). It would be helpful if libzfs can
be extended to support additional schemes in a simple manner.
A table of (scheme, handler_function) pairs is added to libzfs_crypto.c,
and the existing functions in libzfs_crypto.c so that when the key
format is ZFS_KEYFORMAT_URI, the scheme from the URI string is
extracted, and a matching handler it located in the aforementioned
table (returning an error if no matching handler is found). The handler
function is then invoked to retrieve the key material (in the format
specified by the keyformat property) and the key is loaded or the
handler can return an error to abort the key loading process.
Reviewed by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jason King <[email protected]>
Closes #10218
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: sara hartse <[email protected]>
Closes #10243
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise when running with reference_tracking_enable=TRUE mounting
and unmounting an encrypted dataset panics with:
Call Trace:
dump_stack+0x66/0x90
slab_err+0xcd/0xf2
? __kmalloc+0x174/0x260
? __kmem_cache_shutdown+0x158/0x240
__kmem_cache_shutdown.cold+0x1d/0x115
shutdown_cache+0x11/0x140
kmem_cache_destroy+0x210/0x230
spl_kmem_cache_destroy+0x122/0x3e0 [spl]
zfs_refcount_fini+0x11/0x20 [zfs]
spa_fini+0x4b/0x120 [zfs]
zfs_kmod_fini+0x6b/0xa0 [zfs]
_fini+0xa/0x68c [zfs]
__x64_sys_delete_module+0x19c/0x2b0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reviewed-By: Brian Behlendorf <[email protected]>
Reviewed-By: Tom Caputi <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #10246
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zlib_inflateEnd was accidentally a wrapper for inflateInit instead of
inflateEnd, and hilarity ensues.
Fix the typo so we free memory instead of allocating more.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10225
Closes #10252
|
|
|
|
|
|
|
|
|
|
|
| |
Round up the volume size requested in `zfs create -V size` to the next
higher multiple of the volblocksize. Updates the man page and adds a
test to verify the new behavior.
Reviewed-by: Brian Behlendorf <[email protected]>
Reported-by: puffi <[email protected]>
Signed-off-by: Alex John <[email protected]>
Closes #8541
Closes #10196
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corrects a bug introduced in 61152d1069. When
resuming a raw base receive, the dmu_recv code always sets
drc->drc_fromsnapobj to the object ID of the previous
snapshot. For incrementals, this is correct, but for base
sends, this should be left at 0. The presence of this ID
eventually allows a check to run which determines whether
or not the incoming stream and the previous snapshot have
matching IVset guids. This check fails becuase it is not
meant to run when there is no previous snapshot. When it
does fail, the user receives an error stating that the
incoming stream has the problem outlined in errata 4.
This patch corrects this issue by simply ensuring
drc->drc_fromsnapobj is left as 0 for base receives.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #10234
Closes #10239
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix uninitialized variable in `zstream redup` command. The compiler
may determine the 'stream_offset' variable can be uninitialized
because not all rdt_lookup() exit paths set it. This should never
happen in practice as documented by the assert, but initialize it
regardless to resolve the warning.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10241
Closes #10244
|
|
|
|
|
|
|
|
|
|
| |
This aids in debugging, so that we can use the same infrastructure to
walk zfs's list_t in the kernel module and in the userland libraries
(e.g. when debugging ztest).
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10236
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such
streams) are deprecated. Deduplicated send streams can be received by
first converting them to non-deduplicated with the `zstream redup`
command.
This commit removes the code for sending and receiving deduplicated send
streams. `zfs send -D` will now print a warning, ignore the `-D` flag,
and generate a regular (non-deduplicated) send stream. `zfs receive` of
a deduplicated send stream will print an error message and fail.
The resulting code simplification (especially in the kernel's support
for receiving dedup streams) should help enable future performance
enhancements.
Several new tests are added which leverage `zstream redup`.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Issue #7887
Issue #10117
Issue #10156
Closes #10212
|
|
|
|
|
|
|
| |
This commit fixes a bunch of missing free() calls in a10d50f99
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: João Carlos Mendes Luís <[email protected]>
Closes #10219
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each metaslab group (of which there is one per top-level vdev) has
several (4, by default) "metaslab group allocators". Each "allocator"
has its own metaslab that it prefers to allocate from (the "primary"
allocator), and each can perform allocations concurrently with the other
allocators. In addition to the primary metaslab, there are several
other fields that need to be tracked separately for each allocator.
These are currently stored as several arrays in the metaslab_group_t,
each array indexed by allocator number.
This change organizes all the metaslab-group-allocator-specific fields
into a new struct, metaslab_group_allocator_t. The metaslab_group_t now
needs only one array indexed by the allocator number - which contains
the metaslab_group_allocator_t's.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10213
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On zpools containing hole vdevs (e.g. removed log devices), the `zpool
trim` (and presumably `zpool initialize`) commands will attempt calling
their respective functions on "hole", which fails, as this is not a real
vdev.
Avoid this by removing HOLE vdevs in zpool_collect_leaves.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Niklas Haas <[email protected]>
Closes #10227
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The progress of a send is supposed to be reported by `zfs send -v`, but
it is not. This works by creating a new user thread (with
pthread_create()) which does ZFS_IOC_SEND_PROGRESS ioctls to check how
much progress has been made. This IOCTL finds the specified send (since
there may be multiple concurrent sends in the system). The IOCTL also
checks that the specified send was started by the current process.
On Linux, different threads of the same process are represented as
different `struct task_struct`s (and, confusingly, have different
PID's). To check if if two threads are in the same process, we need to
check if they have the same `struct task_struct:group_leader`.
We used to to this correctly, but it was inadvertently changed by
30af21b02569 (Redacted Send) to simply check if the current
`struct task_struct` is the one that started the send.
This commit changes the code back to checking if the send was started by
a `struct task_struct` with the same `group_leader` as the calling
thread.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Chris Wedgwood <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10215
Closes #10216
|
|
|
|
|
|
|
|
| |
Propagate changes in HEAD that mostly eliminate object locking.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10205
|
|
|
|
|
|
|
|
|
|
|
| |
Minor fixes on persistent L2ARC improving code readability and fixing
a typo in zdb.c when byte-swapping a log block. It also improves the
pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log
blocks on the cache device.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Adam D. Moss <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #10210
|
|
|
|
|
|
|
|
| |
Remove some obsolete legacy compat, rename some misnamed, and add some
missing tunables for FreeBSD.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10203
|
|
|
|
|
|
|
|
|
|
|
| |
Add a comment so the file is not empty.
The comment can be removed when FreeBSD-specific tests are added.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10206
|
|
|
|
|
|
|
|
|
|
| |
./cmd/zpool/zpool.d/smart:78:32:
note: Double quote to prevent globbing and word splitting. [SC2086]
Reported by latest shellcheck on FreeBSD.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10194
|
|
|
|
|
|
|
|
|
|
|
|
| |
Musl libc defined `stat64` as a macro, which causes the build to fail
upon compiling os/linux/getmntany.c due to conflicts between the forward
declaration and the implementation.
This commit fixes that by including <sys/stat.h> in "sys/mnttab.h"
directly.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Hiếu Lê <[email protected]>
Closes #10195
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the FreeBSD platform code to the OpenZFS repository. As of this
commit the source can be compiled and tested on FreeBSD 11 and 12.
Subsequent commits are now required to compile on FreeBSD and Linux.
Additionally, they must pass the ZFS Test Suite on FreeBSD which is
being run by the CI. As of this commit 1230 tests pass on FreeBSD
and there are no unexpected failures.
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Ryan Moeller <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #898
Closes #8987
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test for VDEV_TYPE_INDIRECT is done after a memory allocation, and
could return from function without freeing it. Since we don't need that
allocation yet, just postpone it.
Add a missing free() when buffer is no longer needed.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: João Carlos Mendes Luís <[email protected]>
Closes #10193
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The memory and cpu cost of reference count tracking with the current
implementation is significant. For this reason it has always been
disabled by default for the kmods. Apply this same default to user
space so ztest doesn't always incur this performance penalty.
Our intention is to re-enable this by default for ztest once the code
has been optimized. Since we expect to at some point provide a FUSE
implementation we wouldn't want this enabled by default for libzpool.
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10189
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 379ca9c removed the requirement on aux devices to be block
devices only but the test case cache_010_neg was not updated, making it
fail consistently.
This change changes the test to check that cache devices _can_ be
anything that presents a block interface. The testcase is renamed to
cache_010_pos and the exceptions for known failure removed from the test
runner.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reported-by: Richard Elling <[email protected]>
Signed-off-by: Alex John <[email protected]>
Closes #10172
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can improve the performance of writes to zvols by using
dmu_tx_hold_write_by_dnode() instead of dmu_tx_hold_write(). This
reduces lock contention on the first block of the dnode object, and also
reduces the amount of CPU needed. The benefit will be highest with
multi-threaded async writes (i.e. writes that don't call zil_commit()).
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10184
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix uninitialized variable in `zstream redup` command. The
'rdt.ddt_count' variable is uninitialized because it was
allocated from the stack and not globally. Initialize it.
This was reported by gcc when compiling with debugging enabled.
zstream_redup.c:157:16: error: 'rdt.ddt_count' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
* Remove the cmd/zstreamdump/.gitignore file. It's no longer
needed now that the zstreamdump command is a script.
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10192
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deduplicated send and receive is deprecated. To ease migration to the
new dedup-send-less world, the commit adds a `zstream redup` utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.
The new `zstream` command also replaces the functionality of
`zstreamdump`, by way of the `zstream dump` subcommand. The
`zstreamdump` command is replaced by a shell script which invokes
`zstream dump`.
The way that `zstream redup` works under the hood is that as we read the
send stream, we build up a hash table which maps from `<GUID, object,
offset> -> <file_offset>`.
Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is `drr_toguid, drr_object, drr_offset`.)
For entries other than WRITE_BYREF, we pass them through unchanged
(except for the running checksum, which is recalculated).
For WRITE_BYREF records, we change them to WRITE records. We find the
referenced WRITE record by looking in the hash table (for the record
with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading
the record header and payload from the specified offset in the stream
file. This is why the stream can not be a pipe. The found WRITE record
replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`,
and `drr_offset` fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).
This algorithm requires memory proportional to the number of WRITE
records (same as `zfs send -D`), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream
with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to
"redup".
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10124
Closes #10156
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit makes the L2ARC persistent across reboots. We implement
a light-weight persistent L2ARC metadata structure that allows L2ARC
contents to be recovered after a reboot. This significantly eases the
impact a reboot has on read performance on systems with large caches.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Saso Kiselkov <[email protected]>
Co-authored-by: Jorgen Lundman <[email protected]>
Co-authored-by: George Amanakis <[email protected]>
Ported-by: Yuxuan Shui <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #925
Closes #1823
Closes #2672
Closes #3744
Closes #9582
|
|
|
|
|
|
|
|
|
|
|
| |
Set arc_c_min before arc_c_max so that when zfs_arc_min is set lower
than the default allmem/32 zfs_arc_max can also be set lower.
Add warning messages when tunables are being ignored.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10157
Closes #10158
|
|
|
|
|
|
|
|
|
|
|
| |
By default it's not possible to open a device already owned by an
active vdev. It's necessary to make an exception to this for vdev
split. The FreeBSD platform code will make an exception if
spa_is splitting is set to to true.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10178
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit https://github.com/torvalds/linux/commit/3d745ea5 simplified
the blk_alloc_queue() interface by updating it to take the request
queue as an argument. Add a wrapper function which accepts the new
arguments and internally uses the available interfaces.
Other minor changes include increasing the Linux-Maximum to 5.6 now
that 5.6 has been released. It was not bumped to 5.7 because this
release has not yet been finalized and is still subject to change.
Added local 'struct zvol_state_os *zso' variable to zvol_alloc.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10181
Closes #10187
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added to prevent a possible deadlock, the following comments from
FreeBSD explain the issue. The comment describing vn_io_fault_uiomove:
/*
* Helper function to perform the requested uiomove operation using
* the held pages for io->uio_iov[0].iov_base buffer instead of
* copyin/copyout. Access to the pages with uiomove_fromphys()
* instead of iov_base prevents page faults that could occur due to
* pmap_collect() invalidating the mapping created by
* vm_fault_quick_hold_pages(), or pageout daemon, page laundry or
* object cleanup revoking the write access from page mappings.
*
* Filesystems specified MNTK_NO_IOPF shall use vn_io_fault_uiomove()
* instead of plain uiomove().
*/
This used for vn_io_fault which has the following motivation:
/*
* The vn_io_fault() is a wrapper around vn_read() and vn_write() to
* prevent the following deadlock:
*
* Assume that the thread A reads from the vnode vp1 into userspace
* buffer buf1 backed by the pages of vnode vp2. If a page in buf1 is
* currently not resident, then system ends up with the call chain
* vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
* vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
* which establishes lock order vp1->vn_lock, then vp2->vn_lock.
* If, at the same time, thread B reads from vnode vp2 into buffer buf2
* backed by the pages of vnode vp1, and some page in buf2 is not
* resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
*
* To prevent the lock order reversal and deadlock, vn_io_fault() does
* not allow page faults to happen during VOP_READ() or VOP_WRITE().
* Instead, it first tries to do the whole range i/o with pagefaults
* disabled. If all pages in the i/o buffer are resident and mapped,
* VOP will succeed (ignoring the genuine filesystem errors).
* Otherwise, we get back EFAULT, and vn_io_fault() falls back to do
* i/o in chunks, with all pages in the chunk prefaulted and held
* using vm_fault_quick_hold_pages().
*
* Filesystems using this deadlock avoidance scheme should use the
* array of the held pages from uio, saved in the curthread->td_ma,
* instead of doing uiomove(). A helper function
* vn_io_fault_uiomove() converts uiomove request into
* uiomove_fromphys() over td_ma array.
*
* Since vnode locks do not cover the whole i/o anymore, rangelocks
* make the current i/o request atomic with respect to other i/os and
* truncations.
*/
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10177
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux and FreeBSD have different parameters for tunable proc handler.
This has prevented FreeBSD from implementing the ZFS_MODULE_PARAM_CALL
macro.
To complete the sharing of ZFS_MODULE_PARAM_CALL declarations, create
per-platform definitions of the parameter list, ZFS_MODULE_PARAM_ARGS.
With the declarations wired up we discovered an incorrect scope prefix
for spa_slop_shift, so this is now fixed.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10179
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 379ca9c removed the check on aux devices to be block devices also
changing zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, ...) and
zfs_ioctl(hdl, ZFS_IOC_POOL_CREATE, ...) to never set ENOTBLK. This
change removes the dangling check for ENOTBLK that will never trigger.
Reviewed-by: Brian Behlendorf <[email protected]>
Reported-by: Richard Elling <[email protected]>
Signed-off-by: Alex John <[email protected]>
Closes #10173
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The delegate tests use `date(1)` to generate snapshot names, using
the format '%F-%T-%N' to get nanosecond resolution (since multiple
snapshots may be taken in the same second). '%N' is not portable, and
causes tests to fail on FreeBSD.
Since the only purpose these timestamps serve is to create a unique
name, simply use $RANDOM instead.
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10170
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a mechanism to wait for delete queue to drain.
When doing redacted send/recv, many workflows involve deleting files
that contain sensitive data. Because of the way zfs handles file
deletions, snapshots taken quickly after a rm operation can sometimes
still contain the file in question, especially if the file is very
large. This can result in issues for redacted send/recv users who
expect the deleted files to be redacted in the send streams, and not
appear in their clones.
This change duplicates much of the zpool wait related logic into a
zfs wait command, which can be used to wait until the internal
deleteq has been drained. Additional wait activities may be added
in the future.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #9707
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In zfs_write(), the loop continues to the next iteration without
accounting for partial copies occurring in uiomove_iov when
copy_from_user/__copy_from_user_inatomic return a non-zero status.
This results in "zfs: accessing past end of object..." in the
kernel log, and the write failing.
Account for partial copies and update uio struct before returning
EFAULT, leave a comment explaining the reason why this is done.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: ilbsmart <[email protected]>
Signed-off-by: Fabio Scaccabarozzi <[email protected]>
Closes #8673
Closes #10148
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
== Summary ==
Prior to this change, sync writes to a zvol are processed serially.
This commit makes zvols process concurrently outstanding sync writes in
parallel, similar to how reads and async writes are already handled.
The result is that the throughput of sync writes is tripled.
== Background ==
When a write comes in for a zvol (e.g. over iscsi), it is processed by
calling `zvol_request()` to initiate the operation. ZFS is expected to
later call `BIO_END_IO()` when the operation completes (possibly from a
different thread). There are a limited number of threads that are
available to call `zvol_request()` - one one per iscsi client (unless
using MC/S). Therefore, to ensure good performance, the latency of
`zvol_request()` is important, so that many i/o operations to the zvol
can be processed concurrently. In other words, if the client has
multiple outstanding requests to the zvol, the zvol should have multiple
outstanding requests to the storage hardware (i.e. issue multiple
concurrent `zio_t`'s).
For reads, and async writes (i.e. writes which can be acknowledged
before the data reaches stable storage), `zvol_request()` achieves low
latency by dispatching the bulk of the work (including waiting for i/o
to disk) to a taskq. The taskq callback (`zvol_read()` or
`zvol_write()`) blocks while waiting for the i/o to disk to complete.
The `zvol_taskq` has 32 threads (by default), so we can have up to 32
concurrent i/os to disk in service of requests to zvols.
However, for sync writes (i.e. writes which must be persisted to stable
storage before they can be acknowledged, by calling `zil_commit()`),
`zvol_request()` does not use `zvol_taskq`. Instead it blocks while
waiting for the ZIL write to disk to complete. This has the effect of
serializing sync writes to each zvol. In other words, each zvol will
only process one sync write at a time, waiting for it to be written to
the ZIL before accepting the next request.
The same issue applies to FLUSH operations, for which `zvol_request()`
calls `zil_commit()` directly.
== Description of change ==
This commit changes `zvol_request()` to use
`taskq_dispatch_ent(zvol_taskq)` for sync writes, and FLUSh operations.
Therefore we can have up to 32 threads (the taskq threads)
simultaneously calling `zil_commit()`, for a theoretical performance
improvement of up to 32x.
To avoid the locking issue described in the comment (which this commit
removes), we acquire the rangelock from the taskq callback (e.g.
`zvol_write()`) rather than from `zvol_request()`. This applies to all
writes (sync and async), reads, and discard operations. This means that
multiple simultaneously-outstanding i/o's which access the same block
can complete in any order. This was previously thought to be incorrect,
but a review of the block device interface requirements revealed that
this is fine - the order is inherently not defined. The shorter hold
time of the rangelock should also have a slight performance improvement.
For an additional slight performance improvement, we use
`taskq_dispatch_ent()` instead of `taskq_dispatch()`, which avoids a
`kmem_alloc()` and eliminates a failure mode. This applies to all
writes (sync and async), reads, and discard operations.
== Performance results ==
We used a zvol as an iscsi target (server) for a Windows initiator
(client), with a single connection (the default - i.e. not MC/S).
We used `diskspd` to generate a workload with 4 threads, doing 1MB
writes to random offsets in the zvol. Without this change we get
231MB/s, and with the change we get 728MB/s, which is 3.15x the original
performance.
We ran a real-world workload, restoring a MSSQL database, and saw
throughput 2.5x the original.
We saw more modest performance wins (typically 1.5x-2x) when using MC/S
with 4 connections, and with different number of client threads (1, 8,
32).
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10163
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increasing l2arc_write_size or l2arc_write_boost can result in
l2arc_write_buffers() not having enough space to perform its writes and
panic zio_write_phys().
Instead of resetting l2ad_hand to l2ad_start at the end of
l2arc_write_buffers() and not taking into account a possible
user-mediated increase of l2arc_write_max, we do this in l2arc_evict(),
right after l2arc_write_size() has run. If there is not enough space to
evict (ie we will exceed l2ad_end) we evict to the end of the device,
reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate
l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making
sure in l2arc_write_size() that l2ad_start + size does not exceed
l2ad_end.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #10154
|