| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.
Since ZFS does COW and snapshots, preallocating blocks for a file
cannot guarantee that writes to the file will not run out of space.
Even if the first overwrite was guaranteed, it would not handle any
later overwrite of blocks due to COW, so strict compliance is futile.
Instead, make a best-effort check that at least enough free space is
currently available in the pool (with a bit of margin), then create
a sparse file of the requested size and continue on with life.
This does not handle all cases (e.g. several fallocate() calls before
writing into the files when the filesystem is nearly full), which
would require a more complex mechanism to be implemented, probably
based on a modified version of dmu_prealloc(), but is usable as-is.
A new module option zfs_fallocate_reserve_percent is used to control
the reserve margin for any single fallocate call. By default, this
is 110% of the requested preallocation size, so an additional 10% of
available space is reserved for overhead to allow the application a
good chance of finishing the write when the fallocate() succeeds.
If the heuristics of this basic fallocate implementation are not
desirable, the old non-functional behavior of returning EOPNOTSUPP
for calls can be restored by setting zfs_fallocate_reserve_percent=0.
The parameter of zfs_statvfs() is changed to take an inode instead
of a dentry, since no dentry is available in zfs_fallocate_common().
A few tests from @behlendorf cover basic fallocate functionality.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Arshad Hussain <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Co-authored-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andreas Dilger <[email protected]>
Issue #326
Closes #10408
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Illumos callers of cv_timedwait and cv_timedwait_hires
can't distinguish between whether or not the cv was signaled
or the call timed out. Illumos handles this (for some definition
of handles) by calling cv_signal in the return path if we were
signaled but the return value indicates instead that we timed
out. This would make sense if it were possible to query the the
cv for its net signal disposition. However, this isn't possible
and, in spite of the fact that there are places in the code that
clearly take a different and incompatible path if a timeout value
is indicated, this distinction appears to be rather subtle to most
developers. This problem is further compounded by the fact that on
Linux, calling cv_signal in the return path wouldn't even do the
right thing unless there are other waiters.
Since it is possible for the caller to independently determine how
much time is remaining but it is not possible to query if the cv
was in fact signaled, prioritizing signalling over timeout seems
like a cleaner solution. In addition, judging from usage patterns
within the code itself, it is also less error prone.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10471
|
|
|
|
|
|
|
|
|
|
| |
Apparently missed in the initial port integration was
the need to reap the abd_chunk_cache on FreeBSD. This
change addresses that oversight.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10474
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As it uses kmem_strdup() and kmem_strfree() which both rely on
strlen() being the same, but saved_poolname can be truncated causing:
SPL: kernel memory allocator:
buffer freed to wrong cache
SPL: buffer was allocated from kmem_alloc_16,
SPL: caller attempting free to kmem_alloc_8.
SPL: buffer=0xffffff90acc66a38 bufctl=0x0 cache: kmem_alloc_8
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10469
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For at least 15 years since OpenSolaris arc_c was set by default to
arc_c_max, later decreased under memory pressure. I've noticed that
if arc_c was set high enough to cause memory pressure as considered
by ZFS, setting of arc_no_grow to TRUE in arc_reap_cb_check() makes
no effect until both arc_kmem_reap_soon() and delay(reap_retry_ms)
return. All that time ZFS can continue increasing its effective ARC
size, causing more memory pressure, potentially up to the point when
OS low memory handler activates and reduces arc_c, requesting fast
reclamation of just allocated memory.
The problem seems to be more serious on FreeBSD and I guess Linux,
since neither of them implement/use asynchronous kmem reclamation,
so arc_kmem_reap_soon() can take more time. On older FreeBSD 11 not
supporting multiple memory domains system with lots of RAM can get
completely unresponsive for minutes due to heavy lock congestion
between ARC reclamation and page daemon kmem reclamation threads.
With this change to more conservative arc_c value ARC stops growing
just it time and does not need later reclamation.
Also while there, since now growing arc_c is a more often situation,
use aggsum_upper_bound() instead of aggsum_compare() in arc_adapt()
to reduce lock congestion. It is also getting in sync with code in
arc_get_data_impl().
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes #10437
|
|
|
|
|
|
|
|
|
|
|
| |
Since https://reviews.freebsd.org/D24408 FreeBSD provides XDR functions
in the xdr module instead of krpc.
For FreeBSD 13, the MODULE_DEPEND should be changed to xdr
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10442
Closes #10443
|
|
|
|
|
|
|
|
|
| |
Linux defines different vdev_disk_t members to macOS, but they are
only used in vdev_disk.c so move the declaration there.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10452
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the event we are allocating a gang ABD in FreeBSD we are passing 0
to abd_alloc_struct(); however, this led to an allocation of ABD scatter
with 0 chunks. This left the gang ABD allocation 24 bytes smaller than
it should have been.
Reviewed-by: Matt Macy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Matt Macy <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #10431
|
|
|
|
|
|
|
|
|
| |
The macOS uio struct is opaque and the API must be used, this
makes the smallest changes to the code for all platforms.
Reviewed-by: Matt Macy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10412
|
|
|
|
|
|
|
|
|
| |
On macOS clock_t is unsigned, so when cv_timedwait_hires() returns -1
we loop forever. The conditional was tweaked to ignore signedness.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10445
|
|
|
|
|
|
|
|
|
|
|
| |
For MIPS architectures on Linux the ZERO_PAGE macro references
empty_zero_page, which is exported as a GPL symbol. The call to
ZERO_PAGE in abd_alloc_zero_scatter has been removed and a single
zero'd page is now allocated for each of the pages in abd_zero_scatter
in the kernel ABD code path.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #10428
|
|
|
|
|
|
|
|
|
| |
The patch was applied to vdev_geom_open instead of vdev_geom_close by
mistake.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10427
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The linux module can be built either as an external module, or compiled
into the kernel, using copy-builtin. The source and build directories
are slightly different between the two cases, and currently, compiling
into the kernel still refers to some files from the configured ZFS
source tree, instead of the copies inside the kernel source tree. There
is also duplication between copy-builtin, which creates a Kbuild file to
build ZFS inside the kernel tree, and the top-level module/Makefile.in.
Fix this by moving the list of modules and the CFLAGS settings into a
new module/Kbuild.in, which will be used by the kernel kbuild
infrastructure, and using KBUILD_EXTMOD to distinguish the two cases
within the Makefiles, in order to choose appropriate include
directories etc.
Module CFLAGS setting is simplified by using subdir-ccflags-y (available
since 2.6.30) to set them in the top-level Kbuild instead of each
individual module. The disabling of -Wunused-but-set-variable is removed
from the lua and zfs modules. The variable that the Makefile uses is
actually not defined, so this has no effect; and the warning has long
been disabled by the kernel Makefile itself.
The target_cpu definition in module/{zfs,zcommon} is removed as it was
replaced by use of CONFIG_SPARC64 in
commit 70835c5b755e ("Unify target_cpu handling")
os/linux/{spl,zfs} are removed from obj-m, as they are not modules in
themselves, but are included by the Makefile in the spl and zfs module
directories. The vestigial Makefiles in os and os/linux are removed.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Closes #10379
Closes #10421
|
|
|
|
|
|
|
|
|
| |
Correct various typos in the comments and tests.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andrea Gelmini <[email protected]>
Closes #10423
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Background:
By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks. By default, a send stream of such a
filesystem does not contain large WRITE records, instead it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the `large_blocks` feature. A send stream
generated by `zfs send -L` (or `--large-block`) preserves the large
block size on the receiving system, by using large WRITE records.
When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out. The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used `-L`, but this incremental
does not use `-L` (-L to no-L); or that the previous send did not use
`-L`, but this incremental does use `-L` (no-L to -L).
Changes:
This commit addresses the problem with several changes to the semantics
of zfs send/receive:
1. "-L to no-L" incrementals are rejected. If the previous send used
`-L`, but this incremental does not use `-L`, the `zfs receive` will
fail with this error message:
incremental send stream requires -L (--large-block), to match
previous receive.
2. "no-L to -L" incrementals are handled correctly, preserving the
smaller (128KB) block size of any already-received files that used large
blocks on the sending system but were split by `zfs send` without the
`-L` flag.
3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`.
This feature indicates that we can correctly handle "no-L to -L"
incrementals. This flag is currently not set on any send streams. In
the future, we intend for incremental send streams of snapshots that
have large blocks to use `-L` by default, and these streams will also
have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams
from the default use of `zfs send` won't encounter the bug mentioned
above, because they can't be received by software with the bug.
Implementation notes:
To facilitate accessing the ZPL's generation number,
`zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and
restructured to fill in a struct with ZPL-specific info including owner
and generation.
In the "no-L to -L" case, if this is a compressed send stream (from
`zfs send -cL`), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks. The zio pipeline will recompress
each smaller block individually.
A new test case, `send-L_toggle`, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #6224
Closes #10383
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The l2arc_evict() function is responsible for evicting buffers which
reference the next bytes of the L2ARC device to be overwritten. Teach
this function to additionally TRIM that vdev space before it is
overwritten if the device has been filled with data. This is done by
vdev_trim_simple() which trims by issuing a new type of TRIM,
TRIM_TYPE_SIMPLE.
We also implement a "Trim Ahead" feature. It is a zfs module parameter,
expressed in % of the current write size. This trims ahead of the
current write size. A minimum of 64MB will be trimmed. The default is 0
which disables TRIM on L2ARC as it can put significant stress to
underlying storage devices. To enable TRIM on L2ARC we set
l2arc_trim_ahead > 0.
We also implement TRIM of the whole cache device upon addition to a
pool, pool creation or when the header of the device is invalid upon
importing a pool or onlining a cache device. This is dependent on
l2arc_trim_ahead > 0. TRIM of the whole device is done with
TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t.
We save the TRIM state for the whole device and the time of completion
on-disk in the header, and restore these upon L2ARC rebuild so that
zpool status -t can correctly report them. Whole device TRIM is done
asynchronously so that the user can export of the pool or remove the
cache device while it is trimming (ie if it is too slow).
We do not TRIM the whole device if persistent L2ARC has been disabled by
l2arc_rebuild_enabled = 0 because we may not want to lose all cached
buffers (eg we may want to import the pool with
l2arc_rebuild_enabled = 0 only once because of memory pressure). If
persistent L2ARC has been disabled by setting the module parameter
l2arc_rebuild_blocks_min_l2size to a value greater than the size of the
cache device then the whole device is trimmed upon creation or import of
a pool if l2arc_trim_ahead > 0.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Adam D. Moss <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #9713
Closes #9789
Closes #10224
|
|
|
|
|
|
|
| |
Move the GFP flags kernel compat code from c file to kmem header.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #10424
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `pgprot` argument has been removed from `__vmalloc` in Linux 5.8,
being `PAGE_KERNEL` always now [1].
Detect this during configure and define a wrapper for older kernels.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/mm/vmalloc.c?h=next-20200605&id=88dca4ca5a93d2c09e5bbc6a62fbfc3af83c4fca
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Sebastian Gottschall <[email protected]>
Co-authored-by: Michael Niewöhner <[email protected]>
Signed-off-by: Sebastian Gottschall <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #10422
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Illumos it is possible to call ioctl functions from within the
kernel by passing the FKIOCTL flag. Neither FreeBSD nor Linux support
that, but it doesn't hurt to keep it around, as all the code is there.
Before this commit it was a dead code and zc_iflags was always zero.
Restore this functionality by allowing to pass a flag to the
zfsdev_ioctl_common() function.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes #10417
|
|
|
|
|
|
|
|
|
|
| |
By removing excessive includes it takes us a small step close to
compiling this file in userland.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes #10415
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The strcpy() and sprintf() functions are deprecated on some platforms.
Care is needed to ensure correct size is used. If some platforms
miss snprintf, we can add a #define to sprintf, likewise strlcpy().
The biggest change is adding a size parameter to zfs_id_to_fuidstr().
The various *_impl_get() functions are only used on linux and have
not yet been updated.
Reviewed by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10400
|
|
|
|
|
|
|
|
|
| |
C++ is a little picky about not using keywords for names, or string
constness.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10409
|
|\
| |
| |
| |
| | |
Signed-off-by: Allan Jude <[email protected]>
Co-authored-by: Allan Jude <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Allan Jude <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Expand the FreeBSD spl for kstats to support all current types
Move the dataset_kstats_t back to zvol_state_t from zfs_state_os_t
now that it is common once again
```
kstat.zfs/mypool.dataset.objset-0x10b.nunlinked: 0
kstat.zfs/mypool.dataset.objset-0x10b.nunlinks: 0
kstat.zfs/mypool.dataset.objset-0x10b.nread: 150528
kstat.zfs/mypool.dataset.objset-0x10b.reads: 48
kstat.zfs/mypool.dataset.objset-0x10b.nwritten: 134217728
kstat.zfs/mypool.dataset.objset-0x10b.writes: 1024
kstat.zfs/mypool.dataset.objset-0x10b.dataset_name: mypool/datasetname
```
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed by: Sean Eric Fagan <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Closes #10386
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was possible to cause a kernel panic in the send code by
initializing an already-initialized mutex, if a record was created
with type DATA, destroyed with a different type (bypassing the
mutex_destroy call) and then re-allocated as a DATA record again.
We tweak the logic to not change the type of a record once it has
been created, avoiding the issue.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #10374
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zvol_geom_bio_strategy should handle its own use of the zvol
suspend reader lock and ensure the zilog exists when needed.
A few other places using the zvol zilog should use the suspend
reader lock as well.
Simplify consumers of zvol_geom_bio_strategy, fix the locking, and
while in here, use the boolean_t constants with doread.
Reviewed-by: Matt Macy <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10381
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FreeBSD needs arc_adjust_zthr to run periodically for kstats to be
updated. A comment in the code suggests this may have been the
original intent in illumos as well:
https://github.com/openzfs/zfs/blob/c946d5a91329b075fb9bda1ac703a2e85139cf1c/module/zfs/arc.c#L4697-L4700
Create the thread with a 1 second timer.
Reviewed-by: Matt Macy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10371
|
|
|
|
|
|
|
|
|
| |
The macOS kmem implementation uses avl_update() and related
functions. These same function exist in the Solaris AVL code but
were removed because they were unused. Restore them.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10390
|
|
|
|
|
|
|
|
| |
Update API usage to reflect recent change.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10384
|
|
|
|
|
|
|
|
| |
The dsl_destroy_snapshots_nvl() function has an early error out,
and temporary nvlists were not freed.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10366
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding the gang ABD type, which allows for linear and scatter ABDs to
be chained together into a single ABD.
This can be used to avoid doing memory copies to/from ABDs. An example
of this can be found in vdev_queue.c in the vdev_queue_aggregate()
function.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Brian <[email protected]>
Co-authored-by: Mark Maybee <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #10069
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to hotplug support or BIOS bugs sometimes max_ncpus can be
an absurdly high value. I have a system with 32 cores/threads
but reports max_ncpus == 440. This many threads potentially
cripples the system during arc_prune floods for example.
boot_ncpus is the number of working CPUs when called so use
that instead.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #10282
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sponsored by: DARPA
External-issue: https://reviews.freebsd.org/D24656
FreeBSD-commit: freebsd/freebsd@a431c095d32df45a31faad8382b9bc712480e27e
Authored by: jhb <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10344
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is arguably a change for internal consistency within OpenZFS, as the
Linux implementation will reject read(2) on directories with EISDIR. It's
not unreasonable for read(2) to do something here on FreeBSD, but we don't
currently copy out anything useful anyways so start rejecting it with the
appropriate error.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Kyle Evans <[email protected]>
Closes #10338
|
|
|
|
|
|
|
|
|
|
| |
We only use ZVOL_DIR on FreeBSD, and on FreeBSD it isn't correct.
Move the definition to the file where it is needed, and define it as
/dev/zvol/.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10337
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.
When an error is first encountered, this happens correctly. However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record). Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.
The fix is to only set `rwa->err` if we got a *non-zero* error.
This bug was introduced by #10099 "Improve zfs receive performance by
batching writes".
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10320
|
|
|
|
|
|
|
|
|
| |
The VN_OPEN_INVFS literal is in the wrong field.
Reviewed-by: Matt Macy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: yparitcher <[email protected]>
Closes #10322
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit fc551d7 introduced the wrappers abd_enter_critical() and
abd_exit_critical() to mark critical sections. On Linux these are
implemented with the local_irq_save() and local_irq_restore() macros
which set the 'flags' argument when saving. By wrapping them with
a function the local variable is no longer set by the macro and is
no longer properly restored.
Convert abd_enter_critical() and abd_exit_critical() to macros to
resolve this issue and ensure the flags are properly restored.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #10332
|
|
|
|
|
|
|
|
| |
Undo FreeBSD wrapper for thread_create() added to call thread_exit.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #10314
|
|
|
|
|
|
|
|
| |
The member drc_err of dmu_recv_cookie_t is used only locally in
receive_read, so we can replace it with a local variable.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a resilver finishes, vdev_dtl_reassess is called to hopefully
excise DTL_MISSING (amongst other things). If there are errors during
the resilver, they are tracked in DTL_SCRUB, as spelled out in the
block comment in vdev.c. DTL_SCRUB is in-core only, so it can only
be used if the pool was online for the whole resilver. This state is
tracked with the spa_scrub_started flag, which only gets set when
the scan is initialized. Unfortunately, this flag gets cleared right
before vdev_dtl_reassess gets called, so if there are any errors
during the scan, DTL_MISSING will never get excised and the resilver
will just continually restart. This fix simply moves clearing that
flag until after the call to vdev_dtl_reasses.
In addition, if a pool is imported and already has scn_errors > 0,
this change will restart the resilver immediately instead of doing
the rest of the scan and then restarting it from the beginning. On
the other hand, if scn_errors == 0 at import, then no errors have
been encountered so far, so the spa_scrub_started flag can be safely
set.
A test has been added to verify that resilver does not restart when
relevant DTL's are available.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Zuchowski <[email protected]>
Signed-off-by: John Poduska <[email protected]>
Closes #10291
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reorganizing ABD code base so OS-independent ABD code has been placed
into a common abd.c file. OS-dependent ABD code has been left in each
OS's ABD source files, and these source files have been renamed to
abd_os.
The OS-independent ABD code is now under:
module/zfs/abd.c
With the OS-dependent code in:
module/os/linux/zfs/abd_os.c
module/os/freebsd/zfs/abd_os.c
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #10293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Functional changes:
We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.
1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
reflect the total aligned size of log blocks on the device. This is
also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
which reflect the total number of L2ARC log blocks present on cache
devices. It is also reflected in the header of the cache device as
"dh_lb_count".
In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.
Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.
If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().
If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.
In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal() making them user visible with zpool history
command.
Non-functional changes:
Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.
Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.
Fix inaccurate comments for example in l2arc_log_blk_restore().
Add asserts at the end in l2arc_evict() and l2arc_write_buffers().
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #10228
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modern bootloaders leverage data stored in the root filesystem to
enable some of their powerful features. GRUB specifically has a grubenv
file which can store large amounts of configuration data that can be
read and written at boot time and during normal operation. This allows
sysadmins to configure useful features like automated failover after
failed boot attempts. Unfortunately, due to the Copy-on-Write nature
of ZFS, the standard behavior of these tools cannot handle writing to
ZFS files safely at boot time. We need an alternative way to store
data that allows the bootloader to make changes to the data.
This work is very similar to work that was done on Illumos to enable
similar functionality in the FreeBSD bootloader. This patch is different
in that the data being stored is a raw grubenv file; this file can store
arbitrary variables and values, and the scripting provided by grub is
powerful enough that special structures are not required to implement
advanced behavior.
We repurpose the second padding area in each label to store the grubenv
file, protected by an embedded checksum. We add two ioctls to get and
set this data, and libzfs_core and libzfs functions to access them more
easily. There are no direct command line interfaces to these functions;
these will be added directly to the bootloader utilities.
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #10009
|
|
|
|
|
|
|
|
|
|
|
| |
When a top-level vdev is removed from a pool it is converted to an
indirect vdev. Until now splitting such mirrored pools was not possible
with zpool split. This patch enables handling of indirect vdevs and
splitting of those pools with zpool split.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #10283
|
|
|
|
|
|
|
|
|
| |
Adds a missing taskq_destroy() call.
Reported by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10292
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The steps to reproduce the problem:
mdconfig -a -t swap -s 3g -u 0
gpart create -s GPT md0
gpart add -t freebsd-zfs -s 1g md0
zpool create -o autoexpand=on foo md0p1
gpart resize -i 1 -s 2g md0
Authored by: pjd <[email protected]>
FreeBSD-commit: freebsd/freebsd@bccd2db598ede073d6d06781a5fd3b119c08aa81
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Ported-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sync up with the following changes from FreeBSD:
ZFS: add emulation of atomic_swap_64 and atomic_load_64
Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all. We emulate those operations
for those platforms using a mutex. That is not entirely correct and
it's very efficient. Besides, the loads are plain loads, so torn
values are possible.
Nevertheless, the emulation seems to work for some definition of work.
This change adds atomic_swap_64, which is already used in ZFS code,
and atomic_load_64 that can be used to prevent torn reads.
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@3458e5d1e6354123ec2b0953d29f98126aa442e
cleanup of illumos compatibility atomics
atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have
it. The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new
platforms. Also, the operations are done inline now as opposed to
function calls before.
atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.
casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as
they have no users.
atomic_mtx that is used to emulate 64-bit atomics on platforms that
lack them is defined only on those platforms.
As a result, platform specific opensolaris_atomic.S files have lost
most of their code. The only exception is i386 where the
compat+contrib code provides 64-bit atomics for userland use. That
code assumes availability of cmpxchg8b instruction. FreeBSD does not
have that assumption for i386 userland and does not provide 64-bit
atomics. Hopefully, this can and will be fixed.
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@e9642c209b4413f6afb41d3b2607c51d80a1a34
emulate illumos membar_producer with atomic_thread_fence_rel
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it
was inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite
TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if
this is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Discussed with: kib
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@50cdda62fced8d21e45858e01dc375a10f1749e
fix up r353340, don't assume that fcmpset has strong semantics
fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value. Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed. That's a so
called "sporadic" failure.
I originally implemented atomic_cas expecting strong semantics, but
many platforms actually have weak one.
Reported by: pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
Authored by: avg <[email protected]>
FreeBSD-commit: freebsd/freebsd@238787c74e737e271f17330fbad900acc35651c
[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.
This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.
The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.
Submitted by: jhibbits (original patch), kevans (mips bits)
Reviewed by: jhibbits, jeff, kevans
Authored by: bdragon <[email protected]>
Differential Revision: https://reviews.freebsd.org/D22976
FreeBSD-commit: freebsd/freebsd@db39dab3a896b3d98e588736e9a2b4ddaeb31f1
Remove sparc64 kernel support
Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs
Authored by: imp <[email protected]>
FreeBSD-commit: freebsd/freebsd@48b94864c51253da92e4444f0074eec36ef391f
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Ported-by: Ryan Moeller <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #10250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reflect delete permissions for ACLs
Authored by: Kevin Crowe <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Yuri Pankov <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Paul B. Henson <[email protected]>
Porting Notes:
* Only comments are updated
OpenZFS-issue: https://www.illumos.org/issues/6765
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/da412744bc
Closes #10266
|