| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
|
|
| |
We register all providers at once, before anything happens
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is currently twice the amount we actually have (sha[12], skein,
aes), and 512 * sizeof(void*) = 4096: 128x more than we need and a waste
of most of a page in the kernel address space
Plus, there's no need to actually allocate it dynamically: it's always
got a static size. Put it in .data
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12901
|
|
|
|
|
|
|
|
|
|
|
| |
Add hooks for when spa is created, exported, activated and
deactivated. Used by macOS to attach iokit, and lock
kext as busy (to stop unloads).
Userland, Linux, and, FreeBSD have empty stubs.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #12801
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two codepaths than can dirty final TXGs:
1) If calling spa_export_common()->spa_unload()->
spa_unload_log_sm_flush_all() after the spa_final_txg is set, then
spa_sync()->spa_flush_metaslabs() may end up dirtying the final
TXGs. Then we have the following panic:
Call Trace:
<TASK>
dump_stack_lvl+0x46/0x62
spl_panic+0xea/0x102 [spl]
dbuf_dirty+0xcd6/0x11b0 [zfs]
zap_lockdir_impl+0x321/0x590 [zfs]
zap_lockdir+0xed/0x150 [zfs]
zap_update+0x69/0x250 [zfs]
feature_sync+0x5f/0x190 [zfs]
space_map_alloc+0x83/0xc0 [zfs]
spa_generate_syncing_log_sm+0x10b/0x2f0 [zfs]
spa_flush_metaslabs+0xb2/0x350 [zfs]
spa_sync_iterate_to_convergence+0x15a/0x320 [zfs]
spa_sync+0x2e0/0x840 [zfs]
txg_sync_thread+0x2b1/0x3f0 [zfs]
thread_generic_wrapper+0x62/0xa0 [spl]
kthread+0x127/0x150
ret_from_fork+0x22/0x30
</TASK>
2) Calling vdev_*_stop_all() for a second time in spa_unload() after
spa_export_common() unnecessarily delays the final TXGs beyond what
spa_final_txg is set at.
Fix this by performing the check and call for
spa_unload_log_sm_flush_all() before the spa_final_txg is set in
spa_export_common(). Also check if the spa_final_txg has already been
set in spa_unload() and skip those calls in this case.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
External-issue: https://www.illumos.org/issues/9081
Closes #13048
Closes #13098
|
|
|
|
|
|
|
|
| |
Unfortunately macOS has obj-C keyword "fallthrough" in the OS headers.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Damian Szuberski <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #13097
|
|
|
|
|
|
|
|
| |
On newer compilers, dsl_dataset.c now warns (or, on DEBUG, errors)
on uninitialized variable usage.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #13083
|
|
|
|
|
|
|
|
|
|
|
| |
This allows reads/writes caused by accesses to mmap files to be
accounted correctly in the per-dataset kstats for both Linux and
FreeBSD.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matthias Blankertz <[email protected]>
Closes #12994
Closes #13044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.
Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Co-authored-by: George Amanakis <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #13033
Closes #13076
|
|
|
|
|
|
|
|
| |
To follow a change in illumos taskq
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #12802
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, $(CC), $(LD), and $(LLVM) variables aren't passed to kbuild
while building modules. This causes modules to build with the default
GNU GCC toolchain and prevents experimenting with other toolchains such
as CLANG/LLVM. It can also lead to build failure if the CFLAGS/LDFLAGS
passed are incompatible with gcc/ld.
Pass $KERNEL_CC, $KERNEL_LD, and $KERNEL_LLVM as $(CC), $(LD), and
$(LLVM), respectively, to kbuild for each that is defined in the
environment. This should take care of the majority of alternative
toolchain use cases.
Reviewed-by: Damian Szuberski <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Peter Levine <[email protected]>
Closes #13046
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach,
so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.
Please note that this differs from previous behavior in that it
won't handle exceptions created by XSAVE an XRSTOR. This is sensible
for three reasons.
- Exceptions during XSAVE and XRSTOR can only occur if the feature
is not supported or enabled or the memory operand isn't aligned
on a 64 byte boundary. If this happens something else went
terribly wrong, and it may be better to stop execution.
- Previously we just printed a warning and didn't handle the fault,
this is arguable for the above reason.
- All other *SAVE instruction also don't handle exceptions, so this
at least aligns behavior.
Finally add a test to catch such a regression in the future.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #13042
Closes #13059
|
|
|
|
|
|
|
|
|
| |
All of these externs are already #included as static inline
functions via corresponding headers.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #13073
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no need to make the platform ops dynamic dispatch.
This change replaces the dynamic dispatch with static calls to the
platform-specific functions.
To avoid name collisions, prefix all platform-specific functions
with `zvol_os_`.
I actually find `zvol_..._os` slightly nicer to read in the calling
code, but having it as a prefix is useful.
Advantage:
- easier jump-to-definition / grepping
- potential benefits to static analysis
- better legibility
Future work: also prefix remaining `static` functions in zvol_os.c.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #12965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use error thresholds from policy to control whether to scrub data
and/or metadata. If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.
By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.
While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #13022
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #12963
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following commit moved the users of `deferred` into function
dsl_pool_unreserved_space:
commit d2734cce68cf740e015312314415f9034c67851c
Author: Serapheim Dimitropoulos <[email protected]>
Date: Fri Dec 16 14:11:29 2016 -0800
OpenZFS 9166 - zfs storage pool checkpoint
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #13056
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
POSIX requires that set-uid and set-gid bits to be removed when an
unprivileged user writes to a file and ZFS does that during normal
operation.
The problem arrises when the write is stored in the ZIL and replayed.
During replay we have no access to original credentials of the process
doing the write, so zfs_write() will be performed with the root
credentials. When root is doing the write set-uid and set-gid bits
are not removed from the file.
To correct that, log a separate TX_SETATTR entry that removed those bits
on first write to such file.
Idea from: Christian Schwarz
Add test for ZIL replay of setuid/setgid clearing.
Improve various edge cases when clearing setid bits:
- The setid bits can be readded during a single write, so make sure to check
for them on every chunk write.
- Log TX_SETATTR record at most once per transaction group (if the setid bits
are keep coming back).
- Move zfs_log_setattr() outside of zp->z_acl_lock.
Reviewed-by: Dan McDonald <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Christian Schwarz <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes #13027
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:
- Memory leak reporting is (temporarily) suppressed. The cost of
fixing them is relatively high compared to the gains.
- Checksum computing functions in `module/zcommon/zfs_fletcher*`
have UBSan errors suppressed. It is completely impractical
to enforce 64-byte payload alignment there due to performance
impact.
- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
memory allocator is used there rendering that measure
unfeasible.
- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
`zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
incompatible with memory leaks detection.
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: szubersk <[email protected]>
Closes #12928
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In files created/modified before 4254acb there may be a corruption of
xattrs which is not reported during scrub and normal send/receive. It
manifests only as an error when raw sending/receiving. This happens
because currently only the raw receive path checks for discrepancies
between the dnode bonus length and the spill pointer flag.
In case we encounter a dnode whose bonus length is greater than the
predicted one, we should report an error. Modify in this regard
dnode_sync() with an assertion at the end, dump_dnode() to error out,
dsl_scan_recurse() to report errors during a scrub, and zstream to
report a warning when dumping. Also added a test to verify spill blocks
are sent correctly in a raw send.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #12720
Closes #13014
|
|
|
|
|
|
|
|
|
|
| |
* Improve naming.
* Reduce indentation.
* Avoid boilerplate logic duplication.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12967
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12993
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
69 CSTYLED BEGINs remain, appx. 30 of which can be removed if cstyle(1)
had a useful policy regarding
CALL(ARG1,
ARG2,
ARG3);
above 2 lines. As it stands, it spits out *both*
sysctl_os.c: 385: continuation line should be indented by 4 spaces
sysctl_os.c: 385: indent by spaces instead of tabs
which is very cool
Another >10 could be fixed by removing "ulong" &al. handling.
I don't foresee anyone actually using it intentionally
(does it even exist in modern headers? why did it in the first place?).
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12993
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12993
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12979
|
|
|
|
|
|
|
|
|
|
|
|
| |
First open locking changes were correctly applied to zvol_geom_open but
incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be
held indefinitely.
Make the first open locking in zvol_cdev_open match zvol_geom_open.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #13016
|
|
|
|
|
|
|
|
|
| |
When using the two argument version of submit_bio() in kernel's prior
to 4.8 the first argument should be specified. It's used by block
dump to report the bio direction.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Finix Yan <[email protected]>
Closes #13006
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upstream commit 359745d78351c6f5442435f81549f0207ece28aa
("proc: remove PDE_DATA() completely")
Link: https://lore.kernel.org/all/[email protected]/T/#u
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13004
Closes #12989
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.17's dequeue_signal() takes an additional enum pid_type *
output argument
Upstream commit 5768d8906bc23d512b1a736c1e198aa833a6daa4
("signal: Requeue signals in the appropriate queue")
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12989
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.17 sees a rename from complete_and_exit()
to kthread complete_and_exit()
Upstream commit cead18552660702a4a46f58e65188fe5f36e9dfe
("exit: Rename complete_and_exit to kthread_complete_and_exit")
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12989
|
|
|
|
|
|
|
|
|
|
| |
For us, I think it's always just FALLOC_FL_PUNCH_HOLE with a fake
mustache on.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12975
|
|
|
|
|
|
|
|
|
| |
add_disk went from void to must-check int return.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Coleman Kane <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12975
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FreeBSD's implementation of zfs_uio_fault_move() returns EFAULT when a
page fault occurs while copying data in or out of user buffers. The VFS
treats such errors specially and will retry the I/O operation (which may
have made some partial progress).
When the FreeBSD and Linux implementations of zfs_write() were merged,
the handling of errors from dmu_write_uio_dbuf() changed such that
EFAULT is not handled as a partial write. For example, when appending
to a file, the z_size field of the znode is not updated after a partial
write resulting in EFAULT.
Restore the old handling of errors from dmu_write_uio_dbuf() to fix
this. This should have no impact on Linux, which has special handling
for EFAULT already.
Reviewed-by: Andriy Gapon <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Mark Johnston <[email protected]>
Closes #12964
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.
One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.
Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #12981
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the eviction thread goes to shrink an ARC state, it allocates a set
of marker buffers used to hold its place in the state's sublists.
This can be problematic in low memory conditions, since
1) the allocation can be substantial, as we allocate NCPU markers;
2) on at least FreeBSD, page reclamation can block in
arc_wait_for_eviction()
In particular, in stress tests it's possible to hit a deadlock on
FreeBSD when the number of free pages is very low, wherein the system is
waiting for the page daemon to reclaim memory, the page daemon is
waiting for the ARC eviction thread to finish, and the ARC eviction
thread is blocked waiting for more memory.
Try to reduce the likelihood of such deadlocks by pre-allocating markers
for the eviction thread at ARC initialization time. When evicting
buffers from an ARC state, check to see if the current thread is the ARC
eviction thread, and use the pre-allocated markers for that purpose
rather than dynamically allocating them.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Signed-off-by: Mark Johnston <[email protected]>
Closes #12985
|
|
|
|
|
|
|
|
|
|
|
|
| |
Evaluated every variable that lives in .data (and globals in .rodata)
in the kernel modules, and constified/eliminated/localised them
appropriately. This means that all read-only data is now actually
read-only data, and, if possible, at file scope. A lot of previously-
global-symbols became inlinable (and inlined!) constants. Probably
not in a big Wowee Performance Moment, but hey.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12899
|
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12934
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are the changes for FreeBSD corresponding to the changes made for
Linux in #12863, see that PR for details.
Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD. This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open. The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12934
|