| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
6.7 changes the shrinker API such that shrinkers must be allocated
dynamically by the kernel. To accomodate this, this commit reworks
spl_register_shrinker() to do something similar against earlier kernels.
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
|
|
|
|
|
|
|
|
|
| |
In 6.7 the superblock shrinker member s_shrink has changed from being an
embedded struct to a pointer. Detect this, and don't take a reference if
it already is one.
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
|
|
|
|
|
|
|
|
|
| |
6.6 made i_ctime inaccessible; 6.7 has done the same for i_atime and
i_mtime. This extends the method used for ctime in b37f29341 to atime
and mtime as well.
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
|
|
|
|
|
|
|
|
|
|
| |
6.7 changed the names of the time members in struct inode, so we can't
assign back to it because we don't know its name. In practice this
doesn't matter though - if we're missing current_time(), then we must be
on <4.9, and we know our fallback will need to return timespec.
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
|
|
|
|
|
|
| |
META file and changelog updated.
Signed-off-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount. If this is not
done, the snapshots under .zfs/snapshot with not be accessible
over NFS.
This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.
External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rick Macklem <[email protected]>
Closes #15563
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay. Later zil_free_clone_range() drops them after replay or on
dataset destroy. The total balance is neutral. It means on actual
replay we must take additional references, which would stay in BRT.
Without this blocks could be freed prematurely when either original
file or its clone are destroyed. I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened. With the patch I see expected stats.
Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15603
|
|
|
|
|
|
|
|
|
| |
Bug introduced in 213d6829673.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Warner Losh <[email protected]>
Signed-off-by: Martin Matuska <[email protected]>
Closes #15606
|
|
|
|
|
|
|
|
| |
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.
Signed-off-by: Jaron Kent-Dobias <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs does no longer exist.
Resolve this by leaving the directory earlier on.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mart Frauenlob <[email protected]>
Closes #15576
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0. So OpenZFS should be using
VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1. We can drop
support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0. OpenZFS change 9d0887402ba
assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b037915
assumed 13.0 or later.
Sponsored-by: Axcient
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes #15551
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.
This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
its too late, because the lock was already dropped with that invalid
state.
Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.
Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.
This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.
Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pawel Jakub Dawidek <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #15566
Closes #15526
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.
Below errors are seen:
~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory
~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory
We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Akash B <[email protected]>
Closes #15532
|
|
|
|
|
|
|
|
|
|
| |
Same idea as the dedup stats, but for block cloning.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #15541
|
|
|
|
|
|
|
|
|
|
| |
So that zdb (and others!) can get at the BRT on-disk structures.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #15541
|
|
|
|
|
|
|
|
|
|
|
| |
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function. This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #15534
Closes #15550
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case of crash cloned blocks need to be claimed on pool import.
It is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to block pointer in
lr_write_t. Few other fields can be and are still encrypted.
This should fix panic on ZIL claim after crash when block cloning
is actively used.
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Edmund Nadolski <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Over its history this the dirty dnode test has been changed between
checking for a dnodes being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.
de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
2531ce372 Revert "Report holes when there are only metadata changes"
ec4f9b8f3 Report holes when there are only metadata changes
454365bba Fix dirty check in dmu_offset_next()
66aca2473 SEEK_HOLE should not block on txg_wait_synced()
Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9
It turns out both are actually required.
In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added. Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.
The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.
Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.
This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.
This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage, It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #15571
Closes #15526
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit bd7a02c251d8c119937e847d5161b512913667e6 which
can trigger an unlikely existing bio alignment issue on Linux.
This change is good, but the underlying issue it exposes needs to
be resolved before this can be re-applied.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #15533
|
|
|
|
|
|
| |
META file and changelog updated.
Signed-off-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
|
| |
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup. This was due to $PWD being set to '.' instead of the
expected full path. This patch sets $PWD to the full path.
Signed-off-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
| |
Disable block cloning by default to mitigate possible data corruption
(see #15529 and #15526).
Signed-off-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #15529
|
|
|
|
|
|
|
|
|
| |
Auto-generate changelog based off on @VERSION@ during configure,
so that it is not needed to be update with new releases / version
updates.
Signed-off-by: Umer Saleem <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Update the META file to reflect compatibility with the 6.6 kernel.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Umer Saleem <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #15520
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This gets around UBSAN errors when using arrays at the end of
structs. It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.
It is based off of the patch from #15460.
Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue #15145
Closes #15510
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zpool_create_features_007_pos only tested for compat-2020 feature
set. It would be useful to test for all known features sets. If
any additional feature is found enabled that is not present in
compatibility list or feature set, it should be caught and
reported earlier.
This commit also removes encryption from openzfsonosx-1.8.1
compatibility list. Encryption enables bookmark_v2, since it is
a dependency of encryption, but not listed in openzfsonoxx-1.8.1
compatibility list.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #15505
|
|
|
|
|
|
|
|
|
|
| |
This commit updates zpool-features.7 man page to add newly added
zpool features to grub2 compatibility list.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #15505
|
|
|
|
|
|
|
|
|
|
|
| |
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mart Frauenlob <[email protected]>
Closes #15503
|
|
|
|
|
|
|
|
|
|
| |
Private read/write mapping can't be used to modify the mapped files, so
they will remain be immutable. Private read/write mappings are usually
used to load the data segment of executable files, rejecting them will
rendering immutable executable files to stop working.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: WHR <[email protected]>
Closes #15344
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently vdev_queue_class_length is responsible for checking how long
the queue length is, however, it doesn't check the length when a list
is used, rather it just returns whether it is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: MigeljanImeri <[email protected]>
Closes #15478
|
|
|
|
|
|
|
|
|
|
|
|
| |
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #15489
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PR#15459 add all read-only compatible zpool features to grub2
compatibility list. 'obsolete_counts' is a read-only features that
depends on 'device_removal' feature which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables 'device_removal' feature as well, which is
not desired.
This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #15499
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
%pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
good as possible and warn the user about it.
Also add more verbose messages about what we are doing.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mart Frauenlob <[email protected]>
Closes #15415
|
|
|
|
|
|
|
|
|
|
|
| |
GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #15459
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no sense to have separate implementations for FreeBSD and
Linux. Make Linux code shared as more functional and just register
FreeBSD-specific prune callback with arc_add_prune_callback() API.
Aside of code cleanup this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Mark Johnston <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15456
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise. Doing that caches for 5, 6, 7,
10 and 14KB rounded up to 8, 12 and 16KB respectively make no sense.
Instead specify as alignment the biggest power-of-2 divisor. This
way 2KB and 6KB caches are both aligned to 2KB, while 4KB and 8KB
are aligned to 4KB.
Reduce number of caches to half-power of 2 instead of quarter-power
of 2. This removes caches difficult for underlying allocators to
fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and small DBUF cache it does not worth being too aggressive.
Due to the above alignment issue some of those caches were not
working properly any way. 6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, that makes sense.
Remove explicit alignment in Linux user-space case. I don't think
it should be needed any more with the above fixes.
As result on FreeBSD instead of such numbers of pages per slab:
vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_14336.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_10240.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_7168.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 2 <= Broken
vm.uma.zio_buf_comb_5120.keg.ppera: 2
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3584.keg.ppera: 7 <= Hard to free
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2560.keg.ppera: 2
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1
I am now getting such:
vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 3 <= Fixed, 2 in 3 pages
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15452
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dmu_tx_check_ioerr() pre-reads blocks that are going to be dirtied
as part of transaction to both prefetch them and check for errors.
But it makes no sense to do it for holes, since there are no disk
reads to prefetch and there can be no errors. On the other side
those blocks are anonymous, and they are freed immediately by the
dbuf_rele() without even being put into dbuf cache, so we just
burn CPU time on decompression and overheads and get absolutely
no result at the end.
Use of dbuf_hold_impl() with fail_sparse parameter allows to skip
the extra work, and on my tests with sequential 8KB writes to empty
ZVOL with 32KB blocks shows throughput increase from 1.7 to 2GB/s.
Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15371
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Linux commit 560e20e4bf6484a0c12f9f3c7a1aa55056948e1e, the
fsync_bdev() function was removed in favor of sync_blockdev() to do
(roughly) the same thing, given the same input. This change
conditionally attempts to call sync_blockdev() if fsync_bdev() isn't
discovered during configure.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes #15263
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 0d72b92883c651a11059d93335f33d65c6eb653b, a new u32 argument
for the request_mask was added to generic_fillattr. This is the same
request_mask for statx that's present in the most recent API implemented
by zpl_getattr_impl. This commit conditionally adds it to the
zpl_generic_fillattr(...) macro, as well as the zfs_getattr_fast(...)
implementation, when configure determines it's present in the kernel's
generic_fillattr(...).
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes #15263
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Linux commit 13bc24457850583a2e7203ded05b7209ab4bc5ef, direct access
to the i_ctime member of struct inode was removed. The new approach is
to use accessor methods that exclusively handle passing the timestamp
around by value. This change adds new tests for each of these functions
and introduces zpl_* equivalents in include/os/linux/zfs/sys/zpl.h. In
where the inode_get/set_ctime*() functions exist, these zpl_* calls will
be mapped to the new functions. On older kernels, these macros just wrap
direct-access calls. The code that operated on an address of ip->i_ctime
to call ZFS_TIME_DECODE() now will take a local copy using
zpl_inode_get_ctime(), and then pass the address of the local copy when
performing the ZFS_TIME_DECODE() call, in all cases, rather than
directly accessing the member.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes #15263
Closes #15257
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prefetched buffers are currently read from L2ARC if, and only if,
l2arc_noprefetch is set to non-default value of 0. This means that
a streaming read which can be served from L2ARC will instead engage
the main pool.
For example, consider what happens when a file is sequentially read:
- application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling
application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains
the required buffers;
- main pool has to suffer another sequential read load, which (due to
most NCQ-enabled HDDs preferring sequential loads) can dramatically
increase latency for uncached random reads.
In other words, current behavior is to write data to L2ARC (wearing it)
without using the very same cache when reading back the same data. This
was probably useful many years ago to preserve L2ARC read bandwidth but,
with current SSD speed/size/price, it is vastly sub-optimal.
Setting l2arc_noprefetch=1, while enabling L2ARC to serve these reads,
means that even prefetched but unused buffers will be copied into L2ARC,
further increasing wear and load for potentially not-useful data.
This patch enable prefetched buffer to be read from L2ARC even when
l2arc_noprefetch=1 (default), increasing sequential read speed and
reducing load on the main pool without polluting L2ARC with not-useful
(ie: unused) prefetched data. Moreover, it clear users confusion about
L2ARC size increasing but not serving any IO when doing sequential
reads.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Gionatan Danti <[email protected]>
Closes #15451
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many long-running ZFS ioctls lock the spa_namespace_lock, forcing
concurrent ioctls to sleep for the mutex. Previously, the only
option is to call mutex_enter() which sleeps uninterruptibly. This
is a usability issue for sysadmins, for example, if the admin runs
`zpool status` while a slow `zpool import` is ongoing, the admin's
shell will be locked in uninterruptible sleep for a long time.
This patch resolves this admin usability issue by introducing
mutex_enter_interruptible() which sleeps interruptibly while waiting
to acquire a lock. It is implemented for both Linux and FreeBSD.
The ZFS_IOC_POOL_CONFIGS ioctl, used by `zpool status`, is changed to
use this new macro so that the command can be interrupted if it is
issued during a concurrent `zpool import` (or other long-running
operation).
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Thomas Bertschinger <[email protected]>
Closes #15360
|
|
|
|
|
|
|
|
|
| |
This reverts commit aefb6a2bd6c24597cde655e9ce69edd0a4c34357.
aefb6a2bd temporally disabled blk-mq until we could fix a fix for
Signed-off-by: Tony Hutter <[email protected]>
Closes #15439
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix removes a dubious optimization in zfs_uiomove_bvec_rq()
that saved the iterator contents of a rq_for_each_segment(). This
optimization allowed restoring the "saved state" from a previous
rq_for_each_segment() call on the same uio so that you wouldn't
need to iterate though each bvec on every zfs_uiomove_bvec_rq() call.
However, if the kernel is manipulating the requests/bios/bvecs under
the covers between zfs_uiomove_bvec_rq() calls, then it could result
in corruption from using the "saved state". This optimization
results in an unbootable system after installing an OS on a zvol
with blk-mq enabled.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #15351
|
|
|
|
|
|
|
|
| |
The first occurrence should be "ARC prefetch data accesses:"
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: ofthesun9 <[email protected]>
Closes #15427
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In my understanding ARC_BUF_SHARED() and arc_buf_is_shared() should
return identical results, except the second also asserts it deeper.
The first is much cheaper though, saving few pointer dereferences.
Replace production arc_buf_is_shared() calls with ARC_BUF_SHARED(),
and call arc_buf_is_shared() in random assertions, while making it
even more strict.
On my tests this in half reduces arc_buf_destroy_impl() time, that
noticeably reduces hash_lock congestion under heavy dbuf eviction.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Torn reads/writes of dp_dirty_total are unlikely: on 64-bit systems
due to register size, while on 32-bit due to memory constraints.
And even if we hit some race, the code implementing the delay takes
the lock any way.
Removal of the poll-wide lock acquisition saves ~1% of CPU time on
8-thread 8KB write workload.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15390
|
|
|
|
|
|
|
|
|
|
|
| |
Once we trigger the zpool scrub, all zpool/zfs command gets stuck for
180 seconds. Post 180 seconds zpool/zfs commands gets start executing
however few more seconds(10s) it take to update the status. hence
sleeping for 200 seconds so that we get the correct status.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: vaibhav.bhanawat <[email protected]>
Closes #15364
|
|
|
|
|
|
|
|
|
| |
We already use ____cacheline_aligned in many places, so add one more
instead of seems arbitrary char tc_pad[8].
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes #15402
|