| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity found a number of places where we either do MAX(unsigned, 0) or
do assertions that a unsigned variable is >= 0. These do nothing, so
let us drop them all.
It also found a spot where we do `if (unsigned >= 0 && ...)`. Let us
also drop the unsigned >= 0 check.
Reviewed-by: Neal Gompa <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #13871
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity complained about this. An error from `hkdf_sha512()` before uio
initialization will cause pointers to uninitialized memory to be passed
to `zio_crypt_destroy_uio()`. This is a regression that was introduced
by cf63739191b6cac629d053930a4aea592bca3819. Interestingly, this never
affected FreeBSD, since the FreeBSD version never had that patch ported.
Since moving uio initialization to the top of this function would slow
down the qat_crypt() path, we only move the `memset()` calls to the top
of the function. This is sufficient to fix this problem.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Neal Gompa <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #13944
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity reported that we pass a pointer to zfsvfs to
`dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after
a failure in zfsvfs_init().
We have nearly identical duplicate versions of this code for FreeBSD and
Linux, but interestingly, the FreeBSD version of this code differs in
such a way that it does not suffer from this bug. We remove the
difference from the FreeBSD version to fix this bug.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #13883
|
|
|
|
|
|
|
|
| |
The zpl_fadvise() function was recently added and was not included
in the initial patch. Update it accordingly.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13831
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace ZFS_ENTER and ZFS_VERIFY_ZP, which have hidden returns, with
functions that return error code. The reason we want to do this is
because hidden returns are not obvious and had caused some missing fail
path unwinding.
This patch changes the common, linux, and freebsd parts. Also fixes
fail path unwinding in zfs_fsync, zpl_fsync, zpl_xattr_{list,get,set}, and
zfs_lookup().
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #13831
|
|
|
|
|
|
|
|
|
|
| |
Unused code detected by coverity.
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Neal Gompa <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #13868
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose of this PR is to accepts fadvise ioctl from userland
to do read-ahead by demand.
It could dramatically improve sequential read performance especially
when primarycache is set to metadata or zfs_prefetch_disable is 1.
If the file is mmaped, generic_fadvise is also called for page cache
read-ahead besides dmu_prefetch.
Only POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL are supported in
this PR currently.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Finix Yan <[email protected]>
Closes #13694
|
|
|
|
|
|
|
|
|
| |
As of the Linux 5.20 kernel blk_cleanup_disk() has been removed,
all callers should use put_disk().
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13728
|
|
|
|
|
|
|
|
|
| |
As of the Linux 5.20 kernel bdevname() has been removed, all
callers should use snprintf() and the "%pg" format specifier.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13728
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZIL kstats are reported in an inclusive way, i.e., same counters are
shared to capture all the activities happening in zil. Added support
to report zil stats for every datset individually by combining them
with already exposed dataset kstats.
Wmsum uses per cpu counters and provide less overhead as compared
to atomic operations. Updated zil kstats to replace wmsum counters
to avoid atomic operations.
Reviewed-by: Christian Schwarz <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ameer Hamza <[email protected]>
Closes #13636
|
|
|
|
|
|
|
|
| |
Makes the case sensitivity setting visible on Linux in /proc/mounts.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Ameer Hamza <[email protected]>
Closes #13607
|
|
|
|
|
|
|
|
|
| |
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes #13619
|
|
|
|
|
|
|
|
|
| |
Specify the lua and zstd license text in the manor in which the
kernel MODULE_LICENSE macro requires it. The now duplicate entries
were merged and a comment added to make it clear what they apply to.
Reviewed-by: Christian Schwarz <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13641
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13348
|
|
|
|
|
|
|
|
|
| |
Zp->z_mode is set at the same time inode->i_mode
is being changed. This has the effect of keeping both
in sync without relying on zfs_znode_update_vfs.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: yanping.gao <[email protected]>
Closes #13581
|
|
|
|
|
|
|
|
|
|
|
| |
Increase nlinks in stat results of ./zfs/snapshot based on snapshot
count. This provides quick and efficient method for administrators to
get snapshot counts without having to use libzfs or list the snapdir
contents.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andrew Walker <[email protected]>
Closes #13559
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
```
os/linux/zfs/zvol_os.c:1111:3: error: ignoring return value of function
declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
add_disk(zv->zv_zso->zvo_disk);
^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
zpl_xattr.c:1579:1: warning: no previous prototype for function
'zpl_posix_acl_release_impl' [-Wmissing-prototypes]
```
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: szubersk <[email protected]>
Closes #13551
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows ZFS datasets to be delegated to a user/mount namespace
Within that namespace, only the delegated datasets are visible
Works very similarly to Zones/Jailes on other ZFS OSes
As a user:
```
$ unshare -Um
$ zfs list
no datasets available
$ echo $$
1234
```
As root:
```
# zfs list
NAME ZONED MOUNTPOINT
containers off /containers
containers/host off /containers/host
containers/host/child off /containers/host/child
containers/host/child/gchild off /containers/host/child/gchild
containers/unpriv on /unpriv
containers/unpriv/child on /unpriv/child
containers/unpriv/child/gchild on /unpriv/child/gchild
# zfs zone /proc/1234/ns/user containers/unpriv
```
Back to the user namespace:
```
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
containers 129M 47.8G 24K /containers
containers/unpriv 128M 47.8G 24K /unpriv
containers/unpriv/child 128M 47.8G 128M /unpriv/child
```
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Will Andrews <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
Co-authored-by: Allan Jude <[email protected]>
Co-authored-by: Mateusz Piotrowski <[email protected]>
Sponsored-by: Buddy <https://buddy.works>
Closes #12263
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for the kernel's block multiqueue (blk-mq) interface in
the zvol block driver. blk-mq creates multiple request queues on
different CPUs rather than having a single request queue. This can
improve zvol performance with multithreaded reads/writes.
This implementation uses the blk-mq interfaces on 4.13 or newer
kernels. Building against older kernels will fall back to the
older BIO interfaces.
Note that you must set the `zvol_use_blk_mq` module param to
enable the blk-mq API. It is disabled by default.
In addition, this commit lets the zvol blk-mq layer process whole
`struct request` IOs at a time, rather than breaking them down
into their individual BIOs. This reduces dbuf lock contention
and overhead versus the legacy zvol submit_bio() codepath.
sequential dd to one zvol, 8k volblocksize, no O_DIRECT:
legacy submit_bio() 292MB/s write 453MB/s read
this commit 453MB/s write 885MB/s read
It also introduces a new `zvol_blk_mq_chunks_per_thread` module
parameter. This parameter represents how many volblocksize'd chunks
to process per each zvol thread. It can be used to tune your zvols
for better read vs write performance (higher values favor write,
lower favor read).
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #13148
Issue #12483
|
|
|
|
|
|
|
|
|
| |
As of the Linux 5.19 kernel the readpage() address space operation
has been replaced by read_folio().
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13515
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure
erase functionality from the blkdev_issue_discard() function.
The blkdev_issue_secure_erase() must now be issued to issue
a secure erase.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13515
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.19 commit torvalds/linux@44abff2c0 removed the
blk_queue_secure_erase() helper function. The preferred
interface is to now use the bdev_max_secure_erase_sectors()
function to check for discard support.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13515
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.19 commit torvalds/linux@70200574cc removed the
blk_queue_discard() helper function. The preferred interface
is to now use the bdev_max_discard_sectors() function to check
for discard support.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13515
|
|
|
|
|
|
|
|
|
|
| |
As for the Linux 5.18 kernel bio_alloc() expects a block_device struct
as an argument. This removes the need for the bio_set_dev() compatibility
code for 5.18 and newer kernels.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13515
|
|
|
|
|
|
|
|
|
| |
A followup to 849c14e04844a2f0e1f7e42886c2cef083563f35
Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: szubersk <[email protected]>
Closes #13389
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refcount creation for abd_zero_scatter->abd_children is redundant in
abd_alloc_zero_scatter, as it has been done in abd_init_struct.
In addition, abd_children is undefined when ZFS_DEBUG is disabled, the
reference of abd_children in abd_alloc_zero_scatter breaks build of
libzpool when ZFS_DEBUG is disabled.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Ping Huang <[email protected]>
Closes #13429
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Page writebacks with WB_SYNC_NONE can take several seconds to complete
since they wait for the transaction group to close before being
committed. This is usually not a problem since the caller does not
need to wait. However, if we're simultaneously doing a writeback
with WB_SYNC_ALL (e.g via msync), the latter can block for several
seconds (up to zfs_txg_timeout) due to the active WB_SYNC_NONE
writeback since it needs to wait for the transaction to complete
and the PG_writeback bit to be cleared.
This commit deals with 2 cases:
- No page writeback is active. A WB_SYNC_ALL page writeback starts
and even completes. But when it's about to check if the PG_writeback
bit has been cleared, another writeback with WB_SYNC_NONE starts.
The sync page writeback ends up waiting for the non-sync page
writeback to complete.
- A page writeback with WB_SYNC_NONE is already active when a
WB_SYNC_ALL writeback starts. The WB_SYNC_ALL writeback ends up
waiting for the WB_SYNC_NONE writeback.
The fix works by carefully keeping track of active sync/non-sync
writebacks and committing when beneficial.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Shaan Nobee <[email protected]>
Closes #12662
Closes #12790
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using a Linux kernel which predates the iov_iter interface the
O_APPEND flag should be applied in zpl_aio_write() via the call to
generic_write_checks(). The updated pos variable was incorrectly
ignored resulting in the current offset being used.
This issue should only realistically impact the RHEL/CentOS 7.x
kernels which are based on Linux 3.10.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13370
Closes #13377
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace __set_page_dirty_nobuffers with filemap_dirty_folio.
Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a
("Merge tag 'folio-5.18b' of
git://git.infradead.org/users/willy/pagecache ")
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Authored-by: Satadru Pramanik <[email protected]>
Signed-off-by: Satadru Pramanik <[email protected]>
Closes #13325
Closes #13380
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd
array mmu_feature_keys[] and fails to build. Workaround this by
using copy_from_user() and throwing EFAULT for any calls to
__copy_from_user_inatomic(). This is a workaround until a fix
for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458
"powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed.
Reviewed-by: Brian Behlendorf <[email protected]>
Authored-by: Colin Ian King <[email protected]>
Signed-off-by: szubersk <[email protected]>
Closes #11958
Closes #12590
Closes #13367
|
|
|
|
|
|
|
|
| |
Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject:
kobj_type: remove default_attrs")
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13357
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13357
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do
have differing semantics in some ways - in particular,
one requires KEEP_SIZE, and the other does not.
Also added a zero-range test to catch this, corrected a flaw
that made the punch-hole test succeed vacuously, and a typo
in file_write.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #13329
Closes #13338
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally it was thought it would be useful to split up the kmods
by functionality. This would allow external consumers to only load
what was needed. However, in practice we've never had a case where
this functionality would be needed, and conversely managing multiple
kmods can be awkward. Therefore, this change merges all but the
spl.ko kmod in to a single zfs.ko kmod.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13274
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of the 5.17 kernel the GENHD_FL_EXT_DEVT flag has been removed
and the GENHD_FL_NO_PART_SCAN flag renamed GENHD_FL_NO_PART. Update
zvol_alloc() to set GENHD_FL_NO_PART for the newer kernels which
is sufficient. The behavior for prior kernels remains unchanged.
1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT")
46e7eac6 ("block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART")
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13294
Closes #13297
|
|
|
|
|
|
|
|
|
| |
->readpages was removed and replaced by ->readahead. Define
zpl_readahead for kernels that don't have ->readpages.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Riccardo Schirone <[email protected]>
Closes #13278
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Riccardo Schirone <[email protected]>
Closes #13278
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bypass check of ZFS aces if the ACL is trivial. When an ACL is
trivial its permissions are represented by the mode without any
loss of information. In this case, it is safe to convert the
access request into equivalent mode and then pass desired mask
and inode to generic_permission(). This has the added benefit
of also checking whether entries in a POSIX ACL on the file grant
the desired access.
This commit also skips the ACL check on looking up the xattr dir
since such restrictions don't exist in Linux kernel and it makes
xattr lookup behavior inconsistent between SA and file-based
xattrs. We also don't want to perform a POSIX ACL check while
looking up the POSIX ACL if for some reason it is located in
the xattr dir rather than an SA.
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Ryan Moeller <[email protected]>
Signed-off-by: Andrew Walker <[email protected]>
Closes #13237
|
|
|
|
|
|
|
|
|
|
| |
The argument type of rw_destroy is (krwlock_t *) while currently
krwlock_t is passed in zfs_ctldir.c. This error is hidden because
rw_destroy is defined as ((void) 0) in linux. But anyway, this
mismatch should be fixed.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ping Huang <[email protected]>
Closes #13272
|
|
|
|
|
|
|
|
| |
When HAVE_BLKDEV_GET_ERESTARTSYS is defined, compiler will complain
"defined but not used" warning for zvol_open_timeout_ms.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ping Huang <[email protected]>
Closes #13270
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
blkdev.h includes genhd.h since dawn of upstream git, so this is
globally safe
Upstream-commit: 322cbb50de711814c42fb088f6d31901502c711a ("block:
remove genhd.h")
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13251
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs)
became
bio_alloc(struct block_device *bdev, unsigned short nr_vecs,
unsigned int opf, gfp_t gfp_mask)
passing NULL/0 continues previous behaviour
Upstream-commit: 07888c665b405b1cd3577ddebfeb74f4717a84c4 ("block:
pass a block_device and opf to bio_alloc")
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13251
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR changes ZFS ACL checks to evaluate
fsuid / fsgid rather than euid / egid to avoid
accidentally granting elevated permissions to
NFS clients.
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Andrew Walker <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #13221
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12996
|
|
|
|
|
|
|
|
|
|
| |
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12996
|
|
|
|
|
|
|
|
|
|
|
| |
On some architectures ZERO_PAGE is unavailable because it references
a GPL exported symbol of empty_zero_page. Originally e08b993 removed
the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However,
a simple check can be done to avoid a kernel allocation and free for
the abd_zero_page if ZERO_PAGE is available.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #13199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS allows to update and retrieve additional file level attributes for
FreeBSD. This commit allows additional file level attributes to be
updated and retrieved for Linux. These include the flags stored in the
upper half of z_pflags only.
Two new IOCTLs have been added for this purpose. ZFS_IOC_GETDOSFLAGS
can be used to retrieve the attributes, while ZFS_IOC_SETDOSFLAGS can
be used to update the attributes.
Attributes that are allowed to be updated include ZFS_IMMUTABLE,
ZFS_APPENDONLY, ZFS_NOUNLINK, ZFS_ARCHIVE, ZFS_NODUMP, ZFS_SYSTEM,
ZFS_HIDDEN, ZFS_READONLY, ZFS_REPARSE, ZFS_OFFLINE and ZFS_SPARSE.
Flags can be or'd together while calling ZFS_IOC_SETDOSFLAGS.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #13118
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As such, there are no specific synchronous semantics defined for
the xattrs. But for xattr=on, it does log to ZIL and zil_commit() is
done, if sync=always is set on dataset. This provides sync semantics
for xattr=on with sync=always set on dataset.
For the xattr=sa implementation, it doesn't log to ZIL, so, even with
sync=always, xattrs are not guaranteed to be synced before xattr call
returns to caller. So, xattr can be lost if system crash happens, before
txg carrying xattr transaction is synced.
This change adds xattr=sa logging to ZIL on xattr create/remove/update
and xattrs are synced to ZIL (zil_commit() done) for sync=always.
This makes xattr=sa behavior similar to xattr=on.
Implementation notes:
The actual logging is fairly straight-forward and does not warrant
additional explanation.
However, it has been 14 years since we last added new TX types
to the ZIL [1], hence this is the first time we do it after the
introduction of zpool features. Therefore, here is an overview of the
feature activation and deactivation workflow:
1. The feature must be enabled. Otherwise, we don't log the new
record type. This ensures compatibility with older software.
2. The feature is activated per-dataset, since the ZIL is per-dataset.
3. If the feature is enabled and dataset is not for zvol, any append to
the ZIL chain will activate the feature for the dataset. Likewise
for starting a new ZIL chain.
4. A dataset that doesn't have a ZIL chain has the feature deactivated.
We ensure (3) by activating on the first zil_commit() after the feature
was enabled. Since activating the features requires waiting for txg
sync, the first zil_commit() after enabling the feature will be slower
than usual. The downside is that this is really a conservative
approximation: even if we never append a 'TX_SETSAXATTR' to the ZIL
chain, we pay the penalty for feature activation. The upside is that the
user is in control of when we pay the penalty, i.e., upon enabling the
feature.
We ensure (4) by hooking into zil_sync(), where ZIL destroy actually
happens.
One more piece on feature activation, since it's spread across
multiple functions:
zil_commit()
zil_process_commit_list()
if lwb == NULL // first zil_commit since zil_open
zil_create()
if no log block pointer in ZIL header:
if feature enabled and not active:
// CASE 1
enable, COALESCE txg wait with dmu_tx that allocated the
log block
else // log block was allocated earlier than this zil_open
if feature enabled and not active:
// CASE 2
enable, EXPLICIT txg wait
else // already have an in-DRAM LWB
if feature enabled and not active:
// this happens when we enable the feature after zil_create
// CASE 3
enable, EXPLICIT txg wait
[1] https://github.com/illumos/illumos-gate/commit/da6c28aaf62fa55f0fdb8004aa40f88f23bf53f0
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Christian Schwarz <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jitendra Patidar <[email protected]>
Closes #8768
Closes #9078
|
|
|
|
|
|
|
| |
Reviewed-by: Alejandro Colomar <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13110
|
|
|
|
|
|
|
| |
Reviewed-by: Alejandro Colomar <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13110
|