path: root/module/os
Commit message | Author | Age | Files | Lines
* Add note for printing all dbgmsg entries on FreeBSD | Rich Ercolani | 2021-05-25 | 1 | -1/+4
  I looked for a bit, and couldn't find any documentation on how to print all logged dbgmsg entries, just messages since the DTrace probe started, until @allanjude kindly pointed me toward the sysctl.

  So let's add that note where the DTrace probe is mentioned for FreeBSD, so other people can find it.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Allan Jude <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12113
* FreeBSD: Retry OCF ENOMEM errors. | Alexander Motin | 2021-05-24 | 1 | -3/+5
  ZFS does not expect transient errors from crypto. For reads they are counted as checksum errors, while for writes they end in a panic. To avoid panicking on random low-memory conditions, retry ENOMEM errors in the OCF wrapper function.

  While there, remove the unneeded timeout and priority from msleep().

  External-issue: https://reviews.freebsd.org/D30339
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Mark Maybee <[email protected]>
  Signed-off-by: Alexander Motin <[email protected]>
  Sponsored-By: iXsystems, Inc.
  Closes #12077
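The retry idea in the commit above can be sketched in userspace C. This is only an illustration, not the OCF wrapper itself: `dispatch_fn`, `dispatch_retry_enomem`, `flaky`, and the `max_tries` bound are hypothetical names and choices introduced here, and the real code may retry differently.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a crypto dispatch call that may fail transiently. */
typedef int (*dispatch_fn)(void *arg);

/*
 * Call fn() again while it reports ENOMEM, up to max_tries attempts.
 * Any other result (success or a hard error) is returned immediately.
 * The bound is an assumption of this sketch.
 */
static int
dispatch_retry_enomem(dispatch_fn fn, void *arg, int max_tries)
{
	int error;

	do {
		error = fn(arg);
	} while (error == ENOMEM && --max_tries > 0);
	return (error);
}

/* Demo helper: fails with ENOMEM until the counter reaches zero. */
static int
flaky(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (ENOMEM);
	}
	return (0);
}
```

With a counter of 2, `dispatch_retry_enomem(flaky, &n, 10)` succeeds on the third attempt instead of surfacing the transient error to the caller.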
* Update tmpfile() existence detection | Rich Ercolani | 2021-05-20 | 1 | -0/+5
  Linux changed the tmpfile() signature again in torvalds/linux@6521f89, which in turn broke our HAVE_TMPFILE detection in configure.

  Update that macro to include the new case, and change the signature of zpl_tmpfile as appropriate.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes: #12060
  Closes: #12087
* Simple change to fix building in recent environments | Rich Ercolani | 2021-05-19 | 1 | -4/+4
  Renamed _fini too for symmetry.

  Suggested-by: @ensch
  Reviewed-by: Tony Nguyen <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Rich Ercolani <[email protected]>
  Closes #12059
  Closes: #11987
  Closes: #12056
* FreeBSD: Implement xattr=sa | Ryan Moeller | 2021-05-13 | 3 | -147/+395
  FreeBSD historically has not cared about the xattr property; it was always treated as xattr=on. With xattr=on, xattrs are stored as files in a hidden xattr directory.

  With xattr=sa, xattrs are stored as system attributes and get cached in nvlists during xattr operations. This makes SA xattrs simpler and more efficient to manipulate. FreeBSD needs to implement the SA xattr operations for feature parity with Linux and to ensure that SA xattrs are accessible when migrated or replicated from Linux.

  Following the example set by Linux, refactor our existing extattr vnops to split off the parts handling dir style xattrs, and add the corresponding SA handling parts.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11997
* FreeBSD: Use SET_ERROR to trace xattr name errors | Ryan Moeller | 2021-05-13 | 1 | -4/+4
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11997
* Revert "Fix raw sends on encrypted datasets when copying back snapshots" | Brian Behlendorf | 2021-05-13 | 2 | -28/+2
  Commit d1d4769 takes into account the encryption key version to decide if the local_mac could be zeroed out. However, this could lead to failure mounting encrypted datasets created with intermediate versions of ZFS encryption available in master between major releases.

  In order to prevent this situation revert d1d4769 pending a more comprehensive fix which addresses the mount failure case.

  Reviewed-by: George Amanakis <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #11294
  Issue #12025
  Issue #12300
  Closes #12033
* linux 5.13 compat: bdevops->revalidate_disk() removed | Coleman Kane | 2021-05-11 | 1 | -0/+2
  Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the revalidate_disk() handler from struct block_device_operations. This caused a regression, and this commit eliminates the call to it and the assignment in the block_device_operations static handler assignment code, when configure identifies that the kernel doesn't support that API handler.

  Reviewed-by: Colin Ian King <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Coleman Kane <[email protected]>
  Closes #11967
  Closes #11977
* Remove unimplemented virus scanning hooks | Ryan Moeller | 2021-05-10 | 4 | -57/+0
  Reviewed-by: Adam Moss <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Igor Kozhukhov <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11972
* FreeBSD: Remove !FreeBSD ifdef'd code | Ryan Moeller | 2021-05-07 | 1 | -35/+1
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11994
* Clean up use of zfs_log_create in zfs_dir | Ryan Moeller | 2021-05-07 | 2 | -4/+4
  zfs_log_create returns void, so there is no reason to cast its return value to void at the call site.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11994
* Return required size when encode_fh size too small | Alyssa Ross | 2021-05-07 | 2 | -4/+15
  Quoting <linux/exportfs.h>:

  > encode_fh() should return the fileid_type on success and on error
  > returns 255 (if the space needed to encode fh is greater than
  > @max_len*4 bytes). On error @max_len contains the minimum size (in 4
  > byte unit) needed to encode the file handle.

  ZFS was not setting max_len in the case where the handle was too small. As a result of this, the `t_name_to_handle_at.c' example in name_to_handle_at(2) did not work on ZFS.

  zfsctl_fid() will itself set max_len if called with a fid that is too small, so if we give zfs_fid() that behavior as well, the fix is quite easy: if the handle is too small, just use a zero-size fid instead of the handle.

  Tested by running t_name_to_handle_at on a normal file, a directory, a .zfs directory, and a snapshot.

  Thanks-to: Puck Meerburg <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Tony Nguyen <[email protected]>
  Signed-off-by: Alyssa Ross <[email protected]>
  Closes #11995
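The contract quoted from <linux/exportfs.h> can be illustrated with a small userspace sketch. The function `encode_fh_sketch` and the success value `FID_TYPE_SKETCH` are hypothetical names for this example; only the 255 error value and the 4-byte unit of max_len come from the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FID_ERROR_SKETCH 255	/* error value quoted from the header text */
#define FID_TYPE_SKETCH 1	/* hypothetical "fileid_type" for success */

/*
 * Encode need_bytes of handle data into fh. *max_len is counted in
 * 4-byte units. If the caller's buffer is too small, report the size
 * that would be needed through *max_len and return the error value.
 */
static int
encode_fh_sketch(unsigned int *fh, int *max_len, const void *data,
    size_t need_bytes)
{
	int need_words = (int)((need_bytes + 3) / 4);

	if (*max_len < need_words) {
		*max_len = need_words;	/* tell the caller what to allocate */
		return (FID_ERROR_SKETCH);
	}
	memcpy(fh, data, need_bytes);
	*max_len = need_words;
	return (FID_TYPE_SKETCH);
}
```

A caller can probe with a too-small length, read back the required length, and call again with a large enough buffer, which is the pattern the name_to_handle_at(2) example depends on.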
* FreeBSD: Initialize/destroy zp->z_lock | Ryan Moeller | 2021-05-06 | 1 | -0/+2
  zp->z_lock is used in shared code for protecting projid and scantime. We don't exercise these paths much if at all on FreeBSD, so have been lucky enough not to have issues with the uninitialized locks so far.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #12003
* Miscellaneous code cleanup | Ryan Moeller | 2021-04-30 | 3 | -9/+5
  Remove some extra whitespace.

  Use pointer-typed asserts in Linux's znode cache destructor for more info when debugging.

  Simplify a couple of conversions from inode to znode when we already have the znode.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11974
* FreeBSD: Clean up ASSERT/VERIFY use in module | Ryan Moeller | 2021-04-30 | 23 | -238/+233
  Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(), ASSERT3P(), and likewise for VERIFY(). In some cases it ended up making more sense to change the code, such as VERIFY on nvlist operations that I have converted to use fnvlist instead. In one place I changed an internal struct member from int to boolean_t to match its use.

  Some asserts that combined multiple checks with && in a single assert have been split to separate asserts, to make it apparent which check fails.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11971
* FreeBSD: Prune some unneeded definitions | Ryan Moeller | 2021-04-30 | 2 | -2/+2
  IS_XATTRDIR is never used.

  v_count is only used in two places, one immediately followed by the use of the real name, v_usecount.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11973
* Drop "All rights reserved" from files by [email protected] | Martin Matuška | 2021-04-27 | 1 | -1/+0
  This obeys the change in freebsd/freebsd-src@bce7ee9d4

  External-issue: https://reviews.freebsd.org/D26980
  Reviewed-by: George Melikov <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Martin Matuska <[email protected]>
  Closes #11947
* FreeBSD: damage control racing .. lookups in face of mkdir/rmdir | Mateusz Guzik | 2021-04-26 | 1 | -0/+27
  External-issue: https://reviews.freebsd.org/D29769
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11926
* linux/spl: proc: use global table_{min,max} values instead of local ones | наб | 2021-04-15 | 1 | -6/+6
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #11879
* linux/spl: base proc_dohostid() on proc_dostring() | наб | 2021-04-15 | 1 | -76/+17
  This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit 32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers to ->proc_handler"), i.e. 5.7-rc1 and up.

  The access_ok() check in copy_to_user() in proc_copyout_string() would always fail, so all userspace reads and writes would fail with EINVAL.

  proc_dostring() strips only the final new-line, but simple_strtoul() doesn't actually need a back-trimmed string: writing "012345678 \n" is still allowed, as is "012345678zupsko", &c.

  This alters what happens when an invalid value is written: previously it'd get set to whatever simple_strtoul() returned (probably 0, thereby resetting it to default), now it does nothing.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ahelenia Ziemiańska <[email protected]>
  Closes #11878
  Closes #11879
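The parsing behavior described in the commit above can be tried out in userspace. This sketch uses the C library's strtoul() as a stand-in for the kernel's simple_strtoul(); the function name `parse_hostid_sketch` and the choice of base 16 are assumptions made for the example, not taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a hostid from the start of str. Trailing junk after the digits
 * is tolerated. If nothing at all can be parsed, *hostid is left
 * untouched, mirroring the "now it does nothing" behavior.
 */
static int
parse_hostid_sketch(const char *str, unsigned long *hostid)
{
	char *end;
	unsigned long val = strtoul(str, &end, 16);	/* base: assumed */

	if (end == str)
		return (-EINVAL);	/* no digits consumed */
	*hostid = val;
	return (0);
}
```

Both "012345678 \n" and "012345678zupsko" parse to 0x12345678 here, while a string such as "zupsko" is rejected and the previous value is kept.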
* Add SIGSTOP and SIGTSTP handling to issig | Paul Dagnelie | 2021-04-15 | 1 | -0/+51
  This change adds SIGSTOP and SIGTSTP handling to the issig function; this mirrors its behavior on Solaris. This way, long running kernel tasks can be stopped with the appropriate signals.

  Note that doing so with ctrl-z on the command line doesn't return control of the tty to the shell, because tty handling is done separately from stopping the process. That can be future work, if people feel that it is a necessary addition.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Paul Dagnelie <[email protected]>
  Issue #810
  Issue #10843
  Closes #11801
* FreeBSD: use vnlru_free_vfsops if available | Mateusz Guzik | 2021-04-12 | 1 | -1/+21
  Fixes issues when zfs is used along with other filesystems.

  External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11881
* FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr | Mateusz Guzik | 2021-04-12 | 1 | -0/+2
  It happens to trip over an assert but does not matter for correctness at this time. Done for future proofing.

  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11884
* FreeBSD: add support for lockless symlink lookup | Mateusz Guzik | 2021-04-12 | 2 | -2/+99
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11883
* Move zfsdev_state_{init,destroy} to common code | Ryan Moeller | 2021-04-08 | 2 | -105/+18
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11833
* Eliminate zfsdev_get_state_impl | Ryan Moeller | 2021-04-08 | 1 | -1/+1
  After 3937ab20f zfsdev_get_state_impl can become zfsdev_get_state.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11833
* zpl_inode.c: Fix SMACK interoperability | TerraTech | 2021-04-08 | 1 | -12/+24
  SMACK needs to have the ZFS dentry security field set up before SMACK's d_instantiate() hook is called, as it requires functioning '__vfs_getxattr()' calls to properly set the labels.

  Fixes:
  1) file instantiation properly setting the object label to the subject's label
  2) proper file labeling in a transmutable directory

  Functions Updated:
  1) zpl_create()
  2) zpl_mknod()
  3) zpl_mkdir()
  4) zpl_symlink()

  External-issue: https://github.com/cschaufler/smack-next/issues/1
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: TerraTech <[email protected]>
  Closes #11646
  Closes #11839
* kmem_alloc(KM_SLEEP) should use kvmalloc() | Matthew Ahrens | 2021-04-06 | 1 | -0/+14
  `kmem_alloc(size>PAGESIZE, KM_SLEEP)` is backed by `kmalloc()`, which finds contiguous physical memory. If there isn't enough contiguous physical memory available (e.g. due to physical page fragmentation), the OOM killer will be invoked to make more memory available. This is not ideal because processes may be killed when there is still plenty of free memory (it just happens to be in individual pages, not contiguous runs of pages). We have observed this when allocating the ~13KB `zfs_cmd_t`, for example in `zfsdev_ioctl()`.

  This commit changes the behavior of `kmem_alloc(size>PAGESIZE, KM_SLEEP)` when there are insufficient contiguous free pages. In this case we will find individual pages and stitch them together using virtual memory. This is accomplished by using `kvmalloc()`, which implements the described behavior by trying `kmalloc(__GFP_NORETRY)` and falling back on `vmalloc()`.

  The behavior of `kmem_alloc(KM_NOSLEEP)` is not changed; it continues to use `kmalloc(GFP_ATOMIC | __GFP_NORETRY)`. This is because `vmalloc()` may sleep.

  Reviewed-by: Tony Nguyen <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: George Wilson <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #11461
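The try-contiguous-then-fall-back policy described above can be modeled in userspace. This is a rough sketch only: `alloc_fn`, `contig_fail`, and `alloc_with_fallback` are hypothetical names, and malloc() merely stands in for a virtually contiguous allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

/* Stand-in for a contiguous allocator that fails under fragmentation. */
static void *
contig_fail(size_t size)
{
	(void) size;
	return (NULL);
}

/*
 * Try the contiguous allocator first. Only callers that are allowed to
 * sleep may fall back to the second allocator, because that path may
 * block; non-sleeping callers simply see the failure.
 */
static void *
alloc_with_fallback(size_t size, alloc_fn contig, alloc_fn virt,
    int can_sleep)
{
	void *p = contig(size);

	if (p == NULL && can_sleep)
		p = virt(size);
	return (p);
}
```

In this model a 13 KB request with `can_sleep` set still succeeds when the contiguous path fails, while the same request with `can_sleep` clear returns NULL, matching the split between KM_SLEEP and KM_NOSLEEP in the commit message.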
* Fix various typos | Andrea Gelmini | 2021-04-02 | 7 | -7/+7
  Correct an assortment of typos throughout the code base.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Andrea Gelmini <[email protected]>
  Closes #11774
* Avoid taking global lock to destroy zfsdev state | Ryan Moeller | 2021-04-02 | 2 | -21/+11
  We have exclusive access to our zfsdev state object in this section until it is invalidated by setting zs_minor to -1, so we can destroy the state without taking a lock if we do the invalidation last, after a memory barrier to ensure correct ordering.

  While here, strengthen the assertions that zs_minor is valid when we enter.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11751
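The "invalidate last" ordering can be sketched with C11 atomics. This is an illustration under assumptions: `state_sketch` and `state_destroy_sketch` are hypothetical, and a release store is used here as one way to make the teardown visible before the slot is marked free; the actual code may use a different barrier primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct state_sketch {
	void *data;		/* per-open private data */
	atomic_int minor;	/* -1 means "slot free, may be reused" */
};

/*
 * Tear down the state without a lock. Nobody else may touch the slot
 * while minor is valid, so the only ordering requirement is that the
 * teardown is visible before minor becomes -1.
 */
static void
state_destroy_sketch(struct state_sketch *zs)
{
	assert(atomic_load(&zs->minor) > 0);	/* must be valid on entry */
	free(zs->data);
	zs->data = NULL;
	/* Release store: earlier writes are ordered before this one. */
	atomic_store_explicit(&zs->minor, -1, memory_order_release);
}
```

A reader that observes `minor == -1` with an acquire load is then guaranteed to also observe `data == NULL`.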
* FreeBSD: Fix stable/12 after AT_BENEATH removal | Ryan Moeller | 2021-04-02 | 1 | -3/+1
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11827
* Fix error code on __zpl_ioctl_setflags() | Luis Henriques | 2021-03-26 | 1 | -1/+1
  Other (all?) Linux filesystems seem to return -EPERM instead of -EACCES when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the CAP_LINUX_IMMUTABLE capability. This was detected by the generic/545 test in the fstests suite.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Luis Henriques <[email protected]>
  Closes #11791
* Removed duplicated includes | Andrea Gelmini | 2021-03-22 | 1 | -1/+0
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Andrea Gelmini <[email protected]>
  Closes #11775
* Removing old code for k(un)map_atomic | Brian Atkinson | 2021-03-19 | 2 | -8/+6
  It used to be required to pass an enum km_type to kmap_atomic() and kunmap_atomic(); however, this is no longer necessary and the wrappers zfs_k(un)map_atomic removed these. This is confusing in the ABD code as the struct abd_iter member iter_km no longer exists and the wrapper macros simply compile them out.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Adam Moss <[email protected]>
  Signed-off-by: Brian Atkinson <[email protected]>
  Closes #11768
* Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES | Coleman Kane | 2021-03-19 | 1 | -0/+5
  The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs() function that implements the typical MIN(x,y) logic used throughout the kernel for bounding the allocation, and also the new implementation is intended to be signed-safe (which the former was not).

  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Coleman Kane <[email protected]>
  Closes #11765
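The MIN(x,y) bounding the commit describes looks roughly like the following. The name `max_segs_sketch` and the limit of 256 are assumptions for this example rather than values taken from the commit.

```c
#include <assert.h>

#define MAX_SEGS_SKETCH 256U	/* assumed upper bound for the sketch */

/*
 * Clamp a requested segment count to the upper bound. Taking an
 * unsigned argument means a negative int passed by mistake converts
 * to a huge value and is clamped rather than slipping through.
 */
static unsigned int
max_segs_sketch(unsigned int nr_segs)
{
	return (nr_segs < MAX_SEGS_SKETCH ? nr_segs : MAX_SEGS_SKETCH);
}
```

Requests below the bound pass through unchanged, and anything larger, including a wrapped negative value, comes back as the bound.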
* Linux 5.12 compat: idmapped mounts | Coleman Kane | 2021-03-19 | 6 | -13/+100
  In Linux 5.12, the filesystem API was modified to support idmapped mounts by adding a "struct user_namespace *" parameter to a number of functions and VFS handlers. This change adds the needed autoconf macros to detect the new interfaces and updates the code appropriately. This change does not add support for idmapped mounts; instead it preserves the existing behavior by passing the initial user namespace where needed. A subsequent commit will be required to add support for idmapped mounts.

  Reviewed-by: Tony Hutter <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Co-authored-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Coleman Kane <[email protected]>
  Closes #11712
* FreeBSD: make seqc asserts conditional on replay | Mateusz Guzik | 2021-03-17 | 1 | -3/+6
  Avoids tripping on asserts when doing pool recovery.

  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11739
* FreeBSD: Fix memory leaks in kstats | Ryan Moeller | 2021-03-17 | 1 | -7/+4
  Don't handle (incorrectly) kmem_zalloc() failure. With KM_SLEEP, it will never return NULL.

  Free the data allocated for non-virtual kstats when deleting the object.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11767
* Linux: always check or verify return of igrab() | Adam D. Moss | 2021-03-16 | 3 | -3/+9
  zhold() wraps igrab() on Linux, and igrab() may fail when the inode is in the process of being deleted. This means zhold() must only be called when a reference exists and therefore it cannot be deleted. This is the case for all existing consumers so add a VERIFY and a comment explaining this requirement.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Adam Moss <[email protected]>
  Closes #11704
* Reference_tracking_enable should be a module param | Don Brady | 2021-03-16 | 1 | -6/+0
  To make use of zfs_refcount_held tunable it should be a module parameter in open-zfs.

  Also, since the macros will auto-generate OS specific tunables, removed the existing zfs_refcount_held reference in module/os/freebsd/zfs/sysctl_os.c.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Allan Jude <[email protected]>
  Signed-off-by: Don Brady <[email protected]>
  Closes #11753
* FreeBSD: bring back possibility to rewind the checkpoint from bootloader | Mariusz Zaborski | 2021-03-12 | 1 | -1/+16
  Add parsing of the rewind options.

  When I was upstreaming the change [1], I omitted the part where we detect that the pool should be rewound. When the FreeBSD repo synced with OpenZFS, this part of the code was removed.

  [1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
  [2] OpenZFS repo: f2c027bd6a003ec5793f8716e6189c389c60f47a

  External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
  Originally reviewed by: tsoome, allanjude
  Originally reviewed by: kevans (ok from high-level overview)
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Mariusz Zaborski <[email protected]>
  Closes #11730
* FreeBSD: Clean up zfsdev_close to match Linux | Ryan Moeller | 2021-03-12 | 1 | -10/+8
  Resolve some oddities in zfsdev_close() which could result in a panic and were not present in the equivalent function for Linux.

  - Remove unused definition ZFS_MIN_MINOR
  - FreeBSD: Simplify zfsdev state destruction
  - Assert zs_minor is valid in zfsdev_close
  - Make locking around zfsdev state match Linux

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11720
* Macroify teardown lock handling | Mateusz Guzik | 2021-03-12 | 2 | -26/+22
  This will allow platforms to implement it as they see fit, in particular in a different manner than rrm locks.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matt Macy <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11153
* FreeBSD: rename teardown inactive macros to mimic rrm convention | Mateusz Guzik | 2021-03-12 | 3 | -18/+18
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matt Macy <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11153
* FreeBSD: remove 2 assertions that teardown lock is not held | Mateusz Guzik | 2021-03-12 | 1 | -45/+0
  They are not very useful and are hard to implement in the rms routine the code is about to start using.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matt Macy <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11153
* FreeBSD: rework asserts in zfs_dd_lookup | Mateusz Guzik | 2021-03-12 | 1 | -3/+2
  1. even up ifdefs
  2. drop the arguably useless teardown lock asserts -- nothing else checks for it

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matt Macy <[email protected]>
  Signed-off-by: Mateusz Guzik <[email protected]>
  Closes #11153
* zvol: call zil_replaying() during replay | Christian Schwarz | 2021-03-07 | 2 | -2/+16
  zil_replaying(zil, tx) has the side-effect of informing the ZIL that an entry has been replayed in the (still open) tx. The ZIL uses that information to record the replay progress in the ZIL header when that tx's txg syncs.

  ZPL log entries are not idempotent and logically dependent and thus calling zil_replaying() is necessary for correctness.

  For ZVOLs the question of correctness is more nuanced: ZVOL logs only TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical dependencies between two records exist only if the write or discard request had sync semantics or if the ranges affected by the records overlap.

  Thus, at a first glance, it would be correct to restart replay from the beginning if we crash before replay completes. But this does not address the following scenario:

  Assume one log record per LWB. The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

  where N:W(O, C) represents log entry number N which is a TX_WRITE of C to offset O. We replay 1, 2 and 3 in one txg, sync that txg, then crash. Bit flips corrupt 2, 3, and 4. We come up again and restart replay from the beginning because we did not call zil_replaying() during replay. We replay 1 again, then interpret 2's invalid checksum as the end of the ZIL chain and call replay done. The replayed zvol content is "AX".

  If we had called zil_replaying() the HDR would have pointed to 3 and our resumed replay would not have replayed anything because 3 was corrupted, resulting in zvol content "BX".

  If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's contents.

  This patch adds the zil_replaying() calls to the replay functions. Since the callbacks in the replay function need the zilog_t* pointer so that they can call zil_replaying() we open the ZIL while replaying in zvol_create_minor(). We also verify that replay has been done when on-demand-opening the ZIL on the first modifying bio.

  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Christian Schwarz <[email protected]>
  Closes #11667
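The "AX" versus "BX" scenario from the commit above can be replayed with a toy model. Everything here is a simplification invented for illustration: `rec_sketch`, `replay_sketch`, and `scenario_sketch` are hypothetical, a `valid` flag stands in for a checksum, and replay progress is modeled as an index rather than a ZIL header pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct rec_sketch {
	int off;	/* offset written by the record */
	char c;		/* content written */
	bool valid;	/* false once "bit flips" break the checksum */
};

/* Replay from index start; the first invalid record ends the chain. */
static int
replay_sketch(const struct rec_sketch *log, int n, int start, char *vol)
{
	int i;

	for (i = start; i < n && log[i].valid; i++)
		vol[log[i].off] = log[i].c;
	return (i);	/* index reached, i.e. the replay progress */
}

/* Run the crash scenario; out receives the content of offsets 1 and 2. */
static void
scenario_sketch(bool track_progress, char out[3])
{
	struct rec_sketch log[4] = {
		{ 1, 'A', true }, { 1, 'B', true },
		{ 2, 'X', true }, { 3, 'Z', true },
	};
	char vol[4] = { '.', '.', '.', '.' };
	int done, resume;

	/* First boot: records 1..3 are replayed and synced, then a crash. */
	done = replay_sketch(log, 3, 0, vol);
	resume = track_progress ? done : 0;

	/* Bit flips corrupt records 2, 3 and 4 while the system is down. */
	log[1].valid = log[2].valid = log[3].valid = false;

	/* Second boot: resume from the recorded progress (or from zero). */
	(void) replay_sketch(log, 4, resume, vol);

	out[0] = vol[1];
	out[1] = vol[2];
	out[2] = '\0';
}
```

Without progress tracking the second boot rewrites offset 1 with the stale "A" and stops at the corrupted record, leaving "AX"; with tracking nothing is replayed again and the content stays "BX".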
* Intentionally allow ZFS_READONLY in zfs_write | Ryan Moeller | 2021-03-07 | 1 | -5/+21
  ZFS_READONLY represents the "DOS R/O" attribute. When that flag is set, we should behave as if write access were not granted by anything in the ACL. In particular: We _must_ allow writes after opening the file r/w, then setting the DOS R/O attribute, and writing some more. (Similar to how you can write after fchmod(fd, 0444).)

  Restore these semantics which were lost on FreeBSD when refactoring zfs_write. To my knowledge Linux does not actually expose this flag, but we'll need it to eventually, so I've added the supporting checks.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #11693
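The fchmod(fd, 0444) analogy in the commit above relies on standard POSIX behavior: access is checked when the file is opened, so an already open read/write descriptor keeps working after the mode is made read-only. A small sketch, assuming a writable /tmp and using the hypothetical helper name `write_after_readonly`:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open a temporary file read/write, write once, drop all write
 * permission bits with fchmod(), then write again through the same
 * descriptor. Returns 0 if every step succeeded, -1 otherwise.
 */
static int
write_after_readonly(void)
{
	char path[] = "/tmp/ro_sketch_XXXXXX";	/* assumes /tmp is writable */
	int fd = mkstemp(path);
	int ok;

	if (fd < 0)
		return (-1);
	ok = write(fd, "a", 1) == 1 &&
	    fchmod(fd, 0444) == 0 &&
	    write(fd, "b", 1) == 1;	/* still allowed on the open fd */
	close(fd);
	unlink(path);
	return (ok ? 0 : -1);
}
```

Only a fresh open for writing would be refused after the mode change, which is the behavior the commit wants ZFS_READONLY to mirror.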
* Initialize ZIL buffers | Brian Behlendorf | 2021-03-05 | 1 | -0/+1
  When populating a ZIL destination buffer ensure it is always zeroed before its contents are constructed.

  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Tom Caputi <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #11687
* linux: zvol: avoid heap allocation for zvol_request_sync=1 | Christian Schwarz | 2021-03-03 | 1 | -29/+64
  The spl_kmem_alloc showed up in some flamegraphs in a single-threaded 4k sync write workload at 85k IOPS on an Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz. Certainly not a huge win but I believe the change is clean and easy to maintain down the road.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Christian Schwarz <[email protected]>
  Closes #11666