path: root/include
Commit message | Author | Age | Files | Lines
* Linux 5.6 compat: timestamp_truncate()Brian Behlendorf2020-02-071-6/+7
The timestamp_truncate() function was added; it replaces the existing timespec64_trunc() function. This change renames our wrapper function to be consistent with the upstream name and updates the compatibility code for older kernels accordingly. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9956 Closes #9961
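For readers unfamiliar with what the wrapped kernel call does, the following is a user-space sketch of truncating a timestamp to a filesystem's time granularity. It is an illustration only: `model_timestamp_truncate` and `MODEL_NSEC_PER_SEC` are hypothetical names, the granularity is passed in directly rather than read from an inode, and this is not the kernel's or the compatibility wrapper's actual code.

```c
#include <assert.h>
#include <time.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/*
 * Hypothetical user-space model: round a timestamp down to a multiple
 * of gran_ns nanoseconds (1 ns up to one full second).
 */
static struct timespec
model_timestamp_truncate(struct timespec ts, long gran_ns)
{
    assert(gran_ns >= 1 && gran_ns <= MODEL_NSEC_PER_SEC);

    if (gran_ns == MODEL_NSEC_PER_SEC) {
        /* Whole-second granularity: drop the sub-second part. */
        ts.tv_nsec = 0;
    } else if (gran_ns > 1) {
        /* Round the nanoseconds down to a multiple of gran_ns. */
        ts.tv_nsec -= ts.tv_nsec % gran_ns;
    }
    /* A granularity of 1 ns leaves the timestamp untouched. */
    return (ts);
}
```

A compatibility layer would typically expose one name to callers and map it to whichever of the two kernel functions the build detected.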
* Linux 5.6 compat: struct proc_opsBrian Behlendorf2020-02-071-1/+7
The proc_ops structure was introduced to replace the use of the file_operations structure when registering proc handlers. This change creates a new kstat_proc_op_t typedef for compatibility which can be used to pass around the correct structure. This change additionally adds the 'const' keyword to all of the existing proc operations structures. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9961
* Reduce number of atomic_add() calls in aggsumAlexander Motin2020-02-061-1/+1
The previous code used 4 atomics in aggsum_flush_bucket() and 2 more to re-borrow after the flush. But since asc_borrowed and asc_delta are accessed only while holding asc_lock, it makes no sense to modify as_lower_bound and as_upper_bound in multiple steps. Instead, the new code uses only 2 atomics in all cases, one per as_*_bound variable. I think even that is overkill; a simple atomic store and load could be used here, since all modifications are done under the as_lock, but there are no such primitives in the ZFS code now. While there, make the borrow code consider the previous borrow value, so that mixed request patterns are less likely to need to borrow again when a much larger request follows a tiny one that needed a borrow. Also reduce as_numbuckets from uint64_t to u_int. It makes no sense to use such a large division operation on every aggsum_add(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #9930
* Few microoptimizations to dbuf layerAlexander Motin2020-02-051-6/+7
Move db_link into the same cache line as db_blkid and db_level. This significantly reduces avl_add() time in dbuf_create() on systems with large RAM and a huge number of dbufs per dnode. Avoid a few accesses to dbuf_caches[].size, which is highly congested under high IOPS and never stays in cache for long. Use the local value we receive from zfs_refcount_add_many() anyway. Remove the cache_size_bytes_max bump from dbuf_evict_one(). I don't see the point of doing it on dbuf eviction after we have done it on insertion in dbuf_rele_and_unlock(). Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored-By: iXsystems, Inc. Closes #9931
* Convert dbuf dirty record list to a list_tMatthew Macy2020-02-051-4/+27
| | | | | | | | | Additionally pull in state machine comments about upcoming async cow work. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9902
* Restore aclmode and remove acltype on FreeBSDRyan Moeller2020-02-041-1/+1
| | | | | | | | | | | | | | This replaces the placeholder ZFS_PROP_PRIVATE with ZFS_PROP_ACLMODE, matching what is done in the NFSv4 ACLs PR (#9709). On FreeBSD we hide ZFS_PROP_ACLTYPE, while on Linux we hide ZFS_PROP_ACLMODE. The tests already assume this arrangement. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9913
* async zvol minor node creation interferes with receiveMatthew Ahrens2020-02-032-8/+16
When we finish a zfs receive, dmu_recv_end_sync() calls zvol_create_minors(async=TRUE). This kicks off some other threads that create the minor device nodes (in /dev/zvol/poolname/...). These async threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which both call dmu_objset_own(), which puts a "long hold" on the dataset. Since the zvol minor node creation is asynchronous, this can happen after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have completed.

After the first receive ioctl has completed, userland may attempt to do another receive into the same dataset (e.g. the next incremental stream). This second receive and the asynchronous minor node creation can interfere with one another in several different ways, because they both require exclusive access to the dataset:

1. When the second receive is finishing up, dmu_recv_end_check() does dsl_dataset_handoff_check(), which can fail with EBUSY if the async minor node creation already has a "long hold" on this dataset. This causes the 2nd receive to fail.
2. The async udev rule can fail if zvol_id and/or systemd-udevd try to open the device while the second receive's async attempt at minor node creation owns the dataset (via zvol_prefetch_minors_impl). This causes the minor node (/dev/zd*) to exist, but the udev-generated /dev/zvol/... to not exist.
3. The async minor node creation can silently fail with EBUSY if the first receive's zvol_create_minor() tries to own the dataset while the second receive's zvol_prefetch_minors_impl already owns the dataset.

To address these problems, this change synchronously creates the minor node. To avoid the lock ordering problems that the asynchrony was introduced to fix (see #3681), we create the minor nodes from open context, with no locks held, rather than from syncing context as was originally done.

Implementation notes:

We generally do not need to traverse children or prefetch anything (e.g. when running the recv, snapshot, create, or clone subcommands of zfs). We only need recursion when importing/opening a pool and when loading encryption keys. The existing recursive, asynchronous, prefetching code is preserved for use in these cases.

Channel programs may need to create zvol minor nodes, when creating a snapshot of a zvol with the snapdev property set. We figure out what snapshots are created when running the LUA program in syncing context. In this case we need to remember what snapshots were created, and then try to create their minor nodes from open context, after the LUA code has completed.

There are additional zvol use cases that asynchronously own the dataset, which can cause similar problems, e.g. changing the volmode or snapdev properties. These are less problematic because they are not recursive and don't touch datasets that are not involved in the operation, but there is still potential for interference with subsequent operations. In the future, these cases should be similarly converted to create the zvol minor node synchronously from open context.

The async tasks of removing and renaming minors do not own the objset, so they do not have this problem. However, it may make sense to also convert these operations to happen synchronously from open context, in the future.

Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-65948 Closes #7863 Closes #9885
* Add AltiVec RAID-ZRomain Dolbeau2020-01-234-1/+117
Implements the RAID-Z function using AltiVec SIMD. This is basically the NEON code translated to AltiVec. Note that the 'fletcher' algorithm requires 64-bit operations, and the initial implementations of AltiVec (PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only have up to 32-bit operations, so no 'fletcher'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #9539
* Support inheriting properties in channel programsJason King2020-01-221-0/+9
| | | | | | | | | This adds support in channel programs to inherit properties analogous to `zfs inherit` by adding `zfs.sync.inherit` and `zfs.check.inherit` functions to the ZFS LUA API. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jason King <[email protected]> Closes #9738
* Fix errata #4 handling for resuming streamsTom Caputi2020-01-141-0/+1
Currently, the handling for errata #4 has two issues which allow the checks for this issue to be bypassed using resumable sends. The first issue is that drc->drc_fromsnapobj is not set in the resuming code as it is in the non-resuming code. This causes dsl_crypto_recv_key_check() to skip its checks for the from_ivset_guid. The second issue is that resumable sends do not clean up their on-disk state if they fail the checks in dmu_recv_stream() that happen before any data is received. As a result of these two bugs, a user can attempt a resumable send of a dataset without a from_ivset_guid. This will fail the initial dmu_recv_stream() checks, leaving a valid resume state. The send can then be resumed, which skips those checks, allowing the receive to be completed. This commit fixes these issues by setting drc->drc_fromsnapobj in the resuming receive path and by ensuring that resumable receives are properly cleaned up if they fail the initial dmu_recv_stream() checks. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #9818 Closes #9829
* libzfs: add zfs_mount_at() functionKyle Evans2020-01-141-0/+1
| | | | | | | | | | | | | | zfs_mount_at() mounts a dataset at an arbitrary mountpoint rather than at the configured mountpoint. This may be used by consumers that wish to temporarily expose a dataset at another mountpoint without altering dataset/pool properties. This will be used by FreeBSD's libbe be_mount(), which mounts a boot environment at an arbitrary mountpoint. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Kyle Evans <[email protected]> Closes #9833
* Change http://zfsonlinux.org links to https://zfsonlinux.orgBrian Behlendorf2020-01-131-1/+1
Update the project website links contained in the repository to reference the secure https://zfsonlinux.org address. Reviewed-By: Richard Laager <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Garrett Fields <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9837
* Add 'zfs send --saved' flagTom Caputi2020-01-103-4/+11
This commit adds the --saved (-S) flag to the 'zfs send' command. This flag allows a user to send a partially received dataset, which can be useful when migrating a backup server to new hardware. This flag is compatible with resumable receives, so even if the saved send is interrupted, it can be resumed. The flag does not require any user / kernel ABI changes or any new feature flags in the send stream format. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Reviewed-by: Christian Schwarz <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #9007
* Remove parameter names from libzfs.h signaturesNick Black2020-01-081-7/+7
| | | | | | | | | | | | | | | Most of libzfs.h doesn't provide names for the parameters in its signatures. These few functions included them. That wouldn't be a problem, per se, but the 'lines' parameter conflicts with the 'lines' #define from terminfo's term.h, present for at least a decade. This makes it difficult to compile code making use of both ZFS and terminfo. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Nick Black <[email protected]> Closes #9821
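The clash described above can be reproduced in a few lines. In this sketch the `#define lines 24` is only a stand-in for the object-like macro that term.h provides (the real expansion differs), and `model_print_rows` is a hypothetical function, not a libzfs symbol.

```c
#include <assert.h>

/* Stand-in for the object-like macro that terminfo's term.h defines. */
#define lines 24

/*
 * A prototype that named its parameter "lines" would be rewritten by
 * the preprocessor into "int model_print_rows(int 24);" and fail to
 * compile. Omitting the parameter name sidesteps the clash.
 */
int model_print_rows(int);

/* The definition lives where the macro is not visible. */
#undef lines
int
model_print_rows(int lines)
{
    assert(lines >= 0);
    return (lines);
}
```

Because only the header is seen by consumers, dropping the names from the prototypes is enough; the definitions can keep descriptive parameter names.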
* Fix "zpool add -n" for dedup, special and log devicesloli10K2020-01-061-1/+1
For dedup, special and log devices "zpool add -n" does not correctly print their vdev type:

    ~# zpool add -n pool dedup /tmp/dedup special /tmp/special log /tmp/log
    would update 'pool' to the following configuration:
        pool
          /tmp/normal
          /tmp/dedup
          /tmp/special
          /tmp/log

This could lead storage administrators to modify their ZFS pools to unexpected and unintended vdev configurations. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #9783 Closes #9390
* Colorize zpool status outputTony Hutter2019-12-191-0/+12
If the ZFS_COLOR env variable is set, then use ANSI color output in zpool status:
- Column headers are bold
- Degraded or offline pools/vdevs are yellow
- Non-zero error counters and faulted vdevs/pools are red
- The 'status:' and 'action:' sections are yellow if they're displaying a warning.
This also includes a new 'faketty' function in libtest.shlib that is compatible with FreeBSD (code provided by @freqlabs). Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #9340
* Make zfs_replay.c work on FreeBSDMatthew Macy2019-12-131-0/+2
| | | | | | | | | | | | | FreeBSD's vfs currently doesn't permit file systems to do their own locking. To avoid having to have duplicate zfs functions with and without locking add locking here. With luck these changes can be removed in the future. Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9715
* Relocate common quota functions to shared codeRyan Moeller2019-12-114-12/+49
| | | | | | | | | | | The quota functions are common to all implementations and can be moved to common code. As a simplification they were moved to the Linux platform code in the initial refactoring. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9710
* Add FreeBSD jail support hooksMatthew Macy2019-12-111-1/+1
Add the 'zfs jail/unjail' subcommands along with the relevant documentation from FreeBSD. This feature is not supported on Linux and still requires the matching kernel ioctls which will be included when the FreeBSD platform code is integrated. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9686
* Eliminate Linux specific inode usage from common code Matthew Macy2019-12-114-22/+25
Change many of the znops routines to take a znode rather than an inode so that zfs_replay code can be largely shared and in the future much of the znops code may be shared. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9708
* SIMD: Use alloc_pages_node to force alignmentFabian-Gruenbichler2019-12-101-6/+17
fxsave and xsave require the target address to be 16-/64-byte aligned. kmalloc(_node) does not (yet) offer such fine-grained control over alignment[0,1], even though it does "the right thing" most of the time for power-of-2 sizes. Unfortunately, alignment is completely off when using certain debugging or hardening features/configs, such as KASAN, slub_debug=Z or the not-yet-upstream SLAB_CANARY. Use alloc_pages_node() instead, which allows us to allocate page-aligned memory. Since fpregs_state is padded to a full page anyway, and this code is only relevant for x86 which has 4k pages, this approach should not allocate any unnecessary memory but still guarantee the needed alignment. 0: https://lwn.net/Articles/787740/ 1: https://lore.kernel.org/linux-block/[email protected]/ Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Fabian Grünbichler <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9608 Closes #9674
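The reasoning that a page-aligned allocation automatically satisfies the 16- and 64-byte requirements can be checked with a user-space analogue. This sketch uses C11 `aligned_alloc()` in place of the kernel's alloc_pages_node(), and the `model_*` names and the 4096-byte page size constant are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u

/* Return nonzero if ptr is aligned to 'align' bytes (a power of two). */
static int
model_is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (((uintptr_t)ptr & (align - 1)) == 0);
}

/*
 * Allocate one page-sized, page-aligned save area. The caller releases
 * it with free(). Returns NULL on allocation failure.
 */
static void *
model_alloc_fpu_state(void)
{
    return (aligned_alloc(MODEL_PAGE_SIZE, MODEL_PAGE_SIZE));
}
```

Any address that is a multiple of 4096 is also a multiple of 64 and 16, so a single page-granular allocation covers both save-area requirements without extra bookkeeping.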
* Abstract away platform specific superblock referencesMatthew Macy2019-12-101-0/+2
The zfsvfs->z_sb field is Linux specific and should be abstracted. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9697
* Exclude data from cores unconditionally and metadata conditionallyMatthew Macy2019-12-091-0/+1
This change allows us to align the core dump logic across platforms. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9691
* Remove stale ASSERTV commentMatthew Macy2019-12-061-1/+0
| | | | | | | | | | Followup for #9671, remove stale comment. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Issue #9671 Closes #9682
* Disable EDONR on FreeBSDMatthew Macy2019-12-052-0/+4
| | | | | | | | | | | FreeBSD uses its own crypto framework in-kernel which, at this time, has no EDONR implementation. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9664
* Add ZFS_IOC offsets for FreeBSDMatthew Macy2019-12-051-12/+13
| | | | | | | | | | | | | | | | | FreeBSD requires three additional ioctls, they are ZFS_IOC_NEXTBOOT, ZFS_IOC_JAIL, and ZFS_IOC_UNJAIL. These have been added after the Linux-specific ioctls. The range 0x80-0xFF has been reserved for future optional platform-specific ioctls. Any platform may choose to implement these as appropriate. None of the existing ioctl numbers have been changed to maintain compatibility. For Linux no vectors have been registered for the new ioctls and they are reported as unsupported. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9667
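The reserved range and the "no vector registered means unsupported" behaviour can be sketched as a small dispatch table. Everything named `model_*` or `MODEL_*` here is hypothetical, including the error value; the real ioctl numbers, handler signatures, and error code are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offsets relative to the first ZFS ioctl number. */
enum model_ioc_offset {
    MODEL_IOC_PLATFORM_FIRST = 0x80,  /* start of the reserved range */
    MODEL_IOC_PLATFORM_LAST = 0xFF    /* end of the reserved range */
};

#define MODEL_ERR_UNSUPPORTED (-1)

typedef int (*model_ioc_handler_t)(void);

/* Return nonzero if the offset falls in the platform-specific range. */
static int
model_is_platform_ioctl(unsigned int offset)
{
    return (offset >= MODEL_IOC_PLATFORM_FIRST &&
        offset <= MODEL_IOC_PLATFORM_LAST);
}

/* Example handler a platform might register for one of its ioctls. */
static int
model_sample_handler(void)
{
    return (0);
}

/* Call the registered handler, or report the ioctl as unsupported. */
static int
model_dispatch(const model_ioc_handler_t *table, unsigned int offset)
{
    assert(table != NULL);
    if (offset > MODEL_IOC_PLATFORM_LAST || table[offset] == NULL)
        return (MODEL_ERR_UNSUPPORTED);
    return (table[offset]());
}
```

Keeping the numbers stable and leaving unused slots empty lets each platform register only the vectors it implements while user space shares one numbering.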
* Refactor deadman set failmode to be cross platformMatthew Macy2019-12-052-1/+3
| | | | | | | | | Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL wrapper, and split the common and platform specific portions. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9670
* Replace ASSERTV macro with compiler annotationMatthew Macy2019-12-051-2/+4
| | | | | | | | | | | Remove the ASSERTV macro and handle suppressing unused compiler warnings for variables only in ASSERTs using the __attribute__((unused)) compiler annotation. The annotation is understood by both gcc and clang. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9671
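A minimal sketch of the annotation in use, assuming a gcc- or clang-compatible compiler. The function `model_copy_name` is hypothetical; it only illustrates a variable that is referenced solely by an assertion and would otherwise trigger an unused-variable warning when assertions are compiled out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (capacity dstlen, which must be nonzero) and return
 * the copied length.
 */
static size_t
model_copy_name(char *dst, size_t dstlen, const char *src)
{
    /* Only referenced by assert(); unused when NDEBUG is defined. */
    size_t srclen __attribute__((unused)) = strlen(src);

    assert(srclen < dstlen);
    strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';
    return (strlen(dst));
}
```

Compared with a wrapper macro, the annotation keeps the declaration an ordinary C statement, so it reads the same in debug and non-debug builds.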
* Refactor zfs_context.h to build on FreeBSDMatthew Macy2019-12-043-18/+35
- on Linux move Linux specific headers to zfs_context_os.h
- on FreeBSD move FreeBSD specific definitions to zfs_context_os.h
- remove duplicate tsd_ definitions
- remove unused AT_TYPE
Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9668
* Move linux qsort def to platform headerMatthew Macy2019-12-031-1/+5
| | | | | | | | | Moving qsort to the platform header allows each platform to provide an appropriate sorting implementation. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9663
* Move zfs_cmd_t copyin/copyout to platform codeMatthew Macy2019-12-021-1/+1
FreeBSD needs to cope with multiple versions of the zfs_cmd_t structure. Allowing the platform code to pre and post process the cmd structure makes it possible to work with legacy tooling. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9624
* Add FreeBSD required defines to mntent.hMatthew Macy2019-11-301-0/+9
| | | | | | | Linux and FreeBSD use different names for suid / setuid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9632
* Add FreeBSD support to zio_crypto.hMatthew Macy2019-11-301-1/+17
| | | | | | | | Minimal compatibility changes for FreeBSD. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9631
* Resolve ZoF differences in zfs_ioctl.hMatthew Macy2019-11-301-1/+1
| | | | | | | | | | FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls and the struct file in the device structure is unused. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9625
* Minor diff reduction with ZoF in include/sysMatthew Macy2019-11-275-6/+11
- move linux/ includes to platform headers
- add void * io_bio to zio for tracking the underlying bio
- add freebsd specific fields to abd_scatter
Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9615
* Remove zfs_vdev_elevator module optionBrian Behlendorf2019-11-271-8/+0
| | | | | | | | | | | | | As described in commit f81d5ef6 the zfs_vdev_elevator module option is being removed. Users who require this functionality should update their systems to set the disk scheduler using a udev rule. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #8664 Closes #9417 Closes #9609
* Prevent unnecessary resilver restartsjwpoduska2019-11-273-4/+9
If a device is participating in an active resilver, then it will have a non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the resilver to be restarted (or deferred to be restarted later), which is unnecessary if the DTL is still covered by the current scan range. This is similar to the logic in vdev_dtl_should_excise() where the DTL can only be excised if its max txg is in the resilvered range. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: John Poduska <[email protected]> Issue #840 Closes #9155 Closes #9378 Closes #9551 Closes #9588
* Add zfs_file_* interface, remove vnodesMatthew Macy2019-11-2116-234/+82
| | | | | | | | | | | | | | | | | | Provide a common zfs_file_* interface which can be implemented on all platforms to perform normal file access from either the kernel module or the libzpool library. This allows all non-portable vnode_t usage in the common code to be replaced by the new portable zfs_file_t. The associated vnode and kobj compatibility functions, types, and macros have been removed from the SPL. Moving forward, vnodes should only be used in platform specific code when provided by the native operating system. Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9556
* Partially revert 5a6ac4cBrian Behlendorf2019-11-181-0/+1
Reinstate the zpl_revalidate() functionality to resolve a regression where dentries for open files during a rollback are not invalidated. The unrelated functionality for automatically unmounting .zfs/snapshots was not reverted. Nor was the addition of shrink_dcache_sb() to the zfs_resume_fs() function. This issue was not immediately caught by the CI because the test case intended to catch it was included in the list of ZTS tests which may occasionally fail for unrelated reasons. Remove all of the rollback tests from this list to help identify the frequency of any spurious failures. The rollback_003_pos.ksh test case exposes a real issue with the long standing code which needs to be investigated. Regardless, it has been enabled with a small workaround in the test case itself. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9587 Closes #9592
* Add kmem_cache flag for forcing kvmallocMichael Niewöhner2019-11-132-2/+5
This adds a new KMC_KVMEM flag to enforce use of the kvmalloc allocator in kmem_cache_create even for large blocks, which may also increase performance in some specific cases (e.g. zstd). Default to KVMEM instead of VMEM in spl_kmem_cache_create. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #9034
* Make use of kvmalloc if available and fix vmem_alloc implementationMichael Niewöhner2019-11-131-0/+3
| | | | | | | | | | | | | | | | | This patch implements use of kvmalloc for GFP_KERNEL allocations, which may increase performance if the allocator is able to allocate physical memory, if kvmalloc is available as a public kernel interface (since v4.12). Otherwise it will simply fall back to virtual memory (vmalloc). Also fix vmem_alloc implementation which can lead to slow allocations since the first attempt with kmalloc does not make use of the noretry flag but tells the linux kernel to retry several times before it fails. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #9034
* Add wrapper stub for zfs_cmd ioctl to libzpoolMatthew Macy2019-11-121-0/+3
| | | | | | | | | | FreeBSD needs a wrapper for handling zfs_cmd ioctls. In libzfs this is handled by zfs_ioctl. However, here we need to wrap the call directly. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9511
* Linux compat: Minimum kernel version 3.10Brian Behlendorf2019-11-1214-635/+34
Increase the minimum supported kernel version from 2.6.32 to 3.10. This removes support for the following Linux enterprise distributions.

    Distribution     | Kernel | End of Life
    ---------------- | ------ | -------------
    Ubuntu 12.04 LTS | 3.2    | Apr 28, 2017
    SLES 11          | 3.0    | Mar 31, 2019
    RHEL / CentOS 6  | 2.6.32 | Nov 30, 2020

The following changes were made as part of removing support.

  * Updated `configure` to enforce a minimum kernel version as specified in the META file (Linux-Minimum: 3.10).

        configure: error:
            *** Cannot build against kernel version 2.6.32.
            *** The minimum supported kernel version is 3.10.

  * Removed all `configure` kABI checks and matching C code for interfaces which solely predate the Linux 3.10 kernel.

  * Updated all `configure` kABI checks to fail when an interface is missing which was in the 3.10 kernel up to the latest 5.1 kernel. Removed the HAVE_* preprocessor defines for these checks and updated the code to unconditionally use the verified interface.

  * Inverted the detection logic in several kABI checks to match the new interface as it appears in 3.10 and newer and not the legacy interface.

  * Consolidated the following checks into individual files. Due to the large number of changes in the checks it made sense to handle this now. It would be desirable to group other related checks in the same fashion, but this is left as future work.

    - config/kernel-blkdev.m4 - Block device kABI checks
    - config/kernel-blk-queue.m4 - Block queue kABI checks
    - config/kernel-bio.m4 - Bio interface kABI checks

  * Removed the kABI checks for sops->nr_cached_objects() and sops->free_cached_objects(). These interfaces are currently unused.

Signed-off-by: Brian Behlendorf <[email protected]> Closes #9566
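The minimum-version rule can be expressed as a simple numeric comparison. The commit enforces it in `configure`, not in C, so the following is only an illustration of the comparison itself; the `MODEL_*` macros and `model_kernel_supported` are hypothetical names, and the packing mirrors the classic form of the kernel's KERNEL_VERSION(a, b, c) macro.

```c
#include <assert.h>

/* Pack a version triple the way KERNEL_VERSION(a, b, c) classically does. */
#define MODEL_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define MODEL_MINIMUM_KERNEL MODEL_KERNEL_VERSION(3, 10, 0)

/* Return nonzero if the given kernel version meets the minimum. */
static int
model_kernel_supported(int major, int minor, int patch)
{
    assert(major >= 0 && major < 256);
    assert(minor >= 0 && minor < 256);
    assert(patch >= 0 && patch < 256);
    return (MODEL_KERNEL_VERSION(major, minor, patch) >=
        MODEL_MINIMUM_KERNEL);
}
```

Under this comparison the three kernels in the table (3.2, 3.0, and 2.6.32) all fall below 3.10, while 3.10 itself and the 5.1 kernel mentioned above pass.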
* Allow platform dependent path stripping for vdevsRyan Moeller2019-11-111-1/+1
| | | | | | | | | | | | On Linux the full path preceding devices is stripped when formatting vdev names. On FreeBSD we only want to strip "/dev/". Hide the implementation details of path stripping behind zfs_strip_path(). Make zfs_strip_partition_path() static in Linux implementation while here, since it is never used outside of the file it is defined in. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9565
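The two stripping behaviours can be modelled as follows. These are hypothetical user-space functions written for illustration (`model_strip_path_linux`, `model_strip_path_freebsd`); they are not the bodies of zfs_strip_path(), whose implementation is not shown in this log.

```c
#include <assert.h>
#include <string.h>

/* Linux-style model: drop everything up to and including the last '/'. */
static const char *
model_strip_path_linux(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return (slash != NULL ? slash + 1 : path);
}

/* FreeBSD-style model: drop only a leading "/dev/". */
static const char *
model_strip_path_freebsd(const char *path)
{
    static const char dev[] = "/dev/";

    assert(path != NULL);
    if (strncmp(path, dev, sizeof (dev) - 1) == 0)
        return (path + sizeof (dev) - 1);
    return (path);
}
```

Hiding both variants behind one function name lets the common formatting code stay identical while each platform decides how much of the path is noise.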
* Remove zpl_revalidatePavel Snajdr2019-11-113-1/+2
This patch removes the need for zpl_revalidate altogether. There were 3 main reasons why we used d_revalidate:
1. periodic automounted snapshots umount deferral
2. negative dentries created before snapshot rollback
3. stale inodes referenced by dentry cache after snapshot rollback
The periodic snapshot deferral solution introduces the zfs_exit_fs function, which is called as part of the ZFS_EXIT(zfsvfs_t) macro. Negative dentries and stale inodes are solved by flushing the dcache for the particular dataset on the zfs_resume_fs call. This patch also removes the now unused HAVE_S_D_OP configure test. Reviewed-by: Aleksa Sarai <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #8774 Closes #9549
* Preliminary support for RV64GRomain Dolbeau2019-11-061-1/+22
| | | | | | | This adds basic support for RISC-V, specifically RV64G. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #9540
* Move platform specific parts of zfs_znode.h to platform codeMatthew Macy2019-11-063-137/+177
| | | | | | | | | Some of the znode fields are different and functions consuming an inode don't exist on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9536
* Add tracepoints for taskq entry lifetime eventsPrakash Surya2019-11-013-0/+93
    This adds some new DTRACE_PROBE* endpoints so that we can observe
    taskq latencies on a system. Additionally, a new "taskqlatency.bt"
    script is added to do this observation via "bpftrace". Lastly, a
    "zfs-trace.sh" script is added to wrap "bpftrace" with the proper
    options required to run and use "taskqlatency.bt".

    For example, with these changes in place, a user can run the
    following:

        $ cd ./contrib/bpftrace
        $ sudo ./zfs-trace.sh taskqlatency.bt
        Attaching 6 probes...
        ^C

    Here's some example output, showing latency information for time
    spent executing the taskq entry's function:

        @exec_lat_us[dp_sync_taskq, userquota_updates_task]:
        [2, 4)       5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [4, 8)       0 | |
        [8, 16)      1 |@@@@@@@@@@ |
        [16, 32)     2 |@@@@@@@@@@@@@@@@@@@@ |

        @exec_lat_us[z_wr_int_h, zio_execute]:
        [8, 16)     16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [16, 32)     2 |@@@@@@ |

        @exec_lat_us[z_wr_iss_h, zio_execute]:
        [16, 32)     4 |@@@@@@@@@@@@@@@@ |
        [32, 64)    13 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [64, 128)    1 |@@@@ |

        @exec_lat_us[z_ioctl_int, zio_execute]:
        [2, 4)       1 |@@@@ |
        [4, 8)      11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [8, 16)      8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |

        @exec_lat_us[dp_sync_taskq, sync_dnodes_task]:
        [2, 4)       1 |@@@@@@ |
        [4, 8)       7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [8, 16)      8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [16, 32)     2 |@@@@@@@@@@@@@ |
        [32, 64)     4 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [64, 128)    1 |@@@@@@ |
        [128, 256)   0 | |
        [256, 512)   1 |@@@@@@ |

    Here's some example output, showing latency information for time
    spent waiting on the taskq, prior to starting execution of entry's
    function:

        @queue_lat_us[dp_sync_taskq]:
        [2, 4)       1 |@@@@ |
        [4, 8)       7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [8, 16)      2 |@@@@@@@@ |
        [16, 32)     3 |@@@@@@@@@@@@@ |
        [32, 64)    12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [64, 128)    6 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [128, 256)   0 | |
        [256, 512)   1 |@@@@ |

        @queue_lat_us[z_wr_iss]:
        [4, 8)       4 |@@@@ |
        [8, 16)     13 |@@@@@@@@@@@@@@@ |
        [16, 32)     6 |@@@@@@@ |
        [32, 64)     2 |@@ |
        [64, 128)   12 |@@@@@@@@@@@@@@ |
        [128, 256)  15 |@@@@@@@@@@@@@@@@@@ |
        [256, 512)  33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [512, 1K)   27 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [1K, 2K)     7 |@@@@@@@@ |
        [2K, 4K)    14 |@@@@@@@@@@@@@@@@ |
        [4K, 8K)    14 |@@@@@@@@@@@@@@@@ |
        [8K, 16K)   23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [16K, 32K)  43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

        @queue_lat_us[z_wr_int]:
        [2, 4)      10 |@@@@@ |
        [4, 8)      71 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [8, 16)     88 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [16, 32)    50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [32, 64)    65 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [64, 128)   43 |@@@@@@@@@@@@@@@@@@@@@@@@@ |
        [128, 256)  19 |@@@@@@@@@@@ |
        [256, 512)   3 |@ |
        [512, 1K)    1 | |

    Reviewed by: Brad Lewis <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Prakash Surya <[email protected]>
    Closes #9525
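    The histograms above group latencies into power-of-two ranges such
    as [8, 16). As a rough illustration of that bucketing only, and not
    of how "bpftrace" or "taskqlatency.bt" compute it, here is a minimal
    C sketch. The names lat_bucket() and lat_bucket_label() are
    hypothetical and do not appear in this change.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: return k such that 2^k <= v < 2^(k+1).
 * Values 0 and 1 both land in bucket 0 in this sketch.
 */
static unsigned
lat_bucket(uint64_t v)
{
	unsigned k = 0;

	while (v > 1) {
		v >>= 1;
		k++;
	}
	return (k);
}

/*
 * Hypothetical helper: write the "[lo, hi)" label for bucket k into buf.
 * k must be less than 63 so that the upper bound fits in 64 bits.
 * Unlike the output above, no "K" suffix is applied to large bounds.
 */
static void
lat_bucket_label(unsigned k, char *buf, size_t len)
{
	snprintf(buf, len, "[%llu, %llu)",
	    (unsigned long long)1 << k,
	    (unsigned long long)1 << (k + 1));
}
```

    Under these assumptions a 13 microsecond sample falls in bucket 3,
    whose label is "[8, 16)".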
* Enable use of DTRACE_PROBE* macros in "spl" module Prakash Surya 2019-11-01 7 -23/+49
    This change modifies some of the infrastructure for enabling the use
    of the DTRACE_PROBE* macros, such that we can use them in the "spl"
    module.

    Currently, when the DTRACE_PROBE* macros are used, they get expanded
    to create new functions, and these dynamically generated functions
    become part of the "zfs" module. Since the "spl" module does not
    depend on the "zfs" module, the use of DTRACE_PROBE* in the "spl"
    module would result in undefined symbols being used in the "spl"
    module. Specifically, DTRACE_PROBE* would turn into a function call,
    and the function being called would be a symbol only contained in
    the "zfs" module, which results in a linker and/or runtime error.

    Thus, this change adds the necessary logic to the "spl" module, to
    mirror the tracing functionality available to the "zfs" module.
    After this change, we'll have a "trace_zfs.h" header file which
    defines the probes available only to the "zfs" module, and a
    "trace_spl.h" header file which defines the probes available only to
    the "spl" module.

    Reviewed by: Brad Lewis <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Prakash Surya <[email protected]>
    Closes #9525
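    To illustrate why a probe macro ties its caller to the module that
    defines the probe function, here is a minimal sketch in plain C. It
    is not the DTRACE_PROBE* implementation; EX_PROBE1, ex_probe_*,
    ex_run_task and the counters are hypothetical names made up for this
    example.

```c
#include <stdint.h>

/*
 * Hypothetical probe macro: expands to a call of a per-probe function.
 * The caller only sees a declaration, so the definition must be
 * provided by something the caller is linked against.
 */
#define	EX_PROBE1(name, arg0)					\
	do {							\
		extern void ex_probe_##name(uintptr_t);		\
		ex_probe_##name((uintptr_t)(arg0));		\
	} while (0)

/* State recorded by the probe function, for demonstration only. */
uintptr_t ex_last_arg;
unsigned ex_probe_count;

/*
 * Definition of the probe function. In this sketch it lives in the
 * same translation unit as its user, so the call always resolves.
 */
void
ex_probe_task_start(uintptr_t arg)
{
	ex_last_arg = arg;
	ex_probe_count++;
}

/* A caller that fires the probe. */
void
ex_run_task(void *task)
{
	EX_PROBE1(task_start, task);
}
```

    If ex_probe_task_start() were defined only in a second module that
    the caller does not depend on, the call in ex_run_task() would refer
    to an unresolved symbol, which is the situation described above for
    "spl" and "zfs".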
* Wrap Linux module macros Matthew Macy 2019-11-01 6 -6/+78
    MODULE_VERSION is already defined on FreeBSD. Wrap all of the used
    MODULE_* macros for the sake of consistency and portability.

    Add a user space noop version to reduce the need for _KERNEL ifdefs.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Matt Macy <[email protected]>
    Closes #9542
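    A minimal sketch of the wrapping idea, assuming hypothetical EX_*
    wrapper names (the names actually introduced by this change are not
    shown in the message). In the kernel the wrappers forward to the
    Linux MODULE_* macros; in user space they expand to a harmless
    declaration, so callers need no _KERNEL ifdefs of their own.

```c
#ifdef _KERNEL
#include <linux/module.h>

/* Kernel build: forward to the Linux macros. */
#define	EX_MODULE_DESCRIPTION(s)	MODULE_DESCRIPTION(s)
#define	EX_MODULE_AUTHOR(s)		MODULE_AUTHOR(s)
#define	EX_MODULE_LICENSE(s)		MODULE_LICENSE(s)
#define	EX_MODULE_VERSION(s)		MODULE_VERSION(s)
#define	EX_MODULE_INFO_ENABLED		1
#else
/*
 * User space build: expand to a forward declaration so the trailing
 * semicolon at the call site remains valid C, and emit no module info.
 */
#define	EX_MODULE_DESCRIPTION(s)	struct ex_module_noop
#define	EX_MODULE_AUTHOR(s)		struct ex_module_noop
#define	EX_MODULE_LICENSE(s)		struct ex_module_noop
#define	EX_MODULE_VERSION(s)		struct ex_module_noop
#define	EX_MODULE_INFO_ENABLED		0
#endif

/* Call sites look the same in both builds. */
EX_MODULE_DESCRIPTION("example module");
EX_MODULE_AUTHOR("example author");
EX_MODULE_LICENSE("CDDL");
EX_MODULE_VERSION("0.0.1");

/* Report whether module metadata was compiled in. */
int
ex_has_module_info(void)
{
	return (EX_MODULE_INFO_ENABLED);
}
```

    Expanding the user space variant to a forward declaration rather
    than to nothing is a choice made for this sketch; it avoids a stray
    semicolon at file scope.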