summaryrefslogtreecommitdiffstats
path: root/include/sys
Commit message (Collapse)AuthorAgeFilesLines
* Refactor deadman set failmode to be cross platformMatthew Macy2019-12-052-1/+3
| | | | | | | | | Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL wrapper, and split the common and platform specific portions. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9670
* Refactor zfs_context.h to build on FreeBSDMatthew Macy2019-12-041-18/+4
| | | | | | | | | | | | - on Linux move Linux specific headers to zfs_context_os.h - on FreeBSD move FreeBSD specific definitions to zfs_context_os.h - remove duplicate tsd_ definitions - remove unused AT_TYPE Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9668
* Move zfs_cmd_t copyin/copyout to platform codeMatthew Macy2019-12-021-1/+1
| | | | | | | | | | | FreeBSD needs to cope with multiple version of the zfs_cmd_t structure. Allowing the platform code to pre and post process the cmd structure makes it possible to work with legacy tooling. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9624
* Add FreeBSD required defines to mntent.hMatthew Macy2019-11-301-0/+9
| | | | | | | Linux and FreeBSD use different names for suid / setuid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9632
* Add FreeBSD support to zio_crypto.hMatthew Macy2019-11-301-1/+17
| | | | | | | | Minimal compatibility changes for FreeBSD. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9631
* Resolve ZoF differences in zfs_ioctl.hMatthew Macy2019-11-301-1/+1
| | | | | | | | | | FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls and the struct file in the device structure is unused. Reviewed-by: Kjeld Schouten <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9625
* Minor diff reduction with ZoF in include/sysMatthew Macy2019-11-274-6/+7
| | | | | | | | | | - move linux/ includes to platform headers - add void * io_bio to zio for tracking the underlying bio - add freebsd specific fields to abd_scatter Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9615
* Prevent unnecessary resilver restartsjwpoduska2019-11-273-4/+9
| | | | | | | | | | | | | | | | | | If a device is participating in an active resilver, then it will have a non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the resilver to be restarted (or deferred to be restarted later), which is unnecessary if the DTL is still covered by the current scan range. This is similar to the logic in vdev_dtl_should_excise() where the DTL can only be excised if it's max txg is in the resilvered range. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Gallagher <[email protected]> Reviewed-by: Kjeld Schouten <[email protected]> Signed-off-by: John Poduska <[email protected]> Issue #840 Closes #9155 Closes #9378 Closes #9551 Closes #9588
* Add zfs_file_* interface, remove vnodesMatthew Macy2019-11-2110-62/+80
| | | | | | | | | | | | | | | | | | Provide a common zfs_file_* interface which can be implemented on all platforms to perform normal file access from either the kernel module or the libzpool library. This allows all non-portable vnode_t usage in the common code to be replaced by the new portable zfs_file_t. The associated vnode and kobj compatibility functions, types, and macros have been removed from the SPL. Moving forward, vnodes should only be used in platform specific code when provided by the native operating system. Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9556
* Add kmem_cache flag for forcing kvmallocMichael Niewöhner2019-11-131-0/+1
| | | | | | | | | | | | | | This adds a new KMC_KVMEM flag was added to enforce use of the kvmalloc allocator in kmem_cache_create even for large blocks, which may also increase performance in some specific cases (e.g. zstd), too. Default to KVMEM instead of VMEM in spl_kmem_cache_create. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Signed-off-by: Michael Niewöhner <[email protected]> Closes #9034
* Linux compat: Minimum kernel version 3.10Brian Behlendorf2019-11-121-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase the minimum supported kernel version from 2.6.32 to 3.10. This removes support for the following Linux enterprise distributions. Distribution | Kernel | End of Life ---------------- | ------ | ------------- Ubuntu 12.04 LTS | 3.2 | Apr 28, 2017 SLES 11 | 3.0 | Mar 32, 2019 RHEL / CentOS 6 | 2.6.32 | Nov 30, 2020 The following changes were made as part of removing support. * Updated `configure` to enforce a minimum kernel version as specified in the META file (Linux-Minimum: 3.10). configure: error: *** Cannot build against kernel version 2.6.32. *** The minimum supported kernel version is 3.10. * Removed all `configure` kABI checks and matching C code for interfaces which solely predate the Linux 3.10 kernel. * Updated all `configure` kABI checks to fail when an interface is missing which was in the 3.10 kernel up to the latest 5.1 kernel. Removed the HAVE_* preprocessor defines for these checks and updated the code to unconditionally use the verified interface. * Inverted the detection logic in several kABI checks to match the new interface as it appears in 3.10 and newer and not the legacy interface. * Consolidated the following checks in to individual files. Due the large number of changes in the checks it made sense to handle this now. It would be desirable to group other related checks in the same fashion, but this as left as future work. - config/kernel-blkdev.m4 - Block device kABI checks - config/kernel-blk-queue.m4 - Block queue kABI checks - config/kernel-bio.m4 - Bio interface kABI checks * Removed the kABI checks for sops->nr_cached_objects() and sops->free_cached_objects(). These interfaces are currently unused. Signed-off-by: Brian Behlendorf <[email protected]> Closes #9566
* Move platform specific parts of zfs_znode.h to platform codeMatthew Macy2019-11-061-137/+12
| | | | | | | | | Some of the znode fields are different and functions consuming an inode don't exist on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9536
* Enable use of DTRACE_PROBE* macros in "spl" modulePrakash Surya2019-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This change modifies some of the infrastructure for enabling the use of the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module. Currently, when the DTRACE_PROBE* macros are used, they get expanded to create new functions, and these dynamically generated functions become part of the "zfs" module. Since the "spl" module does not depend on the "zfs" module, the use of DTRACE_PROBE* in the "spl" module would result in undefined symbols being used in the "spl" module. Specifically, DTRACE_PROBE* would turn into a function call, and the function being called would be a symbol only contained in the "zfs" module; which results in a linker and/or runtime error. Thus, this change adds the necessary logic to the "spl" module, to mirror the tracing functionality available to the "zfs" module. After this change, we'll have a "trace_zfs.h" header file which defines the probes available only to the "zfs" module, and a "trace_spl.h" header file which defines the probes available only to the "spl" module. Reviewed by: Brad Lewis <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes #9525
* Wrap Linux module macrosMatthew Macy2019-11-013-6/+44
| | | | | | | | | | MODULE_VERSION is already defined on FreeBSD. Wrap all of the used MODULE_* macros for the sake of consistency and portability. Add a user space noop version to reduce the need for _KERNEL ifdefs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9542
* Prefix struct rangelockMatthew Macy2019-11-014-19/+19
| | | | | | | | | | A struct rangelock already exists on FreeBSD. Add a zfs_ prefix as per our convention to prevent any conflict with existing symbols. This change is a follow up to 2cc479d0. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9534
* Return an error code from zfs_acl_chmod_setattrMatthew Macy2019-11-011-1/+1
| | | | | | | | The FreeBSD implementation can fail, allow this function to fail and add the required error handling for Linux. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9541
* Move sha2.h to platform codeMatthew Macy2019-10-312-156/+0
| | | | | | | | | FreeBSD has its own sha routines that the port uses. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9530
* Don't cast away constMatthew Macy2019-10-311-1/+1
| | | | | | | | | Follow up to 511fce6b which missed a cast. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9533
* Factor Linux specific code out of spa_misc.cMatthew Macy2019-10-312-1/+6
| | | | | | | | | | Move these Linux module parameter get/set helpers in to platform specific code. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9457
* Minor spa portability fixesMatthew Macy2019-10-281-0/+3
| | | | | | | | | | | | - FreeBSD's rootpool import code uses spa_config_parse - Move the zvol_create_minors call out from under the spa_namespace_lock in spa_import. It isn't needed and it causes a lock order reversal on FreeBSD. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9499
* Fix for ARC sysctls ignored at runtimeloli10K2019-10-262-0/+10
| | | | | | | | | | | | | | This change leverage module_param_call() to run arc_tuning_update() immediately after the ARC tunable has been updated as suggested in cffa8372 code review. A simple test case is added to the ZFS Test Suite to prevent future regressions in functionality. Reviewed-by: Matt Macy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #9487 Closes #9489
* Move platform dependent errno aliasesMatthew Macy2019-10-251-10/+0
| | | | | | | | | EBADE, EBADR, and ENOANO do not exist on FreeBSD The libspl errno.h is similarly platform dependent. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9498
* Remove sdt.hMatthew Macy2019-10-254-72/+8
| | | | | | | | | It's mostly a noop on ZoL and it conflicts with platforms that support dtrace. Remove this header to resolve the conflict. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9497
* Linux 4.14, 4.19, 5.0+ compat: SIMD save/restoreBrian Behlendorf2019-10-241-7/+7
| | | | | | | | | | | | | | | | | | | | Contrary to initial testing we cannot rely on these kernels to invalidate the per-cpu FPU state and restore the FPU registers. Nor can we guarantee that the kernel won't modify the FPU state which we saved in the task struck. Therefore, the kfpu_begin() and kfpu_end() functions have been updated to save and restore the FPU state using our own dedicated per-cpu FPU state variables. This has the additional advantage of allowing us to use the FPU again in user threads. So we remove the code which was added to use task queues to ensure some functions ran in kernel threads. Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #9346 Closes #9403
* OpenZFS restructuring - ARC memory pressureMatthew Macy2019-10-181-0/+58
| | | | | | | | | | | Factor Linux specific memory pressure handling out of ARC. Each platform will have different available interfaces for managing memory pressure. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9472
* Make zfsdev_getminor signature cross platformMatthew Macy2019-10-161-1/+1
| | | | | | | | Only pass the file descriptor to make zfsdev_get_miror() portable. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9466
* Don't call sizeof on voidPaul Dagnelie2019-10-131-6/+6
| | | | | | | | | We get the sizeof the appropriate type, and don't cast away const. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Matt Macy <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9455
* Function name and comment updatesPaul Dagnelie2019-10-111-8/+13
| | | | | | | | | | | Rename certain functions for more consistency when they share common features. Make comments clearer about what arguments should be passed to the insert and add functions. Reviewed by: Sara Hartse <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9441
* Expose dmu_buf_hold_array_by_dnode to platform codeMatthew Macy2019-10-111-1/+3
| | | | | | | | FreeBSD uses this in its pager ops routines Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9431
* Make `zil_async_to_sync` visible to platform codeMatthew Macy2019-10-101-0/+1
| | | | | | | | | FreeBSD's zvol platform code requires access to the zil_async_to_sync() function. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9440
* Fix strdup conflict on other platformsMatthew Macy2019-10-101-1/+2
| | | | | | | | | | | | | | | | In the FreeBSD kernel the strdup signature is: ``` char *strdup(const char *__restrict, struct malloc_type *); ``` It's unfortunate that the developers have chosen to change the signature of libc functions - but it's what I have to deal with. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9433
* Reduce loaded range tree memory usagePaul Dagnelie2019-10-0911-82/+564
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements a new tree structure for ZFS, and uses it to store range trees more efficiently. The new structure is approximately a B-tree, though there are some small differences from the usual characterizations. The tree has core nodes and leaf nodes; each contain data elements, which the elements in the core nodes acting as separators between its children. The difference between core and leaf nodes is that the core nodes have an array of children, while leaf nodes don't. Every node in the tree may be only partially full; in most cases, they are all at least 50% full (in terms of element count) except for the root node, which can be less full. Underfull nodes will steal from their neighbors or merge to remain full enough, while overfull nodes will split in two. The data elements are contained in tree-controlled buffers; they are copied into these on insertion, and overwritten on deletion. This means that the elements are not independently allocated, which reduces overhead, but also means they can't be shared between trees (and also that pointers to them are only valid until a side-effectful tree operation occurs). The overhead varies based on how dense the tree is, but is usually on the order of about 50% of the element size; the per-node overheads are very small, and so don't make a significant difference. The trees can accept arbitrary records; they accept a size and a comparator to allow them to be used for a variety of purposes. The new trees replace the AVL trees used in the range trees today. Currently, the range_seg_t structure contains three 8 byte integers of payload and two 24 byte avl_tree_node_ts to handle its storage in both an offset-sorted tree and a size-sorted tree (total size: 64 bytes). In the new model, the range seg structures are usually two 4 byte integers, but a separate one needs to exist for the size-sorted and offset-sorted tree. Between the raw size, the 50% overhead, and the double storage, the new btrees are expected to use 8*1.5*2 = 24 bytes per record, or 33.3% as much memory as the AVL trees (this is for the purposes of storing metaslab range trees; for other purposes, like scrubs, they use ~50% as much memory). We reduced the size of the payload in the range segments by teaching range trees about starting offsets and shifts; since metaslabs have a fixed starting offset, and they all operate in terms of disk sectors, we can store the ranges using 4-byte integers as long as the size of the metaslab divided by the sector size is less than 2^32. For 512-byte sectors, this is a 2^41 (or 2TB) metaslab, which with the default settings corresponds to a 256PB disk. 4k sector disks can handle metaslabs up to 2^46 bytes, or 2^63 byte disks. Since we do not anticipate disks of this size in the near future, there should be almost no cases where metaslabs need 64-byte integers to store their ranges. We do still have the capability to store 64-byte integer ranges to account for cases where we are storing per-vdev (or per-dnode) trees, which could reasonably go above the limits discussed. We also do not store fill information in the compact version of the node, since it is only used for sorted scrub. We also optimized the metaslab loading process in various other ways to offset some inefficiencies in the btree model. While individual operations (find, insert, remove_from) are faster for the btree than they are for the avl tree, remove usually requires a find operation, while in the AVL tree model the element itself suffices. Some clever changes actually caused an overall speedup in metaslab loading; we use approximately 40% less cpu to load metaslabs in our tests on Illumos. Another memory and performance optimization was achieved by changing what is stored in the size-sorted trees. When a disk is heavily fragmented, the df algorithm used by default in ZFS will almost always find a number of small regions in its initial cursor-based search; it will usually only fall back to the size-sorted tree to find larger regions. If we increase the size of the cursor-based search slightly, and don't store segments that are smaller than a tunable size floor in the size-sorted tree, we can further cut memory usage down to below 20% of what the AVL trees store. This also results in further reductions in CPU time spent loading metaslabs. The 16KiB size floor was chosen because it results in substantial memory usage reduction while not usually resulting in situations where we can't find an appropriate chunk with the cursor and are forced to use an oversized chunk from the size-sorted tree. In addition, even if we do have to use an oversized chunk from the size-sorted tree, the chunk would be too small to use for ZIL allocations, so it isn't as big of a loss as it might otherwise be. And often, more small allocations will follow the initial one, and the cursor search will now find the remainder of the chunk we didn't use all of and use it for subsequent allocations. Practical testing has shown little or no change in fragmentation as a result of this change. If the size-sorted tree becomes empty while the offset sorted one still has entries, it will load all the entries from the offset sorted tree and disregard the size floor until it is unloaded again. This operation occurs rarely with the default setting, only on incredibly thoroughly fragmented pools. There are some other small changes to zdb to teach it to handle btrees, but nothing major. Reviewed-by: George Wilson <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed by: Sebastien Roy [email protected] Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #9181
* Rename rangelock_ functions to zfs_rangelock_Matthew Macy2019-10-031-5/+5
| | | | | | | | | A rangelock KPI already exists on FreeBSD. Add a zfs_ prefix as per our convention to prevent any conflict with existing symbols. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9402
* Add inode accessors to common codeMatthew Macy2019-10-021-1/+9
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9389
* OpenZFS restructuring - arc_statsMatthew Macy2019-10-011-0/+276
| | | | | | | | Make arc_stats visible to platform code. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9386
* OpenZFS restructuring - zfs_ioctlMatthew Macy2019-09-274-0/+99
| | | | | | | | | Refactor the zfs ioctls in to platform dependent and independent bits. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Sean Eric Fagan <[email protected]> Signed-off-by: Matthew Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9301
* OpenZFS restructuring - zvolMatthew Macy2019-09-254-6/+117
| | | | | | | | | | Refactor the zvol in to platform dependent and independent bits. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9295
* Add subcommand to wait for background zfs activity to completeJohn Gallagher2019-09-134-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the best way to wait for the completion of a long-running operation in a pool, like a scrub or device removal, is to poll 'zpool status' and parse its output, which is neither efficient nor convenient. This change adds a 'wait' subcommand to the zpool command. When invoked, 'zpool wait' will block until a specified type of background activity completes. Currently, this subcommand can wait for any of the following: - Scrubs or resilvers to complete - Devices to initialized - Devices to be replaced - Devices to be removed - Checkpoints to be discarded - Background freeing to complete For example, a scrub that is in progress could be waited for by running zpool wait -t scrub <pool> This also adds a -w flag to the attach, checkpoint, initialize, replace, remove, and scrub subcommands. When used, this flag makes the operations kicked off by these subcommands synchronous instead of asynchronous. This functionality is implemented using a new ioctl. The type of activity to wait for is provided as input to the ioctl, and the ioctl blocks until all activity of that type has completed. An ioctl was used over other methods of kernel-userspace communiction primarily for the sake of portability. Porting Notes: This is ported from Delphix OS change DLPX-44432. The following changes were made while porting: - Added ZoL-style ioctl input declaration. - Reorganized error handling in zpool_initialize in libzfs to integrate better with changes made for TRIM support. - Fixed check for whether a checkpoint discard is in progress. Previously it also waited if the pool had a checkpoint, instead of just if a checkpoint was being discarded. - Exposed zfs_initialize_chunk_size as a ZoL-style tunable. - Updated more existing tests to make use of new 'zpool wait' functionality, tests that don't exist in Delphix OS. - Used existing ZoL tunable zfs_scan_suspend_progress, together with zinject, in place of a new tunable zfs_scan_max_blks_per_txg. - Added support for a non-integral interval argument to zpool wait. Future work: ZoL has support for trimming devices, which Delphix OS does not. In the future, 'zpool wait' could be extended to add the ability to wait for trim operations to complete. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: John Gallagher <[email protected]> Closes #9162
* QAT related bug fixesChengfei ZHu2019-09-121-5/+0
| | | | | | | | | | | | | | 1. Fix issue: Kernel BUG with QAT during decompression #9276. Now it is uninterruptible for a specific given QAT request, but Ctrl-C interrupt still works in user-space process. 2. Copy the digest result to the buffer only when doing encryption, and vise-versa for decryption. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chengfei Zhu <[email protected]> Closes #9276 Closes #9303
* Enable compiler to typecheck loggingMatthew Macy2019-09-121-3/+3
| | | | | | | | | | Annotate spa logging declarations with printflike Workaround gcc bug (non disable-able warning) by replacing "" with " " Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9316
* OpenZFS restructuring - move linux tracing code to platform directoriesMatthew Macy2019-09-1116-2122/+0
| | | | | | | | | | | Move Linux specific tracing headers and source to platform directories and update the build system. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #9290
* Fix /etc/hostid on root pool deadlockBrian Behlendorf2019-09-102-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock as a reader in zfs_blkptr_verify(). A deadlock can occur if the /etc/hostid file resides on a dataset in the same pool. This is because reading the /etc/hostid file may occur while the caller is holding the SCL_VDEV lock as a writer. For example, to perform a `zpool attach` as shown in the abbreviated stack below. To resolve the issue we cache the system's hostid when initializing the spa_t, or when modifying the multihost property. The cached value is then relied upon for subsequent accesses. Call Trace: spa_config_enter+0x1e8/0x350 [zfs] zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock zio_read+0x6c/0x140 [zfs] ... vfs_read+0xfc/0x1e0 kernel_read+0x50/0x90 ... spa_get_hostid+0x1c/0x38 [zfs] spa_config_generate+0x1a0/0x610 [zfs] vdev_label_init+0xa0/0xc80 [zfs] vdev_create+0x98/0xe0 [zfs] spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9256 Closes #9285
* Enable SIMD for encryptionBrian Behlendorf2019-09-101-7/+7
| | | | | | | | | | | | | | | | | | | | | When adding the SIMD compatibility code in e5db313 the decryption of a dataset wrapping key was left in a user thread context. This was done intentionally since it's a relatively infrequent operation. However, this also meant that the encryption context templates were initialized using the generic operations. Therefore, subsequent encryption and decryption operations would use the generic implementation even when executed by an I/O pipeline thread. Resolve the issue by initializing the context templates in an I/O pipeline thread. And by updating zio_do_crypt_uio() to dispatch any encryption operations to a pipeline thread when called from the user context. For example, when performing a read from the ARC. Tested-by: Attila Fülöp <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9215 Closes #9296
* OpenZFS restructuring - move platform specific sourcesMatthew Macy2019-09-062-0/+205
| | | | | | | | | | | | Move platform specific Linux source under module/os/linux/ and update the build system accordingly. Additional code restructuring will follow to make the common code fully portable. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Macy <[email protected]> Closes #9206
* Make module tunables cross platformMatthew Macy2019-09-051-0/+11
| | | | | | | | | | | Adds ZFS_MODULE_PARAM to abstract module parameter setting to operating systems other than Linux. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #9230
* OpenZFS restructuring - move platform specific headersMatthew Macy2019-09-057-769/+0
| | | | | | | | | | | | | | | | | Move platform specific Linux headers under include/os/linux/. Update the build system accordingly to detect the platform. This lays some of the initial groundwork to supporting building for other platforms. As part of this change it was necessary to create both a user and kernel space sys/simd.h header which can be included in either context. No functional change, the source has been refactored and the relevant #include's updated. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Matthew Macy <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #9198
* Fix typos in include/Andrea Gelmini2019-08-3018-26/+26
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andrea Gelmini <[email protected]> Closes #9238
* Keep more metaslabs loadedPaul Dagnelie2019-08-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | With the other metaslab changes loaded onto a system, we can significantly reduce the memory usage of each loaded metaslab and unload them on demand if there is memory pressure. However, none of those changes actually result in us keeping more metaslabs loaded. If we don't keep more metaslabs loaded, we will still have to wait for demand-loading to finish when no loaded metaslab can satisfy our allocation, which can cause ZIL performance issues. In addition, performance is traditionally measured by IOs per unit time, while unloading is currently done on a txg-count basis. Txgs can take a widely varying range of times, from tenths of a second to several seconds. This can result in confusing, hard to predict behavior. This change simply adds a time-based component to metaslab unloading. A metaslab will remain loaded for one minute and 8 txgs (by default) after it was last used, unless it is evicted due to memory pressure. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> External-issue: DLPX-65016 External-issue: DLPX-65047 Closes #9197
* Fix zil replay panic when TX_REMOVE followed by TX_CREATEChunwei Chen2019-08-281-5/+2
| | | | | | | | | | | | | | | | | | | | If TX_REMOVE is followed by TX_CREATE on the same object id, we need to make sure the object removal is completely finished before creation. The current implementation relies on dnode_hold_impl with DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work fine before, in current version it does not guarantee the object removal is completed. We fix this by checking if DNODE_MUST_BE_FREE returns successful instead. Also add test and remove dead code in dnode_hold_impl. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7151 Closes #8910 Closes #9123 Closes #9145
* Fix deadlock in 'zfs rollback'Tom Caputi2019-08-271-0/+1
| | | | | | | | | | | | | | | | | | | | Currently, the 'zfs rollback' code can end up deadlocked due to the way the kernel handles unreferenced inodes on a suspended fs. Essentially, the zfs_resume_fs() code path may cause zfs to spawn new threads as it reinstantiates the suspended fs's zil. When a new thread is spawned, the kernel may attempt to free memory for that thread by freeing some unreferenced inodes. If it happens to select inodes that are a a part of the suspended fs a deadlock will occur because freeing inodes requires holding the fs's z_teardown_inactive_lock which is still held from the suspend. This patch corrects this issue by adding an additional reference to all inodes that are still present when a suspend is initiated. This prevents them from being freed by the kernel for any reason. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #9203