path: root/include/os
Commit message | Author | Age | Files | Lines
* FreeBSD: Add option to rewind checkpoint while importing root pool | Mariusz Zaborski | 2020-08-19 | 1 | -1/+1

  This option is used by the FreeBSD boot loader.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Mariusz Zaborski <[email protected]>
  Closes #10738
* FreeBSD: Fix UNIX permissions checking | Matthew Macy | 2020-08-18 | 3 | -0/+140

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10727
* Make zc_nvlist_src_size limit tunable | Ryan Moeller | 2020-08-18 | 1 | -2/+0

  We limit the size of nvlists passed to the kernel so a user cannot
  make the kernel do an unreasonably large allocation. On FreeBSD this
  limit was 128 kiB, which turns out to be a bit too small when doing
  some operations involving a large number of datasets or snapshots,
  for example replication.

  Make this limit tunable, with a platform-specific auto default.
  Linux keeps its limit at KMALLOC_MAX_SIZE. FreeBSD uses 1/4 of the
  system limit on user wired memory, which allows it to scale depending
  on system configuration.

  Reviewed-by: Matt Macy <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Issue #6572
  Closes #10706
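The "auto default" rule described in this entry can be sketched in plain C. This is only an illustration under stated assumptions: the function names are hypothetical, and treating a tunable value of 0 as "auto" is an assumption made for the sketch, not something the commit message states.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the effective nvlist size limit.
 * Assumption: a tunable value of 0 means "auto", in which case the
 * FreeBSD-style default of one quarter of the user wired memory limit
 * is used, as the commit message describes.
 */
static uint64_t
nvlist_src_size_limit(uint64_t tunable, uint64_t user_wired_limit_bytes)
{
	if (tunable != 0)
		return (tunable);
	return (user_wired_limit_bytes / 4);
}

/* Returns 1 when a request of `size` bytes fits within `limit`. */
static int
nvlist_src_size_ok(uint64_t size, uint64_t limit)
{
	return (size <= limit);
}
```

With a 4096-byte wired limit and the tunable left at 0, the sketch yields a 1024-byte limit; an explicit tunable value is returned unchanged.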
* Remove KMC_KMEM and KMC_VMEM | Matthew Ahrens | 2020-08-17 | 1 | -4/+0

  `KMC_KMEM` and `KMC_VMEM` are now unused since all SPL-implemented
  caches are `KMC_KVMEM`.

  KMC_KMEM: Given the default value of `spl_kmem_cache_kmem_limit`, we
  don't use kmalloc to back the SPL caches; instead we use kvmalloc
  (KMC_KVMEM). The flag, module parameter, /proc entries, and associated
  code are removed.

  KMC_VMEM: This flag is not used, and kvmalloc() is always preferable to
  vmalloc(). The flag, /proc entries, and associated code are removed.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10673
* FreeBSD: Create taskq threads in appropriate proc | Ryan Moeller | 2020-08-17 | 1 | -3/+8

  Stepping stone toward re-enabling spa_thread on FreeBSD.

  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #10715
* Linux 5.9 compat: make_request_fn replaced with submit_bio interface | Coleman Kane | 2020-08-11 | 1 | -0/+2

  The make_request_fn and associated API were replaced recently in a
  Linux 5.9 merge by a new submit_bio member in struct
  block_device_operations.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Coleman Kane <[email protected]>
  Closes #10696
* Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B | Coleman Kane | 2020-08-11 | 2 | -0/+21

  This change appears to primarily be a name change for the enum. Had to
  update the test logic so that it works so long as either one of these
  is present (favoring the newer one). Additionally, as this is newer,
  it only shows up in node_page_item, so this commit doesn't test
  zone_page_item for the same enum.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Coleman Kane <[email protected]>
  Closes #10696
* Remove KM_NODEBUG | Matthew Ahrens | 2020-08-05 | 1 | -1/+0

  Remove dead code to make the implementation easier to understand.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Ahrens <[email protected]>
  Closes #10650
* Remove KMC_NOMAGAZINE | Matthew Ahrens | 2020-08-05 | 1 | -2/+0

  Remove dead code to make the implementation easier to understand.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Ahrens <[email protected]>
  Closes #10650
* Remove KMC_QCACHE | Matthew Ahrens | 2020-08-05 | 1 | -2/+0

  Remove dead code to make the implementation easier to understand.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Ahrens <[email protected]>
  Closes #10650
* Remove KMC_NOHASH | Matthew Ahrens | 2020-08-05 | 1 | -2/+0

  Remove dead code to make the implementation easier to understand.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Ahrens <[email protected]>
  Closes #10650
* Remove KMC_NOTOUCH | Matthew Ahrens | 2020-08-05 | 2 | -3/+0

  Remove dead code to make the implementation easier to understand.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Ahrens <[email protected]>
  Closes #10650
* Remove KMC_OFFSLAB | Matthew Ahrens | 2020-08-05 | 1 | -2/+0

  Remove dead code to make the implementation easier to understand.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Ahrens <[email protected]>
  Closes #10650
* FreeBSD: Add support for lockless lookup | Matthew Macy | 2020-08-05 | 2 | -0/+16

  Authored-by: mjg <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10657
* Fix arc__wait__for__eviction tracepoint | Pavel Snajdr | 2020-08-04 | 1 | -2/+2

  3442c2a02d added a new `arc_wait_for_eviction` tracepoint, which fails
  to compile when tracepoints are enabled.

  The tracepoint definition begins with
  `DEFINE_ARC_WAIT_FOR_EVICTION_EVENT` and is a multi-line definition,
  so this fixes the backslash and parenthesis accordingly.

  Reviewed-by: Matt Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Pavel Snajdr <[email protected]>
  Closes #10669
* Changes to make openzfs build within FreeBSD buildworld | Matthew Macy | 2020-07-31 | 8 | -8/+34

  A collection of header changes to enable FreeBSD to build with
  vendored OpenZFS.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10635
* Revise ARC shrinker algorithm | Matthew Ahrens | 2020-07-31 | 2 | -2/+38

  The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by
  the kernel's shrinker mechanism when the system is running low on free
  pages. This happens via 2 code paths:

  1. "direct reclaim": The system is attempting to allocate a page, but
     we are low on memory. The ARC shrinker callback is invoked from the
     page-allocation code path.

  2. "indirect reclaim": kswapd notices that there aren't many free
     pages, so it invokes the ARC shrinker callback.

  In both cases, the kernel's shrinker code requests that the ARC
  shrinker callback release some of its cache, and then it measures how
  many pages were released. However, its measurement of released pages
  does not include pages that are freed via `__free_pages()`, which is
  how the ARC releases memory (via `abd_free_chunks()`). Rather, the
  kernel shrinker code is looking for pages to be placed on the lists of
  reclaimable pages (which is separate from actually-free pages).

  Because the kernel shrinker code doesn't detect that the ARC has
  released pages, it may call the ARC shrinker callback many times,
  resulting in the ARC "collapsing" down to `arc_c_min`. This has
  several negative impacts:

  1. ZFS doesn't use RAM to cache data effectively.

  2. In the direct reclaim case, a single page allocation may wait a
     long time (e.g. more than a minute) while we evict the entire ARC.

  3. Even with the improvements made in 67c0f0dedc5 ("ARC shrinking
     blocks reads/writes"), occasionally `arc_size` may stay above
     `arc_c` for the entire time of the ARC collapse, thus blocking ZFS
     read/write operations in `arc_get_data_impl()`.

  To address these issues, this commit limits the ways that the ARC
  shrinker callback can be used by the kernel shrinker code, and
  mitigates the impact of arc_is_overflowing() on ZFS read/write
  operations.

  With this commit:

  1. We limit the amount of data that can be reclaimed from the ARC via
     the "direct reclaim" shrinker. This limits the amount of time it
     takes to allocate a single page.

  2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
     Instead we rely on `arc_evict_zthr` to monitor free memory and
     reduce the ARC target size to keep sufficient free memory in the
     system. Note that we can't simply rely on limiting the amount that
     we reclaim at once (as for the direct reclaim case), because
     kswapd's "boosted" logic can invoke the callback an unlimited
     number of times (see `balance_pgdat()`).

  3. When `arc_is_overflowing()` and we want to allocate memory,
     `arc_get_data_impl()` will wait only for a multiple of the
     requested amount of data to be evicted, rather than waiting for the
     ARC to no longer be overflowing. This allows ZFS reads/writes to
     make progress even while the ARC is overflowing, while also
     ensuring that the eviction thread makes progress towards reducing
     the total amount of memory used by the ARC.

  4. The amount of memory that the ARC always tries to keep free for the
     rest of the system, `arc_sys_free`, is increased.

  5. Now that the shrinker callback is able to provide feedback to the
     kernel's shrinker code about our progress, we can safely enable
     the kswapd hook. This will allow the arc to receive notifications
     when memory pressure is first detected by the kernel. We also
     re-enable the appropriate kstats to track these callbacks.

  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Co-authored-by: George Wilson <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10600
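Point 3 of this entry (waiting for a multiple of the requested amount rather than for the overflow to clear) can be modelled with simple arithmetic. The sketch below is a userspace illustration only: the function names are hypothetical, and the multiplier value is an assumption, since the commit message says only "a multiple".

```c
#include <stdint.h>

/* Assumed multiplier; the commit message only says "a multiple". */
#define	EVICT_WAIT_MULTIPLE	2

/*
 * Hypothetical helper: given the running total of bytes evicted so far
 * and the size of the new request, compute the eviction total the
 * caller should wait for before proceeding.
 */
static uint64_t
evict_wait_target(uint64_t evicted_so_far, uint64_t requested)
{
	return (evicted_so_far + requested * EVICT_WAIT_MULTIPLE);
}

/* Returns 1 once the eviction thread has reached the caller's target. */
static int
evict_wait_done(uint64_t evicted_now, uint64_t target)
{
	return (evicted_now >= target);
}
```

The caller's wait is bounded by its own request size, so it can proceed while the cache as a whole is still over its target, and the eviction thread still has to make real progress first.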
* Rename refcount.h to zfs_refcount.h | Matthew Macy | 2020-07-29 | 3 | -3/+2

  Renamed to avoid conflicting with refcount.h when a different
  implementation is already provided by the platform.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10620
* Introduce names for ZTHRs | Serapheim Dimitropoulos | 2020-07-29 | 2 | -3/+11

  When debugging issues or generally analyzing the runtime of a system
  it would be nice to be able to tell the different ZTHRs running by
  name rather than having to analyze their stack.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Co-authored-by: Ryan Moeller <[email protected]>
  Signed-off-by: Serapheim Dimitropoulos <[email protected]>
  Closes #10630
* Prefix zfs internal endian checks with _ZFS | Matthew Macy | 2020-07-28 | 6 | -65/+32

  FreeBSD defines _BIG_ENDIAN, BIG_ENDIAN, _LITTLE_ENDIAN, and
  LITTLE_ENDIAN on every architecture. Trying to do cross builds whilst
  hiding this from ZFS has proven extremely cumbersome.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10621
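A private, prefixed endian macro avoids depending on names the platform may define unconditionally. The sketch below shows one way this could look; the macro names `_ZFS_BIG_ENDIAN` and `_ZFS_LITTLE_ENDIAN` are inferred from the "_ZFS" prefix in the subject line and should be treated as assumptions, and the detection relies on the `__BYTE_ORDER__` macro predefined by GCC and Clang rather than on whatever the actual headers use.

```c
#include <stdint.h>

/*
 * Derive a private endian macro from the compiler's predefined
 * __BYTE_ORDER__ (GCC/Clang). Exactly one of the two is defined.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define	_ZFS_BIG_ENDIAN
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define	_ZFS_LITTLE_ENDIAN
#else
#error "unknown byte order"
#endif

/* Compile-time answer: 1 on a big-endian build, 0 otherwise. */
static int
zfs_build_is_big_endian(void)
{
#if defined(_ZFS_BIG_ENDIAN)
	return (1);
#else
	return (0);
#endif
}

/* Runtime probe, used only to cross-check the compile-time answer. */
static int
runtime_is_big_endian(void)
{
	const uint16_t probe = 0x0102;

	/* On big-endian hardware the first byte in memory is 0x01. */
	return (*(const unsigned char *)&probe == 0x01);
}
```

Because code tests only the prefixed macro, it does not matter that a platform header also defines both `_BIG_ENDIAN` and `_LITTLE_ENDIAN`.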
* Refactor ccompile.h to not include system headers | Matthew Macy | 2020-07-25 | 9 | -136/+175

  This is a step toward being able to vendor the OpenZFS code in FreeBSD.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10625
* Make use of ZFS_DEBUG consistent within kmod sources | Matthew Macy | 2020-07-25 | 1 | -2/+7

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10623
* FreeBSD: Fixes required to build ZFS on PowerPC | Matthew Macy | 2020-07-25 | 1 | -0/+15

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10622
* remove kmem_cache module parameter KMC_EXPIRE_AGE | Matthew Ahrens | 2020-07-24 | 1 | -8/+0

  By default, `spl_kmem_cache_expire` is `KMC_EXPIRE_MEM`, meaning that
  objects will be removed from kmem cache magazines by
  `spl_kmem_cache_reap_now()`.

  There is also a module parameter to change this to `KMC_EXPIRE_AGE`,
  which establishes a maximum lifetime for objects to stay in the
  magazine. This setting has rarely, if ever, been used, and is not
  regularly tested.

  This commit removes the code for `KMC_EXPIRE_AGE`, and associated
  module parameters.

  Additionally, the unused module parameter
  `spl_kmem_cache_obj_per_slab_min` is removed.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10608
* Remove skc_reclaim, hdr_recl, kmem_cache shrinker | Matthew Ahrens | 2020-07-19 | 1 | -3/+1

  The SPL kmem_cache implementation provides a mechanism, `skc_reclaim`,
  whereby individual caches can register a callback to be invoked when
  there is memory pressure. This mechanism is used in only one place:
  the ARC registers the `hdr_recl()` reclaim function. This function
  wakes up the `arc_reap_zthr`, whose job is to call `kmem_cache_reap()`
  and `arc_reduce_target_size()`.

  The `skc_reclaim` callbacks are invoked only by shrinker callbacks and
  `arc_reap_zthr`, and the only callback only wakes up `arc_reap_zthr`.
  When called from `arc_reap_zthr`, waking `arc_reap_zthr` is a no-op.
  When called from shrinker callbacks, we are already aware of memory
  pressure and responding to it. Therefore there is little benefit to
  ever calling the `hdr_recl()` `skc_reclaim` callback.

  The `arc_reap_zthr` also wakes once a second, and if memory is low
  when allocating an ARC buffer. Therefore, additionally waking it from
  the shrinker callbacks has little benefit.

  The shrinker callbacks can be invoked very frequently, e.g. 10,000
  times per second. Additionally, for each invocation of the shrinker
  callback, skc_reclaim is invoked many times. Therefore, this mechanism
  consumes significant amounts of CPU time.

  The kmem_cache shrinker calls `spl_kmem_cache_reap_now()`, which, in
  addition to invoking `skc_reclaim()`, does two things to attempt to
  free pages for use by the system:

  1. Return free objects from the magazine layer to the slab layer
  2. Return entirely-free slabs to the page layer (i.e. free pages)

  These actions apply only to caches implemented by the SPL, not those
  that use the underlying kernel SLAB/SLUB caches. The SPL caches are
  used for objects >=32KB, which are primarily linear ABD's cached in
  the DBUF cache.

  These actions (freeing objects from the magazine layer and returning
  entirely-free slabs) are also taken whenever a `kmem_cache_free()`
  call finds a full magazine. So there would typically be zero
  entirely-free slabs, and the number of objects in magazines is limited
  (typically no more than 64 objects per magazine, and there's one
  magazine per CPU). Therefore the benefit of
  `spl_kmem_cache_reap_now()`, while nonzero, is modest.

  We also call `spl_kmem_cache_reap_now()` from the `arc_reap_zthr`,
  when memory pressure is detected. Therefore, calling
  `spl_kmem_cache_reap_now()` from the kmem_cache shrinker is not needed.

  This commit removes the `skc_reclaim` mechanism, its only callback
  `hdr_recl()`, and the kmem_cache shrinker callback.

  Reviewed-By: Brian Behlendorf <[email protected]>
  Reviewed-by: George Wilson <[email protected]>
  Reviewed-by: Pavel Zakharov <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10576
* FreeBSD: zfs commands backward compatibility | Matthew Macy | 2020-07-15 | 1 | -531/+14

  Update the zfs commands such that they're backwards compatible with
  the version of ZFS in the base FreeBSD.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10542
* Fixing gang ABD child removal race condition | Brian Atkinson | 2020-07-14 | 1 | -1/+3

  On Linux the list debug code has been setting off a failure when
  checking that the node->next->prev value is pointing back at the node.
  At times this check evaluates to 0xdead. When removing a child from a
  gang ABD we must acquire the child's abd_mtx to make sure that the
  same ABD is not being added to another gang ABD while it is being
  removed from a gang ABD. This fixes a race condition when checking if
  an ABD's link is already active and part of another gang ABD before
  adding it to a gang.

  Added additional debug code for the gang ABD in abd_verify() to make
  sure each child ABD has active links. Also check to make sure another
  gang ABD is not added to a gang ABD.

  Reviewed-by: Serapheim Dimitropoulos <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Matt Ahrens <[email protected]>
  Signed-off-by: Brian Atkinson <[email protected]>
  Closes #10511
* filesystem_limit/snapshot_limit is incorrectly enforced against root | Matthew Ahrens | 2020-07-11 | 2 | -0/+2

  The filesystem_limit and snapshot_limit properties limit the number of
  filesystems or snapshots that can be created below this dataset.
  According to the manpage, "The limit is not enforced if the user is
  allowed to change the limit." Two types of users are allowed to change
  the limit:

  1. Those that have been delegated the `filesystem_limit` or
     `snapshot_limit` permission, e.g. with
     `zfs allow USER filesystem_limit DATASET`. This works properly.

  2. A user with elevated system privileges (e.g. root). This does not
     work - the root user will incorrectly get an error when trying to
     create a snapshot/filesystem, if it exceeds the `_limit` property.

  The problem is that `priv_policy_ns()` does not work if the `cred_t`
  is not that of the current process. This happens when
  `dsl_enforce_ds_ss_limits()` is called in syncing context (as part of
  a sync task's check func) to determine the permissions of the
  corresponding user process.

  This commit fixes the issue by passing the `task_struct` (typedef'ed
  as a `proc_t`) to syncing context, and then using `has_capability()`
  to determine if that process is privileged. Note that we still need to
  pass the `cred_t` to syncing context so that we can check if the user
  was delegated this permission with `zfs allow`.

  This problem only impacts Linux. Wrappers are added to FreeBSD but it
  continues to use `priv_check_cred()`, which works on arbitrary
  `cred_t`.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #8226
  Closes #10545
* FreeBSD: Use a hash table for taskqid lookups | Matthew Macy | 2020-07-11 | 1 | -9/+10

  Previously a tqent could be recycled prematurely; update the code to
  use a hash table for lookups to resolve this.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10529
* freebsd: changes necessary to coexist with dtrace in tree | Matthew Macy | 2020-07-01 | 5 | -5/+13

  Fix header conflicts when building zfs with openzfs as a vendor import.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10497
* Clean up OS-specific ARC and kmem code | Matthew Ahrens | 2020-06-29 | 3 | -15/+2

  OS-specific code (e.g. under `module/os/linux`) does not need to share
  its code structure with any other operating systems. In particular,
  the ARC and kmem code need not be similar to the code in illumos,
  because we won't be syncing this OS-specific code between operating
  systems. For example, if/when illumos support is added to the common
  repo, we would add a file `module/os/illumos/zfs/arc_os.c` for the
  illumos versions of this code.

  Therefore, we can simplify the code in the OS-specific ARC and kmem
  routines.

  These changes do not impact system behavior, they are purely code
  cleanup. The changes are:

  Arenas are not used on Linux or FreeBSD (they are always `NULL`), so
  `heap_arena`, `zio_arena`, and `zio_alloc_arena` can be removed, along
  with code that uses them.

  In `arc_available_memory()`:
   * `desfree` is unused, remove it
   * rename `freemem` to avoid conflict with pre-existing `#define`
   * remove checks related to arenas
   * use units of bytes, rather than converting from bytes to pages and
     then back to bytes

  `SPL_KMEM_CACHE_REAP` is unused, remove it.

  `skc_reap` is unused, remove it.

  The `count` argument to `spl_kmem_cache_reap_now()` is unused, remove
  it.

  `vmem_size()` and associated type and macros are unused, remove them.

  In `arc_memory_throttle()`, use a less confusing variable name to
  store the result of `arc_free_memory()`.

  Reviewed-by: George Wilson <[email protected]>
  Reviewed-by: Pavel Zakharov <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10499
* Avoid installing kernel headers on FreeBSD | Arvind Sankar | 2020-06-27 | 6 | -20/+7

  The kernel headers are installed for DKMS on Linux, so don't install
  them unless we're building on Linux.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Arvind Sankar <[email protected]>
  Closes #10506
* Style fixes | Brian Behlendorf | 2020-06-27 | 1 | -1/+1

  * Fix cstyle issue in shrinker.h which exceeded 80 columns.
  * Silence shellcheck warning in zpool.d/smart script.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Revise SPL wrapper for shrinker callbacks | Matthew Ahrens | 2020-06-27 | 1 | -82/+40

  The SPL provides a wrapper for the kernel's shrinker callbacks, which
  enables the ZFS code to interface with multiple versions of the
  shrinker API's from different kernel versions. Specifically, Linux
  kernels 3.0 - 3.11 have a single "combined" callback, and Linux kernels
  3.12 and later have two "split" callbacks. The SPL provides a wrapper
  function so that the ZFS code only needs to implement one version of
  the callbacks.

  Currently the SPL's wrappers are designed such that the ZFS code
  implements the older, "combined" callback. There are a few downsides
  to this approach:

  * The general design within ZFS is for the latest Linux kernel to be
    considered the "first class" API.

  * The newer, "split" callback API is easier to understand, because
    each callback has one purpose.

  * The current wrappers do not completely abstract out the differing
    API's, so ZFS code needs `#ifdef` code to handle the differing
    return values required for different kernel versions.

  This commit addresses these drawbacks by having the ZFS code provide
  the latest, "split" callbacks, and the SPL provides a wrapping
  function for the older, "combined" API.

  Reviewed-by: Pavel Zakharov <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10502
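The wrapping direction described in this entry (callers supply "split" count and scan callbacks, and a shim presents them as one "combined" callback) can be sketched in userspace C. All type and function names below are hypothetical stand-ins, not the kernel's or the SPL's. The sketch assumes the older convention in which a scan request of zero is a pure query and the callback otherwise reports how many objects remain.

```c
/* Hypothetical stand-in for the kernel's shrink_control. */
struct shrink_control_sketch {
	unsigned long nr_to_scan;
};

typedef unsigned long (*count_fn_t)(void);
typedef unsigned long (*scan_fn_t)(unsigned long nr_to_scan);

/* The two "split" callbacks a cache would provide. */
struct split_shrinker {
	count_fn_t count_objects;
	scan_fn_t scan_objects;
};

/*
 * "Combined"-style callback built on top of the split callbacks:
 * nr_to_scan == 0 is a query; otherwise scan first, then report how
 * many objects remain.
 */
static unsigned long
combined_shrink(const struct split_shrinker *s,
    const struct shrink_control_sketch *sc)
{
	if (sc->nr_to_scan != 0)
		(void) s->scan_objects(sc->nr_to_scan);
	return (s->count_objects());
}

/* A toy cache used to exercise the shim. */
static unsigned long toy_cache_objects = 100;

static unsigned long
toy_count(void)
{
	return (toy_cache_objects);
}

static unsigned long
toy_scan(unsigned long nr_to_scan)
{
	unsigned long freed =
	    nr_to_scan < toy_cache_objects ? nr_to_scan : toy_cache_objects;

	toy_cache_objects -= freed;
	return (freed);
}
```

Each split callback keeps a single purpose, and only the shim has to know about the combined convention.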
* Use percpu_counter for obj_alloc counter of Linux-backed caches | Serapheim Dimitropoulos | 2020-06-26 | 3 | -0/+46

  A previous commit enabled the tracking of object allocations in
  Linux-backed caches from the SPL layer for debuggability. The commit
  is: 9a170fc6fe54f1e852b6c39630fe5ef2bbd97c16

  Unfortunately, it also introduced minor performance regressions that
  were highlighted by the ZFS perf test-suite. Within Delphix we found
  that the regression would be from -1%, all the way up to -8% for some
  workloads.

  This commit brings performance back up to par by creating a separate
  counter for those caches and making it a percpu in order to avoid
  lock-contention.

  The initial performance testing was done by myself, and the final
  round was conducted by @tonynguien who was also the one that
  discovered the regression and highlighted the culprit.

  Reviewed-by: Matt Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Serapheim Dimitropoulos <[email protected]>
  Closes #10397
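The idea behind a per-CPU counter is that each CPU updates its own slot and the slots are summed only when a total is read. The following is a single-threaded userspace model meant only to show the data layout; it is not the kernel's `percpu_counter` API, and every name in it is hypothetical.

```c
#include <stdint.h>

#define	NSLOTS	8	/* hypothetical number of CPUs */

/* One slot per CPU, so writers on different CPUs never share a slot. */
struct percpu_counter_sketch {
	int64_t slot[NSLOTS];
};

/* Update only the slot that belongs to the calling CPU. */
static void
pc_add(struct percpu_counter_sketch *c, unsigned cpu, int64_t delta)
{
	c->slot[cpu % NSLOTS] += delta;
}

/* Reading the total walks every slot; this is the slower, rarer path. */
static int64_t
pc_sum(const struct percpu_counter_sketch *c)
{
	int64_t total = 0;

	for (unsigned i = 0; i < NSLOTS; i++)
		total += c->slot[i];
	return (total);
}
```

The trade-off is that updates stay cheap and local while reads cost a pass over all slots, which suits a statistic that is updated on every allocation but read only occasionally.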
* Fixes for make dist | Arvind Sankar | 2020-06-26 | 6 | -21/+21

  Reduce the usage of EXTRA_DIST. If files are conditionally included in
  _SOURCES, _HEADERS etc, automake is smart enough to dist all files
  that could possibly be included, but this does not apply to
  EXTRA_DIST, resulting in make dist depending on the configuration.

  Add some files that were missing altogether in various Makefile's.

  The changes to disted files in this commit (excluding deleted files):

  +./cmd/zed/agents/README.md
  +./etc/init.d/README.md
  +./lib/libspl/os/freebsd/getexecname.c
  +./lib/libspl/os/freebsd/gethostid.c
  +./lib/libspl/os/freebsd/getmntany.c
  +./lib/libspl/os/freebsd/mnttab.c
  -./lib/libzfs/libzfs_core.pc
  -./lib/libzfs/libzfs.pc
  +./lib/libzfs/os/freebsd/libzfs_compat.c
  +./lib/libzfs/os/freebsd/libzfs_fsshare.c
  +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
  +./lib/libzfs/os/freebsd/libzfs_zmount.c
  +./lib/libzutil/os/freebsd/zutil_compat.c
  +./lib/libzutil/os/freebsd/zutil_device_path_os.c
  +./lib/libzutil/os/freebsd/zutil_import_os.c
  +./module/lua/README.zfs
  +./module/os/linux/spl/README.md
  +./tests/README.md
  +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh
  +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh
  +./tests/zfs-tests/tests/functional/inheritance/README.config
  +./tests/zfs-tests/tests/functional/inheritance/README.state
  +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh
  +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Arvind Sankar <[email protected]>
  Closes #10501
* Drop unnecessary srcdir paths | Arvind Sankar | 2020-06-24 | 9 | -180/+180

  There's no need to specify the srcdir explicitly in _HEADERS and
  EXTRA_DIST.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Arvind Sankar <[email protected]>
  Closes #10493
* Add zfs_multihost_interval tunable handler for FreeBSD | Ryan Moeller | 2020-06-23 | 1 | -0/+3

  This tunable required a handler to be implemented for
  ZFS_MODULE_PARAM_CALL. Add the handler so the tunable can be declared
  in common code.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #10490
* Add prototypes | Arvind Sankar | 2020-06-18 | 4 | -13/+27

  Add prototypes/move prototypes to header files.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Arvind Sankar <[email protected]>
  Closes #10470
* Remove dead code | Arvind Sankar | 2020-06-18 | 1 | -1/+0

  Delete unused functions.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Arvind Sankar <[email protected]>
  Closes #10470
* Mark functions as static | Arvind Sankar | 2020-06-18 | 2 | -30/+4

  Mark functions used only in the same translation unit as static. This
  only includes functions that do not have a prototype in a header file
  either.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Arvind Sankar <[email protected]>
  Closes #10470
* linux: add basic fallocate(mode=0/2) compatibility | adilger | 2020-06-18 | 1 | -1/+1

  Implement semi-compatible functionality for mode=0 (preallocation)
  and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.

  Since ZFS does COW and snapshots, preallocating blocks for a file
  cannot guarantee that writes to the file will not run out of space.
  Even if the first overwrite was guaranteed, it would not handle any
  later overwrite of blocks due to COW, so strict compliance is futile.
  Instead, make a best-effort check that at least enough free space is
  currently available in the pool (with a bit of margin), then create
  a sparse file of the requested size and continue on with life.

  This does not handle all cases (e.g. several fallocate() calls before
  writing into the files when the filesystem is nearly full), which
  would require a more complex mechanism to be implemented, probably
  based on a modified version of dmu_prealloc(), but is usable as-is.

  A new module option zfs_fallocate_reserve_percent is used to control
  the reserve margin for any single fallocate call. By default, this
  is 110% of the requested preallocation size, so an additional 10% of
  available space is reserved for overhead to allow the application a
  good chance of finishing the write when the fallocate() succeeds.
  If the heuristics of this basic fallocate implementation are not
  desirable, the old non-functional behavior of returning EOPNOTSUPP
  for calls can be restored by setting zfs_fallocate_reserve_percent=0.

  The parameter of zfs_statvfs() is changed to take an inode instead
  of a dentry, since no dentry is available in zfs_fallocate_common().

  A few tests from @behlendorf cover basic fallocate functionality.

  Reviewed-by: Richard Laager <[email protected]>
  Reviewed-by: Arshad Hussain <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Co-authored-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Andreas Dilger <[email protected]>
  Issue #326
  Closes #10408
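The reserve-margin check described in this entry is simple arithmetic, sketched below in userspace C. The function name is hypothetical. Returning `ENOSPC` when the pool lacks space is an assumption made for the sketch; the commit message states only the `EOPNOTSUPP` behaviour for a setting of 0.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical best-effort check: succeed only if `avail` bytes cover
 * the requested length scaled by reserve_percent (e.g. 110 means the
 * request plus a 10% margin). A percent of 0 disables the feature.
 */
static int
fallocate_space_check(uint64_t len, uint64_t avail, unsigned reserve_percent)
{
	if (reserve_percent == 0)
		return (EOPNOTSUPP);

	/*
	 * required = len * percent / 100, computed in two parts so the
	 * multiplication cannot overflow for large `len`.
	 */
	uint64_t required = len / 100 * reserve_percent +
	    (len % 100) * reserve_percent / 100;

	if (avail < required)
		return (ENOSPC);	/* assumed error code for the sketch */
	return (0);
}
```

For a 1000-byte request at the default of 110, the check needs 1100 bytes available. Nothing is actually reserved by this check, which is why the entry describes it as best-effort.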
* Disambiguate condvar API contract | Matthew Macy | 2020-06-18 | 2 | -16/+68

  On Illumos callers of cv_timedwait and cv_timedwait_hires can't
  distinguish between whether or not the cv was signaled or the call
  timed out. Illumos handles this (for some definition of handles) by
  calling cv_signal in the return path if we were signaled but the
  return value indicates instead that we timed out. This would make
  sense if it were possible to query the cv for its net signal
  disposition. However, this isn't possible and, in spite of the fact
  that there are places in the code that clearly take a different and
  incompatible path if a timeout value is indicated, this distinction
  appears to be rather subtle to most developers. This problem is
  further compounded by the fact that on Linux, calling cv_signal in
  the return path wouldn't even do the right thing unless there are
  other waiters.

  Since it is possible for the caller to independently determine how
  much time is remaining but it is not possible to query if the cv was
  in fact signaled, prioritizing signalling over timeout seems like a
  cleaner solution. In addition, judging from usage patterns within the
  code itself, it is also less error prone.

  Reviewed-by: Jorgen Lundman <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Signed-off-by: Matt Macy <[email protected]>
  Closes #10471
* Fix FreeBSD condvar semantics | Ryan Moeller | 2020-06-16 | 1 | -7/+20

  We should return -1 instead of negative deltas, and 0 if signaled.

  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Jorgen Lundman <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #10460
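The return-value contract stated in this entry (0 when signaled, -1 on timeout, never a raw negative delta) can be written down as a small mapping. The sketch below is a userspace illustration with hypothetical names; it models only the return values, not the waiting itself. The second helper reflects the remark in the "Disambiguate condvar API contract" entry that callers can work out the remaining time on their own.

```c
#include <stdint.h>

/* Hypothetical reasons a timed wait can end. */
enum wake_reason {
	WAKE_SIGNALED,
	WAKE_TIMEOUT
};

/* Map the wake reason to the contract: 0 if signaled, -1 on timeout. */
static int
cv_timedwait_rc(enum wake_reason why)
{
	return (why == WAKE_SIGNALED ? 0 : -1);
}

/*
 * Callers that need the remaining time compute it themselves from
 * their own deadline; the result is clamped so it is never negative.
 */
static int64_t
time_remaining(int64_t deadline, int64_t now)
{
	return (deadline > now ? deadline - now : 0);
}
```

Keeping the two questions separate ("was I signaled?" and "how long is left?") means the return value never has to encode both at once.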
* Add convenience wrappers for common uio usage | Jorgen Lundman | 2020-06-14 | 2 | -0/+66

  The macOS uio struct is opaque and the API must be used; this makes
  the smallest changes to the code for all platforms.

  Reviewed-by: Matt Macy <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Jorgen Lundman <[email protected]>
  Closes #10412
* FreeBSD: Don't require zeroing new locks before init | Ryan Moeller | 2020-06-13 | 1 | -4/+2

  This has not been shown to be useful enough to justify the
  inconvenience.

  Reviewed-by: Matt Macy <[email protected]>
  Reviewed-by: Alexander Motin <[email protected]>
  Reviewed-by: Allan Jude <[email protected]>
  Signed-off-by: Ryan Moeller <[email protected]>
  Closes #10449
* Remove unnecessary references to slavery | Matthew Ahrens | 2020-06-10 | 1 | -1/+0

  The horrible effects of human slavery continue to impact society. The
  casual use of the term "slave" in computer software is an unnecessary
  reference to a painful human experience.

  This commit removes all possible references to the term "slave".

  Implementation notes:

  The zpool.d/slaves script is renamed to dm-deps, which uses the same
  terminology as `dmsetup deps`.

  References to the `/sys/class/block/$dev/slaves` directory remain.
  This directory name is determined by the Linux kernel. Although
  `dmsetup deps` provides the same information, it unfortunately
  requires elevated privileges, whereas the `/sys/...` directory is
  world-readable.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Reviewed-by: Ryan Moeller <[email protected]>
  Signed-off-by: Matthew Ahrens <[email protected]>
  Closes #10435
* Fix typos | Andrea Gelmini | 2020-06-09 | 4 | -5/+5

  Correct various typos in the comments and tests.

  Reviewed-by: Ryan Moeller <[email protected]>
  Reviewed-by: Matthew Ahrens <[email protected]>
  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Andrea Gelmini <[email protected]>
  Closes #10423
* Move GFP flags kernel compatibility code | Michael Niewöhner | 2020-06-08 | 1 | -0/+12

  Move the GFP flags kernel compat code from the C file to the kmem
  header.

  Reviewed-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Michael Niewöhner <[email protected]>
  Closes #10424
* Linux 5.8 compat: __vmalloc() | Michael Niewöhner | 2020-06-08 | 1 | -0/+9

  The `pgprot` argument has been removed from `__vmalloc` in Linux 5.8,
  as it is now always `PAGE_KERNEL` [1]. Detect this during configure
  and define a wrapper for older kernels.

  [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/mm/vmalloc.c?h=next-20200605&id=88dca4ca5a93d2c09e5bbc6a62fbfc3af83c4fca

  Reviewed-by: Brian Behlendorf <[email protected]>
  Co-authored-by: Sebastian Gottschall <[email protected]>
  Co-authored-by: Michael Niewöhner <[email protected]>
  Signed-off-by: Sebastian Gottschall <[email protected]>
  Signed-off-by: Michael Niewöhner <[email protected]>
  Closes #10422