summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ZTS: ztest may cause mmp tests failuresBrian Behlendorf2020-08-171-0/+2
| | | | | | | | | | | | The mmp_exported_import and mmp_inactive_import tests depend on ztest simulating an active pool. If ztest unexpectedly terminates due to an unrelated issue the test case will fail. Since ztest is not yet 100% reliable I've added these tests to the maybe exception list. They can be removed when the issues with ztest are resolved or if the test cases are updated to handle these unexpected failures. Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #10726
* Include scatter_chunk_waste in arc_sizeMatthew Ahrens2020-08-176-11/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARC caches data in scatter ABD's, which are collections of pages, which are typically 4K. Therefore, the space used to cache each block is rounded up to a multiple of 4K. The ABD subsystem tracks this wasted memory in the `scatter_chunk_waste` kstat. However, the ARC's `size` is not aware of the memory used by this round-up, it only accounts for the size that it requested from the ABD subsystem. Therefore, the ARC is effectively using more memory than it is aware of, due to the `scatter_chunk_waste`. This impacts observability, e.g. `arcstat` will show that the ARC is using less memory than it effectively is. It also impacts how the ARC responds to memory pressure. As the amount of `scatter_chunk_waste` changes, it appears to the ARC as memory pressure, so it needs to resize `arc_c`. If the sector size (`1<<ashift`) is the same as the page size (or larger), there won't be any waste. If the (compressed) block size is relatively large compared to the page size, the amount of `scatter_chunk_waste` will be small, so the problematic effects are minimal. However, if using 512B sectors (`ashift=9`), and the (compressed) block size is small (e.g. `compression=on` with the default `volblocksize=8k` or a decreased `recordsize`), the amount of `scatter_chunk_waste` can be very large. On a production system, with `arc_size` at a constant 50% of memory, `scatter_chunk_waste` has been been observed to be 10-30% of memory. This commit adds `scatter_chunk_waste` to `arc_size`, and adds a new `waste` field to `arcstat`. As a result, the ARC's memory usage is more observable, and `arc_c` does not need to be adjusted as frequently. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10701
* Remove KMC_KMEM and KMC_VMEMMatthew Ahrens2020-08-175-151/+10
| | | | | | | | | | | | | | | | | `KMC_KMEM` and `KMC_VMEM` are now unused since all SPL-implemented caches are `KMC_KVMEM`. KMC_KMEM: Given the default value of `spl_kmem_cache_kmem_limit`, we don't use kmalloc to back the SPL caches, instead we use kvmalloc (KMC_KVMEM). The flag, module parameter, /proc entries, and associated code are removed. KMC_VMEM: This flag is not used, and kvmalloc() is always preferable to vmalloc(). The flag, /proc entries, and associated code are removed. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10673
* FreeBSD: fix the build with Clang 11Ryan Moeller2020-08-176-2/+12
| | | | | | | | | | | | * Cast void * to uintptr_t before casting to boolean_t. * Avoid clashing definition of __asm when not on Linux to prevent duplicate __volatile__. This was already done in some places but not all. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10723
* FreeBSD: fix merge error in zfs_acl_ids_createMatthew Macy2020-08-171-27/+16
| | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10721
* Fix typo in btree.cSerapheim Dimitropoulos2020-08-171-2/+2
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #10725
* FreeBSD: fallback to /boot/ to look for zpool.cacheMatthew Macy2020-08-172-1/+5
| | | | | | | | | Up until now zpool.cache has always lived in /boot on FreeBSD. For the sake of compatibility fallback to /boot if zpool.cache isn't found in /etc/zfs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10720
* Fix reporting of L2ARC writes in arc_summary3George Amanakis2020-08-171-3/+3
| | | | | | | | | | arc_summary3 reports L2ARC writes in bytes. However, the related arc_stat is reported as hits. arc_summary2 report this correctly. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10717
* Fix l2arc_dev_rebuild_start thread nameRyan Moeller2020-08-171-4/+5
| | | | | | | | | | | | | | | `thread_create` on FreeBSD stringifies the argument passed as the thread function to create a name for the thread. The thread name for `l2arc_dev_rebuild_start` ended up with `(void (*)(void *))` in it. Change the type signature so the function does not need to be cast when creating the thread. Rename the function to `l2arc_dev_rebuild_thread` for clarity and consistency, as well. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10716
* FreeBSD: Create taskq threads in appropriate procRyan Moeller2020-08-174-19/+27
| | | | | | | | Stepping stone toward re-enabling spa_thread on FreeBSD. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10715
* Fix L2ARC reads when compressed ARC disabledAllan Jude2020-08-135-2/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading compressed blocks from the L2ARC, with compressed ARC disabled, arc_hdr_size() returns LSIZE rather than PSIZE, but the actual read is PSIZE. This causes l2arc_read_done() to compare the checksum against the wrong size, resulting in checksum failure. This manifests as an increase in the kstat l2_cksum_bad and the read being retried from the main pool, making the L2ARC ineffective. Add new L2ARC tests with Compressed ARC enabled/disabled Blocks are handled differently depending on the state of the zfs_compressed_arc_enabled tunable. If a block is compressed on-disk, and compressed_arc is enabled: - the block is read from disk - It is NOT decompressed - It is added to the ARC in its compressed form - l2arc_write_buffers() may write it to the L2ARC (as is) - l2arc_read_done() compares the checksum to the BP (compressed) However, if compressed_arc is disabled: - the block is read from disk - It is decompressed - It is added to the ARC (uncompressed) - l2arc_write_buffers() will use l2arc_apply_transforms() to recompress the block, before writing it to the L2ARC - l2arc_read_done() compares the checksum to the BP (compressed) - l2arc_read_done() will use l2arc_untransform() to uncompress it This test writes out a test file to a pool consisting of one disk and one cache device, then randomly reads from it. Since the arc_max in the tests is low, this will feed the L2ARC, and result in reads from the L2ARC. We compare the value of the kstat l2_cksum_bad before and after to determine if any blocks failed to survive the trip through the L2ARC. Sponsored-by: The FreeBSD Foundation Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #10693
* Release onexit/events with any missed zfsdev_stateJorgen Lundman2020-08-133-7/+12
| | | | | | | | | | | | | | Linux and FreeBSD will most likely never see this issue. On macOS when kext is unloaded, but zed is still connected, zed will be issued ENODEV. As the cdevsw is released, the kernel will not have zfsdev_release() called to release minor/onexit/events, and it "leaks". This ensures it is cleaned up before unload. Changed the for loop from zsprev, to zsnext style, for less code duplication. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #10700
* Github workflow: checkstyleGeorge Melikov2020-08-131-0/+27
| | | | | | | | | | | | Use github workflow to run checkstyle - use free (for OS projects) resources - starts for every commit and branch - work on forks, contributors may use it before creating PRs Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #10705
* cstyle.pl: echo commands for github workflowGeorge Melikov2020-08-131-4/+9
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #10705
* Remove stale .travis.ymlGeorge Melikov2020-08-131-38/+0
| | | | | | | | | | - It doesn't work now. - It has to be manually edited on tests changes. (even on test runtime changes!) - Travis gives too small time to run to be useful. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #10704
* Use zfs_dbgmsg to log metaslab_load/unloadMatthew Ahrens2020-08-121-19/+34
| | | | | | | | | | | | | Metaslabs are now (usually) loaded and unloaded infrequently, but when that is not the case, it is useful to have a log of when and why these events happened. This commit enables the zfs_dbgmsg() in metaslab_load(), and adds a zfs_dbgmsg() in metaslab_unload(). Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10683
* Restore ARC MFU/MRU pressureMatthew Macy2020-08-121-22/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arc_adapt() function tunes LRU/MLU balance according to 4 types of cache hits (which is passed as state agrument): ghost LRU, LRU, MRU, ghost MRU. If this function is called with wrong cache hit (state), adaptation will be sub-optimal and performance will suffer. Some time ago upstream received this commit: 6950 ARC should cache compressed data) in arc_read() do next sequence (access to ghost buffer) Before this commit, hit to any ghost list was passed arc_adapt() before call to arc_access() which revive element in cache and change state from ghost to real hit. After this commit, the order of calls was reverted and arc_adapt() is now called only with «real» hits even if hit was in one of two ghost lists, which renders ghost lists useless and breaks the ARC algorithm. FreeBSD fixed this problem locally in Change D19094 / Commit r348772. This change is an adaptation of the above commit to the current arc code. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10548 Closes #10618
* 'zfs share -a' should handle 'canmount=noauto'George Wilson2020-08-112-2/+21
| | | | | | | | | | | | | | | | The 'zfs share -a' currently skips any filesystems which have 'canmount=noauto' set. This behavior is unexpected since the one would expect 'zfs share -a' to share any mounted filesystem that has the 'sharenfs' property already set. This changes the behavior of 'zfs share -a' to allow the sharing of 'canmount=noauto' datasets if they are mounted. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Don Brady <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: George Wilson <[email protected]> External-issue: DLPX-71313 Closes #10688
* FreeBSD: Fix module autoloading when built in baseMatthew Macy2020-08-111-2/+12
| | | | | | | | | | | The KMOD name is "zfs" instead of "openzfs" when building in FreeBSD. Define a ZFS_KMOD symbol as "zfs" when IN_BASE is defined, otherwise "openzfs". Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10699
* Linux 5.9 compat: make_request_fn replaced with submit_bio interfaceColeman Kane2020-08-113-34/+70
| | | | | | | | | | The make_request_fn and associated API was replaced recently in a Linux 5.9 merge, to replace its functionality with a new submit_bio member in struct block_device_operations. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #10696
* Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_BColeman Kane2020-08-113-1/+26
| | | | | | | | | | | | This change appears to primarily be a name change for the enum. Had to update the test logic so that it works so long as either one of these is present (favoring the newer one). Additionally, as this is newer, it only shows up in node_page_item, so this commit doesn't test zone_page_item for the same enum. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #10696
* Linux 5.9 compat: add linux/blkdev.h includeColeman Kane2020-08-112-0/+6
| | | | | | | | | | | | | Many of the block device operations (often functions with bdev in the name) were moved into linux/blkdev.h from linux/fs.h. Seems that this header is already included where needed in the code, but in the autoconf tests it was missing causing false negatives. This commit has those tests include linux/fs.h (old location) and now also linux/blkdev.h (new locations). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #10696
* Fix typoAllan Jude2020-08-111-1/+1
| | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #10694
* Move ZVOL_DIR back to zfs.hRyan Moeller2020-08-112-1/+3
| | | | | | | | | | This was previously moved because nothing else in-tree uses it, but evidently DilOS uses it out of tree. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10361 Closes #10685
* FreeBSD: update vaccess signature on most recent HEADMatthew Macy2020-08-071-0/+5
| | | | | | Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10682
* Clarify error message when a range-tree double-add occursPaul Dagnelie2020-08-071-8/+22
| | | | | | | | | | | | | In various other pieces of logic have resulted in situations where we double-free space in ZFS. This in turn results in a double-add to the range trees. These issues have been much more difficult to diagnose than they should have been, because the error handling around this case is much weaker than around the double remove case. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10654
* ZTS: Remove bashisms from zfs-tests.shRyan Moeller2020-08-072-81/+70
| | | | | | | | Bring zfs-tests.sh in to compliance with the other scripts by converting it /bin/sh for to avoid a dependency on bash. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10640
* Remove commented-out codeMatthew Ahrens2020-08-051-27/+0
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Remove KM_NODEBUGMatthew Ahrens2020-08-051-1/+0
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Remove KMC_NOMAGAZINEMatthew Ahrens2020-08-053-11/+2
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Remove KMC_QCACHEMatthew Ahrens2020-08-052-4/+0
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Remove KMC_NOHASHMatthew Ahrens2020-08-053-6/+0
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Remove KMC_NOTOUCHMatthew Ahrens2020-08-054-5/+1
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Remove KMC_OFFSLABMatthew Ahrens2020-08-052-96/+39
| | | | | | | | | Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Ahrens <[email protected]> Closes #10650
* Fix i/o error handling of livelists and zap iterationMatthew Ahrens2020-08-054-69/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pool-wide metadata is stored in the MOS (Meta Object Set). This metadata is stored in triplicate, in addition to any pool-level reduncancy (e.g. RAIDZ). However, if all 3+ copies of this metadata are not available, we can still get EIO/ECKSUM when reading from the MOS. If we encounter such an error in syncing context, we have typically already committed to making a change that we now can't do because of the corrupt/missing metadata. We typically "handle" this with a `VERIFY()` or `zfs_panic_recover()`. This prevents the system from continuing on in an undefined state, while minimizing the amount of error-handling code. However, there are some code paths that ignore these i/o errors, or `ASSERT()` that they don't happen. Since assertions are disabled on non-debug builds, they effectively ignore them as well. This can lead to ZFS continuing on in an incorrect state, potentially leading to on-disk inconsistencies. This commit adds handling for these i/o errors on MOS metadata, typically with a `VERIFY()`: * Handle error return from `zap_cursor_retrieve()` in 4 places in `dsl_deadlist.c`. * Handle error return from `zap_contains()` in `dsl_dir_hold_obj()`. Turns out this call isn't necessary because we can always call `zap_lookup()`. * Handle error return from `zap_lookup()` in `dsl_fs_ss_limit_check()`. * Handle error return from `zap_remove()` in `dsl_dir_rename_sync()`. * Handle error return from `zap_lookup()` in `dsl_dir_remove_livelist()`. * Handle error return from `dsl_process_sub_livelist()` in `spa_livelist_delete_cb()`. Additionally: * Augment the internal history log message for `zfs destroy` to note which method is used (e.g. bptree, livelist, or, synchronous) and the mintxg. * Correct a comment in `dbuf_init()`. * Correct indentation in `dsl_dir_remove_livelist()`. Reviewed by: Sara Hartse <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10643
* FreeBSD: Add support for lockless lookupMatthew Macy2020-08-057-12/+176
| | | | | | Authored-by: mjg <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10657
* Add missed thread_exit() to vdev_{autotrim,rebuild}_threadMatthew Macy2020-08-052-0/+4
| | | | | | | Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10668
* Fix arc__wait__for__eviction tracepointPavel Snajdr2020-08-041-2/+2
| | | | | | | | | | | | | 3442c2a02d added new `arc_wait_for_eviction` tracepoint, which fails to compile, when tracepoints are enabled. The tracepoint definition begins with `DEFINE_ARC_WAIT_FOR_EVICTION_EVENT` and is a multi-line definition, so this fixes the backslash and parenthesis accordingly. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #10669
* Verify zfs module loaded before starting servicesJonathon2020-08-013-3/+3
| | | | | | | | | | | | | | | This is a minor change to the systemd service templates that verifies the zfs kernel module is loaded by the kernel prior to attempting to import any zpool. The services check for the presence of /sys/module/zfs which indicates the zfs is module is loaded. This uses the systemd built-in check ConditionPathIsDirectory. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Thode <[email protected]> Signed-off-by: Jonathon Fernyhough <[email protected]> Closes #10663
* Fix logging in l2arc_rebuild()George Amanakis2020-08-011-0/+7
| | | | | | | | | | | | | | | | | | | | | In case the L2ARC rebuild was canceled, do not log to spa history log as the pool may be in the process of being removed and a panic may occur: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:spa_history_log_internal+0xb1/0x120 [zfs] Call Trace: l2arc_rebuild+0x464/0x7c0 [zfs] l2arc_dev_rebuild_start+0x2d/0x130 [zfs] ? l2arc_rebuild+0x7c0/0x7c0 [zfs] thread_generic_wrapper+0x78/0xb0 [spl] kthread+0xfb/0x130 ? IS_ERR+0x10/0x10 [spl] ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10659
* FreeBSD: Fix `zfs jail` and add a testRyan Moeller2020-08-0110-2/+194
| | | | | | | | | zfs_jail was not using zfs_ioctl so failed to map the IOC number correctly. Use zfs_ioctl to perform the jail ioctl and add a test case for FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10658
* Fix page fault in zfsctl_snapdir_getattrMatthew Macy2020-08-011-1/+2
| | | | | | | | | | | Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I can't reproduce this panic on demand, but this looks like the correct solution. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Authored-by: asomers <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10656
* Change the error handling for invalid property valuesAllan Jude2020-08-015-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | ZFS recv should return a useful error message when an invalid index property value is provided in the send stream properties nvlist With a compression= property outside of the understood range: Before: ``` receiving full stream of zof/zstd_send@send2 into testpool/recv@send2 internal error: Invalid argument Aborted (core dumped) ``` Note: the recv completes successfully, the abort() is likely just to make it easier to track the unexpected error code. After: ``` receiving full stream of zof/zstd_send@send2 into testpool/recv@send2 cannot receive compression property on testpool/recv: invalid property value received 28.9M stream in 1 seconds (28.9M/sec) ``` Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #10631
* Changes to make openzfs build within FreeBSD buildworldMatthew Macy2020-07-3121-29/+79
| | | | | | | | | A collection of header changes to enable FreeBSD to build with vendored OpenZFS. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10635
* Convert Linux-isms to FreeBSD-isms in platform zfs_debug.cRyan Moeller2020-07-311-11/+8
| | | | | | | | | | | Change some comments copied from the Linux code to describe the appropriate methods on FreeBSD. Convert some tunables to ZFS_MODULE_PARAM so they get created on FreeBSD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10647
* ZTS: FreeBSD does have a l2arc.trim_ahead tunableRyan Moeller2020-07-311-1/+1
| | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10633
* Revise ARC shrinker algorithmMatthew Ahrens2020-07-317-200/+387
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the kernel's shrinker mechanism when the system is running low on free pages. This happens via 2 code paths: 1. "direct reclaim": The system is attempting to allocate a page, but we are low on memory. The ARC shrinker callback is invoked from the page-allocation code path. 2. "indirect reclaim": kswapd notices that there aren't many free pages, so it invokes the ARC shrinker callback. In both cases, the kernel's shrinker code requests that the ARC shrinker callback release some of its cache, and then it measures how many pages were released. However, it's measurement of released pages does not include pages that are freed via `__free_pages()`, which is how the ARC releases memory (via `abd_free_chunks()`). Rather, the kernel shrinker code is looking for pages to be placed on the lists of reclaimable pages (which is separate from actually-free pages). Because the kernel shrinker code doesn't detect that the ARC has released pages, it may call the ARC shrinker callback many times, resulting in the ARC "collapsing" down to `arc_c_min`. This has several negative impacts: 1. ZFS doesn't use RAM to cache data effectively. 2. In the direct reclaim case, a single page allocation may wait a long time (e.g. more than a minute) while we evict the entire ARC. 3. Even with the improvements made in 67c0f0dedc5 ("ARC shrinking blocks reads/writes"), occasionally `arc_size` may stay above `arc_c` for the entire time of the ARC collapse, thus blocking ZFS read/write operations in `arc_get_data_impl()`. To address these issues, this commit limits the ways that the ARC shrinker callback can be used by the kernel shrinker code, and mitigates the impact of arc_is_overflowing() on ZFS read/write operations. With this commit: 1. We limit the amount of data that can be reclaimed from the ARC via the "direct reclaim" shrinker. This limits the amount of time it takes to allocate a single page. 2. We do not allow the ARC to shrink via kswapd (indirect reclaim). Instead we rely on `arc_evict_zthr` to monitor free memory and reduce the ARC target size to keep sufficient free memory in the system. Note that we can't simply rely on limiting the amount that we reclaim at once (as for the direct reclaim case), because kswapd's "boosted" logic can invoke the callback an unlimited number of times (see `balance_pgdat()`). 3. When `arc_is_overflowing()` and we want to allocate memory, `arc_get_data_impl()` will wait only for a multiple of the requested amount of data to be evicted, rather than waiting for the ARC to no longer be overflowing. This allows ZFS reads/writes to make progress even while the ARC is overflowing, while also ensuring that the eviction thread makes progress towards reducing the total amount of memory used by the ARC. 4. The amount of memory that the ARC always tries to keep free for the rest of the system, `arc_sys_free` is increased. 5. Now that the shrinker callback is able to provide feedback to the kernel's shrinker code about our progress, we can safely enable the kswapd hook. This will allow the arc to receive notifications when memory pressure is first detected by the kernel. We also re-enable the appropriate kstats to track these callbacks. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: George Wilson <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10600
* ZTS: zvol_misc_volmode is flaky on FreeBSDRyan Moeller2020-07-311-0/+1
| | | | | | | Mark this as a known issue. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10655
* ZTS: Use POSIX-compatible space character classRyan Moeller2020-07-311-1/+1
| | | | | | | | | | FreeBSD recently integrated a change which causes \s in a regex to throw an error instead of silently being misinterpreted as an s. Change the regex in zpool_colors.ksh to use [[:space:]]. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10651
* lua: Increase reserved stack space for FreeBSD in debug configRyan Moeller2020-07-311-0/+9
| | | | | | | | | | | | | | | | FreeBSD uses more stack space in debug configurations and can overflow the stack while formatting the error message when the call depth limit of 20 frames is reached. This is readily reproduced by running the gsub recursion test with increased kstack size. I hit the panic with 16 pages per kstack, and noticed it go away when bumped to 17. Reserve an additional 64 bytes on the stack when building for FreeBSD. This is enough to avoid the panic with a deep stack while not wasting too much space when the default stack size is used. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10634