aboutsummaryrefslogtreecommitdiffstats
path: root/module/zfs
Commit message (Collapse)AuthorAgeFilesLines
* Consolidate zfs_holey and zfs_accessMatthew Macy2020-10-311-0/+92
| | | | | | | | | The zfs_holey() and zfs_access() functions can be made common to both FreeBSD and Linux. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11125
* Remove UIO_ZEROCOPY functions structuresMatthew Macy2020-10-302-173/+0
| | | | | | | | | | The original xuio zero copy functionality has always been unused on Linux and FreeBSD. Remove this disabled code to avoid any confusion and improve readability. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11124
* Yield periodically when rebuilding L2ARCAlexander Motin2020-10-301-0/+1
| | | | | | | | | | | | | | | | L2ARC devices of several terabytes filled with 4KB blocks may take 15 minutes to rebuild. Due to the way L2ARC log reading is implemented it is quite likely that for all that time rebuild thread will never sleep. At least on FreeBSD kernel threads have absolute priority and can not be preempted by threads with lower priorities. If some thread is also bound to that specific CPU it may not get any CPU time for all the 15 minutes. Reviewed-by: Cedric Berger <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: George Amanakis <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #11116
* Update references to nonexistent man pages in codeRyan Moeller2020-10-306-7/+7
| | | | | | | | Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11132
* Add missing zfs_arc_evict_batch_limit tunableRyan Moeller2020-10-221-1/+4
| | | | | | | | It's even documented already. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11094
* Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSDMatthew Macy2020-10-212-0/+638
| | | | | | | | | | The zfs_fsync, zfs_read, and zfs_write function are almost identical between Linux and FreeBSD. With a little refactoring they can be moved to the common code which is what is done by this commit. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11078
* Non-l2arc pool reads shouldn't be l2arc missesAdam D. Moss2020-10-201-8/+21
| | | | | | | | | | | | | | | | | | | | The current l2_misses accounting behavior treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured. This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools. This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc. Includes new test - l2arc_l2miss_pos.ksh Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Adam Moss <[email protected]> Closes #10921
* Ignore special vdev ashift for spa ashift min/maxDon Brady2020-10-152-17/+25
| | | | | | | | | | | | | | | | | | | | | The removal of a vdev in the normal class would fail if there was a special or deup vdev that had a different ashift than the vdevs in the normal class. Moved the initialization of spa_min_ashift / spa_max_ashift from vdev_open so that it occurs after the vdev allocation bias was initialized (i.e. after vdev_load). Caveat -- In order to remove a special/dedup vdev it must have the same ashift as the normal pool vdevs. This could perhaps be lifted in the future (i.e. for the case where there is ample space in any surviving special class vdevs) Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #9363 Closes #9364 Closes #11053
* Fix crash caused by invalid snapshot names in redactnvlChristian Schwarz2020-10-141-1/+1
| | | | | | | | | | This is a follow up fix for commit 0fdd6106bb. The VERIFY is only true when we haven't hit an error code path. See added test case for a reproducer. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #11048
* Fix incorrect deletion order in range_tree_add_impl gap casePaul Dagnelie2020-10-141-1/+1
| | | | | | | | | | | | After a side-effectful call like add or remove, references to range segs stored in btrees can no longer be used safely. We move the remove call to just before the reinsertion call so that the seg remains valid for as long as we need it. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #11044 Closes #11056
* dmu_zfetch: don't leak unreferenced stream when zfetch is freedMatthew Macy2020-10-131-2/+6
| | | | | | | | | | | | | | | | | | | | | Currently streams are only freed when: - They have no referencing zfetch and and their I/O references go to zero. - They are more than 2s old and a new I/O request comes in on the same zfetch. This means that we will leak unreferenced streams when their zfetch structure is freed. This change checks the reference count on a stream at zfetch free time. If it is zero we free it immediately. If it has remaining references we allow the prefetch callback to free it at I/O completion time. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11052
* Expose zfetch_max_idistance tunableRyan Moeller2020-10-131-1/+4
| | | | | | | | | | | | | | | FreeBSD had this value tunable before the switch to the new OpenZFS. The tunable name has changed, breaking legacy compat. Restore legacy compat for this tunable, properly expose the tunable with the new name on all platforms, and document it in zfs-module-parameters(5). While here, clean up the documentation for zfetch_max_distance a bit. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11038
* zil_parse: make callback parameters constChristian Schwarz2020-10-093-15/+26
| | | | | | | | | Code cleanup, a follow up commit to 4d55ea81. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Co-authored-by: Ryan Moeller <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #11020
* Replace ZFS on Linux references with OpenZFSBrian Behlendorf2020-10-083-10/+2
| | | | | | | | | | | | | This change updates the documentation to refer to the project as OpenZFS instead ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11007
* Fix ubsan: shift exponent is too largeChuck Tuffli2020-10-081-1/+2
| | | | | | | | | | | | | | | | | | | When running libzpool with the Undefined Behavior Sanitizer (ubsan) enabled, a zpool create causes a run-time error: module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int'` in vdev_config_generate() Fix is to convert vdev_removal_max_span to its base-2 logarithm, using highbit64(), and then compare the "shifts". Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Chuck Tuffli <[email protected]> Closes #9744 Closes #11024
* Make L2ARC tests more robustGeorge Amanakis2020-10-051-11/+6
| | | | | | | | | | Instead of relying on arbitrary timers after pool export/import or cache device off/online rely on arcstats. This makes the L2ARC tests more robust. Also cleanup some functions related to persistent L2ARC. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10983
* Throw const on some stringsRyan Moeller2020-10-025-28/+27
| | | | | | | | | | In C, const indicates to the reader that mutation will not occur. It can also serve as a hint about ownership. Add const in a few places where it makes sense. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10997
* Mismatched nvlist names in zfs_keys_send_spaceJohn Poduska2020-10-021-4/+6
| | | | | | | | | | | | | | | This causes "zfs send -vt ..." to fail with: cannot resume send: Unknown error 1030 It turns out that some of the name/value pairs in the verification list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong name, so the ioctl got kicked out in zfs_check_input_nvpairs(). Update the names accordingly. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: John Poduska <[email protected]> Closes #10978
* Eliminate gratuitous bzeroing in dbuf_stats_hash_table_dataMatthew Macy2020-09-301-1/+2
| | | | | | | | | | `dbuf_stats_hash_table_data` can take much longer than it needs to by repeatedly bzeroing its buffer when in fact the buffer only needs to be NULL terminated. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10993
* do a cyclic seek for unused memory objects in poolSebastian Gottschall2020-09-301-0/+10
| | | | | | | | | | | | | | | | | | | | In non regular use cases allocated memory might stay persistent in memory pool. This small patch checks every minute if there are old objects which can be released from memory pool. Right now with regular use, the pool is checked for old objects on each allocation attempt from this pool. so basically polling by its use. Now consider what happens if someone writes a lot of files and stops use of the volume or even unmounts it. So the code will no longer check if objects can be released from the pool. Already allocated objects will still stay in pool cache. this is no big issue for common use. But someone discovered this issue while doing tests. personally i know this behavior and I'm aware of it. Its no big issue. just a enhancement Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Kjeld Schouten-Lebbing <[email protected]> Signed-off-by: Sebastian Gottschall <[email protected]> Closes #10938 Closes #10969
* Drop references when skipping dmu_send due to EXDEVRyan Moeller2020-09-301-4/+7
| | | | | | | | | | | | | | When an invalid incremental send is requested where the "to" ds is before the "from" ds, make sure to drop the reference to the pool and the dataset before returning the error. Add an assert on FreeBSD to make sure we don't hold any locks after returning from an ioctl. Add some test coverage. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10919
* zfetch: Don't issue new streams when old have not completedMatthew Macy2020-09-272-37/+150
| | | | | | | | | | | | | | | | | The current dmu_zfetch code implicitly assumes that I/Os complete within min_sec_reap seconds. With async dmu and a readonly workload (and thus no exponential backoff in operations from the "write throttle") such as L2ARC rebuild it is possible to saturate the drives with I/O requests. These are then effectively compounded with prefetch requests. This change reference counts streams and prevents them from being recycled after their min_sec_reap timeout if they still have outstanding I/Os. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10900
* Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.cAdam D. Moss2020-09-251-2/+4
| | | | | | | | | | | | | | Prefetching of dnodes in dbuf_read() can cause significant mutex contention for some workloads and isn't very helpful. This is because we already get 32 dnodes for each block read, and when iterating over a directory we prefetch the dnodes in the directory. Disable this prefetching to prevent the lock contention. Reviewed-by: Brian Behlendorf <[email protected]> Submitted-by: Adam Moss <[email protected]> Submitted-by: Matthew Ahrens <[email protected]> Signed-off-by: Adam Moss <[email protected]> Closes #10877 Closes #10953
* Prune dead branch reported by CoverityRyan Moeller2020-09-251-5/+1
| | | | | | | | | wkey is NULL at every `goto error;`. dcp is never NULL. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10884
* zfs_log_write: simplify data copying code for WR_COPIED recordsChristian Schwarz2020-09-251-8/+15
| | | | | | | | | | | | | | | | | | lr_write_t records that are WR_COPIED have the record data directly appended to them (see lr_write_t type definition). The data is copied from the debuf using dmu_read_by_dnode. This function was called, only for WR_COPIED records, as part of a short-circuiting if-statement's if-expression. I found this side-effectful call to dmu_read_by_dnode pretty hard to spot. This patch improves readability by moving the call to its own line. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #10956
* FreeBSD: Add support for procfs_listMatthew Macy2020-09-232-20/+7
| | | | | | | | | | The procfs_list interface is required by several kstats. Implement this functionality for FreeBSD to provide access to these kstats. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10890
* Don't set numobjs to UINT64_MAX or near itPaul Dagnelie2020-09-221-3/+1
| | | | | | | | | | Resolves an issue with `zfs send` streams from 0.8.4 which prevents them from being received by versions < 0.7. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Paul Zuchowski <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #10911 Closes #10916
* Restore clearing of L2CACHE flag in arc_read_done()George Amanakis2020-09-221-3/+3
| | | | | | | | | | | Commit 45152dc removed clearing of L2CACHE flag in arc_read_done() and moved related code in l2arc_write_eligible(). After careful code inspection arc_read_done() is not bypassed in the case of prefetches. Thus restore the old behavior. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: adam moss <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10951
* vdev_ashift should only be set onceGeorge Wilson2020-09-185-44/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This is caused because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logical ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ahsift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Cedric Berger <[email protected]> Signed-off-by: George Wilson <[email protected]> External-Issue: DLPX-71831 Closes #10932
* Fix stack frame size: dnode_dirty_l1range()Pavel Snajdr2020-09-151-10/+13
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #10879
* dmu_redact_snap: fix possible memleakPavel Snajdr2020-09-151-0/+2
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #10879
* Fix stack frame size: dmu_redact_snap()Pavel Snajdr2020-09-151-8/+20
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #10879
* Fix stack frame size: spa_livelist_delete_cb()Pavel Snajdr2020-09-151-5/+7
| | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #10879
* zfs label bootenv should store data as nvlistToomas Soome2020-09-152-36/+100
| | | | | | | | | | | | | nvlist does allow us to support different data types and systems. To encapsulate user data to/from nvlist, the libzfsbootenv library is provided. Reviewed-by: Arvind Sankar <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Toomas Soome <[email protected]> Closes #10774
* Add L2ARC arcstats for MFU/MRU buffers and buffer content typeGeorge Amanakis2020-09-141-26/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their content type is unknown. Knowing this information may prove beneficial in adjusting the L2ARC caching policy. This commit adds L2ARC arcstats that display the aligned size (in bytes) of L2ARC buffers according to their content type (data/metadata) and according to their ARC state (MRU/MFU or prefetch). It also expands the existing evict_l2_eligible arcstat to differentiate between MFU and MRU buffers. L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a buffer, its ARC state (MRU/MFU) is stored in the L2 header (b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size (in bytes) of L2ARC buffers according to their ARC state (based on b_arcs_state). We also account for the case where an L2ARC and ARC cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize reflects the alinged size (in bytes) of L2ARC buffers that were cached while they had the prefetch flag set in ARC. This is dynamically updated as the prefetch flag of L2ARC buffers changes. When buffers are evicted from ARC, if they are determined to be L2ARC eligible then their logical size is recorded in evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon eviction. Persistent L2ARC: When committing an L2ARC buffer to a log block (L2ARC metadata) its b_arcs_state and prefetch flag is also stored. If the buffer changes its arcstate or prefetch flag this is reflected in the above arcstats. However, the L2ARC metadata cannot currently be updated to reflect this change. Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count this as an MRU buffer. The buffer transitions to MFU. The arcstats are updated to reflect this. Upon pool re-import or on/offlining the L2ARC device the arcstats are cleared and the buffer will now be counted as an MRU buffer, as the L2ARC metadata were not updated. Bug fix: - If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of an ARC buffer. However, prefetches may be issued in a way that arc_read_done() is bypassed. Instead, move the related code in l2arc_write_eligible() to account for those cases too. Also add a test and update manpages for l2arc_mfuonly module parameter, and update the manpages and code comments for l2arc_noprefetch. Move persist_l2arc tests to l2arc. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10743
* Initialize mmp_last_write when the mmp thread startsOlaf Faaland2020-09-091-8/+12
| | | | | | | | | | | | | | | | | | | | | A great deal of time may go by between when mmp_init() is called and the MMP thread starts, particularly if there are bad devices, because there is I/O checking configs etc. If this time is too long, (gethrtime() - mmp_last_write) > mmp_fail_ns at the time the MMP thread starts. If MMP is configured to suspend the pool, the pool will be suspended immediately. This can be seen in issue #10838 The value of mmp_last_write doesn't matter before the mmp thread starts. To give the MMP thread time to issue and land MMP writes, initialize mmp_last_write when the MMP thread starts. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #10873
* Introduce ZFS module parameter l2arc_mfuonlyGeorge Amanakis2020-09-081-0/+18
| | | | | | | | | | | In certain workloads it may be beneficial to reduce wear of L2ARC devices by not caching MRU metadata and data into L2ARC. This commit introduces a new tunable l2arc_mfuonly for this purpose. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10710
* dnode_special_open() error: unchecked function return 'zrl_tryenter'Toomas Soome2020-09-081-1/+1
| | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Toomas Soome <[email protected]> Closes #10876
* FreeBSD: reduce priority of ZIO_TASKQ_ISSUE writes by a larger valueMatthew Macy2020-09-041-5/+17
| | | | | | | | | | On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then a difference between them is insignificant. In other words, incrementing pri by only one as on Linux is insufficient. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10872
* Sequential scrub and resilver updated commentsBrian Behlendorf2020-09-043-0/+3
| | | | | | | | Commit d4a72f2 which introduced multi-phase scrubs and resilvers continued the work presented by Nexenta at the 2016 ZFS developer summit. Update the source to reflect their contribution. Signed-off-by: Brian Behlendorf <[email protected]>
* Avoid posting duplicate zpool eventsDon Brady2020-09-0410-71/+418
| | | | | | | | | | | | | | | | | | | Duplicate io and checksum ereport events can misrepresent that things are worse than they seem. Ideally the zpool events and the corresponding vdev stat error counts in a zpool status should be for unique errors -- not the same error being counted over and over. This can be demonstrated in a simple example. With a single bad block in a datafile and just 5 reads of the file we end up with a degraded vdev, even though there is only one unique error in the pool. The proposed solution to the above issue, is to eliminate duplicates when posting events and when updating vdev error stats. We now save recent error events of interest when posting events so that we can easily check for duplicates when posting an error. Reviewed by: Brad Lewis <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #10861
* nowait synctask must succeedMatthew Ahrens2020-09-049-39/+24
| | | | | | | | | | | | | | If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with `dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC. However, there is no way to notice or communicate this failure, so it's extremely difficult to use this functionality correctly, and in fact almost all callers use `ZFS_SPACE_CHECK_NONE`. This commit removes the `zfs_space_check_t` argument from `dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #10855
* Retain thread name when resuming a zthrRyan Moeller2020-09-031-3/+8
| | | | | | | | | | | | When created, a zthr is given a name to identify it by. This name is lost when a cancelled zthr is resumed. Retain the name of a zthr so it can be used when resuming. Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10881
* Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriateMatthew Macy2020-09-036-19/+10
| | | | | | | | | | | | | | | | There are a number of places where cv_?_sig is used simply for accounting purposes but the surrounding code has no ability to cope with actually receiving a signal. On FreeBSD it is possible to send signals to individual kernel threads so this could enable undesirable behavior. This patch adds routines on Linux that will do the same idle accounting as _sig without making the task interruptible. On FreeBSD cv_*_idle are all aliases for cv_* Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10843
* Make spa_stats.c tunables visible on FreeBSDRyan Moeller2020-09-011-18/+12
| | | | | | | | Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and add update tunables.cfg in tests for the newly supported ones. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10858
* FreeBSD: Fix up after spa_stats.c moveMatthew Macy2020-09-011-1/+3
| | | | | | | | | | | | | | Moving spa_stats added the additional burden of supporting KSTAT_TYPE_IO. spa_state_addr will always return a valid value regardless of the value of 'n'. On FreeBSD this will cause an infinite loop as it relies on the raw ops addr routine to indicate that there is no more data. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10860
* Add 'zfs rename -u' to rename without remountingRyan Moeller2020-09-012-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported by: Matt Macy <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #10839
* zio_ereport_post() and zio_ereport_start() return values are ignoredToomas Soome2020-08-318-20/+24
| | | | | | | | use (void) to silence analyzers. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Toomas Soome <[email protected]> Closes #10857
* Move spa_stats.c to common codeMatthew Macy2020-08-302-0/+1048
| | | | | | | | | | | Initially it was considered simplest to stub out all of the functions on FreeBSD. Now that FreeBSD supports KSTAT_TYPE_RAW at least some of the functionality should be made available. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #10842
* dnode_sync is careless with range treePatrick Mooney2020-08-261-2/+12
| | | | | | | | | | | | | | Because dnode_sync_free_range() must drop dn_mtx during its processing, using it as a callback to range_tree_vacate() is not safe. No other operations (besides destroy) are allowed once range_tree_vacate() has begun, and dropping dn_mtx would leave a window open for another thread to observe that invalid (and unsafe) state via dnode_block_freed(). Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Patrick Mooney <[email protected]> Closes #10708 Closes #10823