aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* zfs.8 has wrong description of "zfs program -t"Matthew Ahrens2019-02-264-16/+19
| | | | | | | | | | | | | | The "-t" argument to "zfs program" specifies a limit on the number of LUA instructions that can be executed. The zfs.8 manpage has the wrong description. It should be updated to match what's in zfs-program.8 Also fix the formatting of the zfs help message. Reviewed by: Allan Jude <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8410
* Sort by full path name instead of by GUID when importingkpande2019-02-261-3/+3
| | | | | | | | | | | | | Preferentially sort by the full path name instead of GUID when determining which device links to use. This helps ensure that the pool vdevs are named consistently when multiple links for a device appear in the same directory. For example, the /dev/disk/by-id/scsi* and /dev/disk/by-id/wwn* links. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Richard Elling <[email protected]> Authored-by: Brian Behlendorf <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #8108 Closes #8440
* Improve error message for zfs create with @ or # in nameDamian Wojsław2019-02-251-38/+39
| | | | | | | | | | | | | Reorder the `zfs create` error messages in order to return the most specific one first. If none of them apply then an expanded version of the invalid name message is used. Reviewed by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Damian Wojsław <[email protected]> Closes #8155 Closes #8352
* zfs(8): improve document of compression behavioursDeHackEd2019-02-251-0/+13
| | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed by: Allan Jude <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: DHE <[email protected]> Closes #4660 Closes #8423
* Error path in metaslab_load_impl() forgets to drop ms_sync_lockSerapheim Dimitropoulos2019-02-251-1/+3
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8444
* zvol: allow rename of in use ZVOL datasetloli10K2019-02-227-10/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While ZFS allow renaming of in use ZVOLs at the DSL level without issues the ZVOL layer does not correctly update the renamed dataset if the device node is open (zv->zv_open_count > 0): trying to access the stale dataset name, for instance during a zfs receive, will cause the following failure: VERIFY3(zv->zv_objset->os_dsl_dataset->ds_owner == zv) failed ((null) == ffff8800dbb6fc00) PANIC at zvol.c:1255:zvol_resume() Showing stack for process 1390 CPU: 0 PID: 1390 Comm: zfs Tainted: P O 3.16.0-4-amd64 #1 Debian 3.16.51-3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 ffffffff8151ea00 ffffffffa0758a80 ffff88028aefba30 ffffffffa0417219 ffff880037179220 ffffffff00000030 ffff88028aefba40 ffff88028aefb9e0 2833594649524556 6f5f767a3e2d767a 6f3e2d7465736a62 Call Trace: [<0>] ? dump_stack+0x5d/0x78 [<0>] ? spl_panic+0xc9/0x110 [spl] [<0>] ? mutex_lock+0xe/0x2a [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs] [<0>] ? rrw_exit+0xc8/0x2e0 [zfs] [<0>] ? mutex_lock+0xe/0x2a [<0>] ? dmu_objset_from_ds+0x9a/0x250 [zfs] [<0>] ? dmu_objset_hold_flags+0x71/0xc0 [zfs] [<0>] ? zvol_resume+0x178/0x280 [zfs] [<0>] ? zfs_ioc_recv_impl+0x88b/0xf80 [zfs] [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs] [<0>] ? zfs_ioc_recv+0x1c2/0x2a0 [zfs] [<0>] ? dmu_buf_get_user+0x13/0x20 [zfs] [<0>] ? __alloc_pages_nodemask+0x166/0xb50 [<0>] ? zfsdev_ioctl+0x896/0x9c0 [zfs] [<0>] ? handle_mm_fault+0x464/0x1140 [<0>] ? do_vfs_ioctl+0x2cf/0x4b0 [<0>] ? __do_page_fault+0x177/0x410 [<0>] ? SyS_ioctl+0x81/0xa0 [<0>] ? async_page_fault+0x28/0x30 [<0>] ? system_call_fast_compare_end+0x10/0x15 Reviewed by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6263 Closes #8371
* zpool reports 16E expandsize on disks with oddball number of sectorsloli10K2019-02-222-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue is caused by a small discrepancy in how userland creates the partition layout and the kernel estimates available space: * zpool command: subtract 9M from the usable device size, then align to 1M boundary. 9M is the sum of 1M "start" partition alignment + 8M EFI "reserved" partition. * kernel module: subtract 10M from the device size. 10M is the sum of 1M "start" partition alignment + 1m "end" partition alignment + 8M EFI "reserved" partition. For devices where the number of sectors is not a multiple of the alignment size the zpool command will create a partition layout which reserves less than 1M after the 8M EFI "reserved" partition: Disk /dev/sda: 1024 MiB, 1073739776 bytes, 2097148 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disklabel type: gpt Disk identifier: 49811D40-16F4-4E41-84A9-387703950D7F Device Start End Sectors Size Type /dev/sda1 2048 2078719 2076672 1014M Solaris /usr & Apple ZFS /dev/sda9 2078720 2095103 16384 8M Solaris reserved 1 When the kernel module vdev_open() the device its max_asize ends up being slightly smaller than asize: this results in a huge number (16E) reported by metaslab_class_expandable_space(). This change prevents bdev_max_capacity() from returing a size smaller than bdev_capacity(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed by: Sara Hartse <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #1468 Closes #8391
* Fix dnode_hold_impl() soft lockuplidongyang2019-02-222-56/+53
| | | | | | | | | | | | | | Soft lockups could happen when multiple threads trying to get zrl on the same dnode handle in order to allocate and initialize the dnode marked as DN_SLOT_ALLOCATED. Don't loop from beginning when we can't get zrl, otherwise we would increase the zrl refcount and nobody can actually lock it. Reviewed by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Li Dongyang <[email protected]> Closes #8433
* Clarify zpool iostat statistics reportingkpande2019-02-211-4/+9
| | | | | | | | | | | Document expected behavior for zpool iostat statistics reporting. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Allan Jude <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #2888 Closes #8417
* Fix '-T u|d' descriptions in zpool(8)Anatoly Borodin2019-02-211-4/+4
| | | | | | | | | | | | | | | | | | | In -T u|d Display a time stamp. Specify -u for a printed representation of the internal representation of time. See time(2). Specify -d for standard date format. See date(1). 'Specify u' and 'Specify d' should be used instead. `zpool list -T -u` does not work. Bring the descriptions in `zpool list` and `zpool status` in sync with `zpool iostat`. Reviewed by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Anatoly Borodin <[email protected]> Closes #8438
* Don't enter zvol's rangelock for read bio with size 0Tomohiro Kusumi2019-02-201-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SCST driver (SCSI target driver implementation) and possibly others may issue read bio's with a length of zero bytes. Although this is unusual, such bio's issued under certain condition can cause kernel oops, due to how rangelock is implemented. rangelock_add_reader() is not made to handle overlap of two (or more) ranges from read bio's with the same offset when one of them has size of 0, even though they conceptually overlap. Allowing them to enter rangelock results in kernel oops by dereferencing invalid pointer, or assertion failure on AVL tree manipulation with debug enabled kernel module. For example, this happens when read bio whose (offset, size) is (0, 0) enters rangelock followed by another read bio with (0, 4096) when (0, 0) rangelock is still locked, when there are no pending write bio's. It can also happen with reverse order, which is (0, N) followed by (0, 0) when (0, N) is still locked. More details mentioned in #8379. Kernel Oops on ->make_request_fn() of ZFS volume https://github.com/zfsonlinux/zfs/issues/8379 Prevent this by returning bio with size 0 as success without entering rangelock. This has been done for write bio after checking flusher bio case (though not for the same reason), but not for read bio. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8379 Closes #8401
* Add diffutils dependency for dkms buildkpande2019-02-201-1/+1
| | | | | | | | | The cmp and diff utilities are required at configure time. Add a dependency on diffutils to ensure they are installed. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #5205 Closes #8428
* Introduce auxiliary metaslab histogramsSerapheim Dimitropoulos2019-02-205-18/+341
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces 3 new histograms per metaslab. These histograms track segments that have made it to the metaslab's space map histogram (and are part of the spacemap) but have not yet reached the ms_allocatable tree on loaded metaslab's because these metaslab's are currently syncing and haven't gone through metaslab_sync_done() yet. The histograms help when we decide whether to load an unloaded metaslab in-order to allocate from it. When calculating the weight of an unloaded metaslab traditionally, we look at the highest bucket of its spacemap's histogram. The problem is that we are not guaranteed to be able to allocated that segment when we load the metaslab because it may still be at the freeing, freed, or defer trees. The new histograms are used when we try to calculate an unloaded metaslab's weight to deal with this issue by removing segments that have would not be in the allocatable tree at runtime. Note, that this method of dealing with this is not completely accurate as adjacent segments are not always consolidated in the space map histogram of a metaslab. In addition and to make things deterministic, we always reset the weight of unloaded metaslabs based on their space map weight (instead of doing that on a need basis). Thus, every time a metaslab is loaded and its weight is reset again (from the weight based on its space map to the one based on its allocatable range tree) we expect (and assert) that this change in weight can only get better if it doesn't stay the same. Reviewed by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8358
* Prevent user accounting on readonly poolloli10K2019-02-194-4/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trying to mount a dataset from a readonly pool could inadvertently start the user accounting upgrade task, leading to the following failure: VERIFY3(tx->tx_threads == 2) failed (0 == 2) PANIC at txg.c:680:txg_wait_synced() Showing stack for process 2541 CPU: 2 PID: 2541 Comm: z_upgrade Tainted: P O 3.16.0-4-amd64 #1 Debian 3.16.51-3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: [<0>] ? dump_stack+0x5d/0x78 [<0>] ? spl_panic+0xc9/0x110 [spl] [<0>] ? dnode_next_offset+0x1d4/0x2c0 [zfs] [<0>] ? dmu_object_next+0x77/0x130 [zfs] [<0>] ? dnode_rele_and_unlock+0x4d/0x120 [zfs] [<0>] ? txg_wait_synced+0x91/0x220 [zfs] [<0>] ? dmu_objset_id_quota_upgrade_cb+0x10f/0x140 [zfs] [<0>] ? dmu_objset_upgrade_task_cb+0xe3/0x170 [zfs] [<0>] ? taskq_thread+0x2cc/0x5d0 [spl] [<0>] ? wake_up_state+0x10/0x10 [<0>] ? taskq_thread_should_stop.part.3+0x70/0x70 [spl] [<0>] ? kthread+0xbd/0xe0 [<0>] ? kthread_create_on_node+0x180/0x180 [<0>] ? ret_from_fork+0x58/0x90 [<0>] ? kthread_create_on_node+0x180/0x180 This patch updates both functions responsible for checking if we can perform user accounting to verify the pool is not readonly. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8424
* Add missing copyright notice to large_dnode testsNed Bass2019-02-199-0/+45
| | | | | | | | | | | Missing copyright notices were noticed during the Illumos RTI process. Add LLNS 2016 copyright based on original merge date. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #8435
* Fix zdb crashIgor K2019-02-191-2/+2
| | | | | | | | | We have to use umem_free() instead of free() if we are using umem_zalloc() Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #8402
* ZTS: user_property_002_pos fails to destroy volumeJohn Wren Kennedy2019-02-191-4/+2
| | | | | | | | | | | | During the cleanup function of this test, an attempt to destroy a volume can fail because the volume is busy. This leaves the system with unexpected datasets which in turn causes subsequent failures. Reviewed-by: bunder2015 <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: John Kennedy <[email protected]> Closes #8422
* zfs mount man page should document legacy behaviourkpande2019-02-191-3/+11
| | | | | | | | | | | | Document legacy mount behavior. Reviewed by: Allan Jude <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #2900 Closes #8414
* Delay injection can cause indefinitely hung ziosSara Hartse2019-02-151-0/+1
| | | | | | | | | | | If we hit the (NSEC_TO_TICK(diff) == 0) condition in zio_delay_interrupt, zio_interrupt is never called and the zio does not progress. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: sara hartse <[email protected]> Closes #8404
* ZFS mounted NFSv3 shares fail lock reclaimsDon Brady2019-02-151-0/+1
| | | | | | | | | | ZFS NFS shares mounted on a client with NFSv3 and with open locks will fail to reclaim those locks after a server reboot. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #8398
* ZTS: clone_001_pos fails in cleanup on busy datasetJohn Wren Kennedy2019-02-151-1/+1
| | | | | | | | | | | | | | | The "cleanup_all" function in this test calls "zfs destroy" which fails approximately 30% of the time in our environment due to the dataset being busy. Since the failure happens during cleanup, the error is propagated to subsequent tests. Tested by running the snapshot test group in a loop without seeing any failures. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: John Kennedy <[email protected]> Closes #8409
* zio_deadman_impl() fix and enhancementTim Chase2019-02-152-9/+30
| | | | | | | | | | | | | Add the zio_deadman_log_all tunable to print all zios in zio_deadman_impl(). Also, in all cases, display the depth of the zio relative to the original parent zio. This is meant to be used by developers to gain diagnostic information for hangs which don't involve fully set-up zio trees or are otherwise stuck or hung in an early stage. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #8362
* zfs should optionally send holdsPaul Zuchowski2019-02-1510-28/+324
| | | | | | | | | | | | | Add -h switch to zfs send command to send dataset holds. If holds are present in the stream, zfs receive will create them on the target dataset, unless the zfs receive -h option is used to skip receive of holds. Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #7513
* Linux 4.20 compat: Fix VERIFY(RW_READ_HELD(&hash->mh_contents))Tony Hutter2019-02-151-10/+45
| | | | | | | | | | | The 4.20 kernel changed the meaning of the rw_semaphore.owner bits, causing an assertion when loading the module under the 4.20 kernel. This patch fixes the issue. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8360 Closes #8389
* Fix obsolete comment on rangelockTomohiro Kusumi2019-02-141-1/+1
| | | | | | | | 5d43cc9a59 renamed it to rangelock_enter(). Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8408
* zdb: replace label_t to zdb_label_t for reduce collisionsIgor K2019-02-131-8/+8
| | | | | | | | | | | | | with builds on illumos based platform we can see build issue because label_t has been redefined. for reduce build issues on others platforms we should rename label_t to zdb_label_t. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Igor Kozhukhov <[email protected]> Closes #8397
* Freeing throttle should account for holesAlek P2019-02-122-16/+37
| | | | | | | | | | | | | Deletion throttle currently does not account for holes in a file. This means that it can activate when it shouldn't. To fix it we switch the throttle to be based on the number of L1 blocks we will have to dirty when freeing Reviewed by: Tom Caputi <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #7725 Closes #7888
* port async unlinked drain from illumos-nexentaAlek P2019-02-1213-10/+300
| | | | | | | | | | | | | | | | | This patch is an async implementation of the existing sync zfs_unlinked_drain() function. This function is called at mount time and is responsible for freeing znodes that we didn't get to freeing before. We don't have to hold mounting of the dataset until the unlinked list is fully drained as is done now. Since we can process the unlinked set asynchronously this results in a better user experience when mounting a dataset with entries in the unlinked set. Reviewed by: Jorgen Lundman <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #8142
* Get rid of space_map_update() for ms_synced_lengthSerapheim Dimitropoulos2019-02-1212-215/+248
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially, metaslabs and space maps used to be the same thing in ZFS. Later, we started differentiating them by referring to the space map as the on-disk state of the metaslab, making the metaslab a higher-level concept that is metadata that deals with space accounting. Today we've managed to split that code furthermore, with the space map being its own on-disk data structure used in areas of ZFS besides metaslabs (e.g. the vdev-wide space maps used for zpool checkpoint or vdev removal features). This patch refactors the space map code to further split the space map code from the metaslab code. It does so by getting rid of the idea that the space map can have a different in-core and on-disk length (sm_length vs smp_length) which is something that is only used for the metaslab code, and other consumers of space maps just have to deal with. Instead, this patch introduces changes that move the old in-core length of the metaslab's space map to the metaslab structure itself (see ms_synced_length field) while making the space map code only care about the actual space map's length on-disk. The result of this is that space map consumers no longer have to deal with syncing two different lengths for the same structure (e.g. space_map_update() goes away) while metaslab specific behavior stays within the metaslab code. Specifically, the ms_synced_length field keeps track of the amount of data metaslab_load() can read from the metaslab's space map while working concurrently with metaslab_sync() that may be appending to that same space map. As a side note, the patch also adds a few comments around the metaslab code documenting some assumptions and expected behavior. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8328
* ZVOLs should not be allowed to have childrenloli10K2019-02-0817-86/+358
| | | | | | | | | | | | | | | zfs create, receive and rename can bypass this hierarchy rule. Update both userland and kernel module to prevent this issue and use pyzfs unit tests to exercise the ioctls directly. Note: this commit slightly changes zfs_ioc_create() ABI. This allow to differentiate a generic error (EINVAL) from the specific case where we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT). Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: loli10K <[email protected]>
* Pool allocation classes misplacing small file blocksloli10K2019-02-083-4/+47
| | | | | | | | | | | | | | Due to an off-by-one condition in spa_preferred_class() we are picking the "normal" allocation class instead of the "special" one for file blocks with size equal to the special_small_blocks property value. This change fix the small code issue, update the ZFS Test Suite and the zfs(8) man page. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8351 Closes #8361
* Fix ARC stats for embedded blkptrsTim Chase2019-02-041-15/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | Re-factor arc_read() to better account for embedded data blkptrs. Previously, reading the payload from an embedded blkptr would cause arcstats such as demand_metadata_misses to be bumped when there was actually no cache "miss" because the data are already available in the blkptr. The following test procedure was used to demonstrate the problem: zpool create tank ... zfs create -o compression=lz4 tank/fs echo blah > /tank/fs/blah stat /tank/fs/blah grep 'meta.*mis' /proc/spl/kstat/zfs/arcstats and repeating the last two steps to watch the metadata miss counter increment. This can also be demonstrated via the zfs_arc_miss DTRACE4 probe in arc_read(). Reviewed-by: loli10K <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #8319
* OpenZFS 9185 - Enable testing over NFS in ZFS performance testsAhmed Ghanem2019-02-0412-3/+115
| | | | | | | | | | | | | | | | | | | | This change makes additions to the ZFS test suite that allows the performance tests to run over NFS. The test is run and performance data collected from the server side, while IO is generated on the NFS client. This has been tested with Linux and illumos NFS clients. Authored by: Ahmed Ghanem <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Kevin Greene <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: John Kennedy <[email protected]> Signed-off-by: John Kennedy <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9185 Closes #8367
* zstreamdump: -d option is not documented in manpageloli10K2019-02-041-2/+12
| | | | | | | | | | | This change simply documents the missing -d (dump contents) option in zstreamdump(8). Reviewed-by: bunder2015 <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8369
* shellcheck passbunder20152019-02-044-7/+7
| | | | | | | | | | note: which is non-standard. Use builtin 'command -v' instead. [SC2230] note: Use -n instead of ! -z. [SC2236] Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #8367
* flake8 passbunder20152019-02-042-21/+21
| | | | | | | | | | F632 use ==/!= to compare str, bytes, and int literals Reviewed-by: Håkan Johansson <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #8368
* Fix zpool iostat -w header namesTony Hutter2019-01-311-2/+2
| | | | | | | | | | | The zpool iostat latency histograms (-w) has column names 'sync_queue' and 'async_queue', which do not match the man page, nor the equivalent columns in average latency. Change the column names to be 'syncq_wait' and 'asyncq_wait' to be consistent. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8338
* Simplify log vdev removal codeSerapheim Dimitropoulos2019-01-312-54/+14
| | | | | | | | | | Get rid of the majority metaslab metadata when removing log vdevs in spa_vdev_remove_log() with a call to metaslab_fini() instead of duplicating a lot of that in vdev_remove_empty_log(). Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8347
* vs_alloc can underflow in L2ARC vdevsSerapheim Dimitropoulos2019-01-312-6/+16
| | | | | | | | | | | | | | | | | | | | The current L2 ARC device code consistently uses psize to increment vs_alloc but varies between psize and lsize when decrementing it. The result of this behavior is that vs_alloc can be decremented more that it is incremented and underflow. This patch changes the code so asize is used anywhere. In addition, it ensures that vs_alloc gets incremented by the L2 ARC device code as buffers are written and not at the end of the l2arc_write_buffers() routine. The latter (and old) way would temporarily underflow vs_alloc as buffers that were just written, would be destroyed while l2arc_write_buffers() was still looping. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8298
* Don't acquire zthr_request_lock in zthr_wakeupSara Hartse2019-01-301-18/+36
| | | | | | | | | | | | Address a deadlock caused by simultaneous wakeup and cancel on a zthr by remove the hold of zthr_request_lock from zthr_wakeup. This allows thr_wakeup to not block a thread that is in the process of being cancelled. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Sara Hartse <[email protected]> Closes #8333
* zdb -L should skip leak detection altogetherSerapheim Dimitropoulos2019-01-302-121/+134
| | | | | | | | | | | | | | | | Currently the point of -L option in zdb is to disable leak tracing and the loading of space maps because they are expensive, yet still do leak detection in terms of space. Unfortunately, there is a scenario where this is a lie. If we are using zdb -L on a pool where a vdev is being removed, zdb_claim_removing() will open the metaslab space maps of that device. This patch makes it so zdb -L skips leak detection altogether and ensures that no space maps are loaded. Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8335
* Exclude test-runner.py from the rpmbuild shebang checkTony Hutter2019-01-281-0/+4
| | | | | | | | | Exclude test-runner.py from the rpmbuild shebang check to allow it to run under Python 2 and 3. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8331
* GCC 9.0: Fix ztest "directive argument is not a nul-terminated string"Tony Hutter2019-01-281-2/+2
| | | | | | | | | | | | | GCC 9.0 is complaining because we're trying to print strings that are defined like this: .zo_pool = { 'z', 't', 'e', 's', 't', '\0' }, Fix them by making them actual strings. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8330
* Linux 5.0 compat: Fix bio_set_dev()Brian Behlendorf2019-01-282-4/+60
| | | | | | | | | | | The Linux 5.0 kernel updated the bio_set_dev() macro so it calls the GPL-only bio_associate_blkg() symbol thus inadvertently converting the entire macro. Provide a minimal version which always assigns the request queue's root_blkg to the bio. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8287
* Linux 5.0 compat: Disable vector instructions on 5.0+ kernelsTony Hutter2019-01-282-39/+144
| | | | | | | | | | The 5.0 kernel no longer exports the functions we need to do vector (SSE/SSE2/SSE3/AVX...) instructions. Disable vector-based checksum algorithms when building against those kernels. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8259
* Linux 5.0 compat: Fix SUBDIRsTony Hutter2019-01-281-3/+3
| | | | | | | | | SUBDIRs has been deprecated for a long time, and was finally removed in the 5.0 kernel. Use "M=" instead. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8257
* Linux 5.0 compat: Convert MS_* macros to SB_*Tony Hutter2019-01-282-12/+14
| | | | | | | | | | | In the 5.0 kernel, only the mount namespace code should use the MS_* macos. Filesystems should use the SB_* ones. https://patchwork.kernel.org/patch/10552493/ Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8264
* Linux 5.0 compat: Use totalram_pages()Tony Hutter2019-01-284-3/+28
| | | | | | | | | | | | | totalram_pages() was converted to an atomic variable in 5.0: https://patchwork.kernel.org/patch/10652795/ Its value should now be read though the totalram_pages() helper function. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8263
* Linux 5.0 compat: access_ok() drops 'type' parameterTony Hutter2019-01-284-2/+31
| | | | | | | | access_ok no longer needs a 'type' parameter in the 5.0 kernel. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8261
* Linux 4.18 compat: Use ktime_get_coarse_real_ts64()Tony Hutter2019-01-283-0/+31
| | | | | | | | | Newer kernels remove current_kernel_time64(). Use ktime_get_coarse_real_ts64() in its place. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #8258