aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Support re-prioritizing asynchronous prefetchesTom Caputi2017-12-2111-43/+124
| | | | | | | | | | | | | | | | | | | | | When sequential scrubs were merged, all calls to arc_read() (including prefetch IOs) were given ZIO_PRIORITY_ASYNC_READ. Unfortunately, this behaves badly with an existing issue where prefetch IOs cannot be re-prioritized after the issue. The result is that synchronous reads end up in the same vdev_queue as the scrub IOs and can have (in some workloads) multiple seconds of latency. This patch incorporates 2 changes. The first ensures that all scrub IOs are given ZIO_PRIORITY_SCRUB to allow the vdev_queue code to differentiate between these I/Os and user prefetches. Second, this patch introduces zio_change_priority() to provide the missing capability to upgrade a zio's priority. Reviewed by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6921 Closes #6926
* vdev_id: new slot type sesSimon Guest2017-12-202-1/+24
| | | | | | | | | | | | | | | This extends vdev_id to support a new slot type, ses, for SCSI Enclosure Services. With slot type ses, the disk slot numbers are determined by using the device slot number reported by sg_ses for the device with matching SAS address, found by querying all available enclosures. This is primarily of use on systems with a deficient driver omitting support for bay_identifier in /sys/devices. In my testing, I found that the existing slot types of port and id were not stable across disk replacement, so an alternative was required. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Simon Guest <[email protected]> Closes #6956
* Handle broken pipes in arc_summaryGiuseppe Di Natale2017-12-192-0/+16
| | | | | | | | | | | Using a command similar to 'arc_summary.py | head' causes a broken pipe exception. Gracefully exit in the case of a broken pipe in arc_summary.py. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6965 Closes #6969
* Handle invalid options in arc_summaryLOLi2017-12-194-6/+51
| | | | | | | | | | If an invalid option is provided to arc_summary.py we handle any error thrown from the getopt Python module and print the usage help message. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6983
* OpenZFS 8794 - cstyle generates warnings with recent perlDominik Hassler2017-12-191-18/+18
| | | | | | | | | | | | | | Authored by: Dominik Hassler <[email protected]> Reviewed by: Andy Fiddaman <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Toomas Soome <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8794 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/578f67364c Closes #6973
* ZTS: Fix create-o_ashift test caseLOLi2017-12-192-34/+26
| | | | | | | | | | | | The function that fills the uberblock ring buffer on every device label has been reworked to avoid occasional failures caused by a race condition that prevents 'zpool sync' from writing some uberblock sequentially: this happens when the pool sync ioctl dispatch code calls txg_wait_synced() while we're already waiting for a TXG to sync. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6924 Closes #6977
* Fix multihost stale cache file importBrian Behlendorf2017-12-184-9/+26
| | | | | | | | | | | | | | | | | | | | | When the multihost property is enabled it should be impossible to import an active pool even using the force (-f) option. This patch prevents a forced import from succeeding when importing with a stale cache file. The root cause of the problem is that the kernel modules trusted the hostid provided in configuration. This is always correct when the configuration is generated by scanning for the pool. However, when using an existing cache file the hostid could be stale which would result in the activity check being skipped. Resolve the issue by always using the hostid read from the label configuration where the best uberblock was found. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6933 Closes #6971
* Honor --with-mounthelperdir where applicableLOLi2017-12-175-6/+8
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6962
* Fix --with-systemd on Debian-based distributions (#6963)LOLi2017-12-173-6/+21
| | | | | | | | | | These changes propagate the "--with-systemd" configure option to the RPM spec file, allowing Debian-based distributions to package systemd-related files. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6591 Closes #6963
* Remove lib/libspl/include/sys/frame.hBrian Behlendorf2017-12-172-132/+0
| | | | | | | | | | | The functionality provided by this header is not required by any of the ZFS user space code. Minimal functionality was provided in commit c28a677 which added include/sys/frame.h. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6960 Closes #6972
* Enable QAT support in zfs-dkms RPMDavid Qian2017-12-141-0/+6
| | | | | | | | | | | Enable QAT accelerated gzip compression in zfs-dkms RPM package when environment variant ICP_ROOT is set to QAT drive source code folder and QAT hardware presence. Otherwise, use default gzip compression. Reviewed-by: George Melikov <[email protected]> Reviewed-by: David Qian <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6932
* Add zfs-import.target services in spec fileLalufu2017-12-131-1/+1
| | | | | | | | | | | Add missing zfs-import.target to list of systemd services in zfs RPM spec file. Reviewed-by: Niklas Wagner <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ralf Ertzinger <[email protected]> Issue #6953 Closes #6955
* Various ZED fixesLOLi2017-12-0819-101/+368
| | | | | | | | | | | | | | | | | | | | | | | | | * Teach ZED to handle spares usingi the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropiate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #2562 Closes #6858
* Disable vdev_zaps_004_posBrian Behlendorf2017-12-071-0/+5
| | | | | | | | | | | Occasionally observed failure of vdev_zaps_004_pos due to the test case not being 100% reliable. In order to prevent false positives disable this test case until it can be made reliable. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6935 Closes #6936
* Suppress incorrect objtool warningsBrian Behlendorf2017-12-075-1/+69
| | | | | | | | | | | | Suppress incorrect warnings from versions of objtool which are not aware of x86 EVEX prefix instructions used for AVX512. module/zfs/vdev_raidz_math_avx512bw.o: warning: objtool: <func+offset>: can't find jump dest instruction at .text Reviewed-by: Don Brady <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6928
* Fix segfault in zpool iostat when adding VDEVsTony Hutter2017-12-061-7/+15
| | | | | | | | | | Fix a segfault when running 'zpool iostat -v 1' while adding a VDEV. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #6748 Closes #6872
* OpenZFS 8603 - rename zilog's "zl_writer_lock" to "zl_issuer_lock"Prakash Surya2017-12-062-27/+27
| | | | | | | | | | | | | | | | This is a purely cosmetic change. The zilog's "zl_writer_lock" field is being renamed to "zl_issuer_lock" to try and make the code easier to understand; no other changes are made. Authored by: Prakash Surya <[email protected]> Reviewed by: C Fraire <[email protected]> Approved by: Dan McDonald <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8603 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2daf06546b Closes #6927
* Disable create-o_ashiftBrian Behlendorf2017-12-061-0/+5
| | | | | | | | | | | | Occasionally observed failure of create-o_ashift due to the test case not being 100% reliable. In order to prevent false positives disable this test case until it can be made reliable. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #6924 Closes #6925
* OpenZFS 8585 - improve batching done in zil_commit()Prakash Surya2017-12-0515-378/+1587
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Prakash Surya <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Prakash Surya <[email protected]> Problem ======= The current implementation of zil_commit() can introduce significant latency, beyond what is inherent due to the latency of the underlying storage. The additional latency comes from two main problems: 1. When there's outstanding ZIL blocks being written (i.e. there's already a "writer thread" in progress), then any new calls to zil_commit() will block waiting for the currently oustanding ZIL blocks to complete. The blocks written for each "writer thread" is coined a "batch", and there can only ever be a single "batch" being written at a time. When a batch is being written, any new ZIL transactions will have to wait for the next batch to be written, which won't occur until the current batch finishes. As a result, the underlying storage may not be used as efficiently as possible. While "new" threads enter zil_commit() and are blocked waiting for the next batch, it's possible that the underlying storage isn't fully utilized by the current batch of ZIL blocks. In that case, it'd be better to allow these new threads to generate (and issue) a new ZIL block, such that it could be serviced by the underlying storage concurrently with the other ZIL blocks that are being serviced. 2. Any call to zil_commit() must wait for all ZIL blocks in its "batch" to complete, prior to zil_commit() returning. The size of any given batch is proportional to the number of ZIL transaction in the queue at the time that the batch starts processing the queue; which doesn't occur until the previous batch completes. Thus, if there's a lot of transactions in the queue, the batch could be composed of many ZIL blocks, and each call to zil_commit() will have to wait for all of these writes to complete (even if the thread calling zil_commit() only cared about one of the transactions in the batch). To further complicate the situation, these two issues result in the following side effect: 3. If a given batch takes longer to complete than normal, this results in larger batch sizes, which then take longer to complete and further drive up the latency of zil_commit(). This can occur for a number of reasons, including (but not limited to): transient changes in the workload, and storage latency irregularites. Solution ======== The solution attempted by this change has the following goals: 1. no on-disk changes; maintain current on-disk format. 2. modify the "batch size" to be equal to the "ZIL block size". 3. allow new batches to be generated and issued to disk, while there's already batches being serviced by the disk. 4. allow zil_commit() to wait for as few ZIL blocks as possible. 5. use as few ZIL blocks as possible, for the same amount of ZIL transactions, without introducing significant latency to any individual ZIL transaction. i.e. use fewer, but larger, ZIL blocks. In theory, with these goals met, the new allgorithm will allow the following improvements: 1. new ZIL blocks can be generated and issued, while there's already oustanding ZIL blocks being serviced by the storage. 2. the latency of zil_commit() should be proportional to the underlying storage latency, rather than the incoming synchronous workload. Porting Notes ============= Due to the changes made in commit 119a394ab0, the lifetime of an itx structure differs than in OpenZFS. Specifically, the itx structure is kept around until the data associated with the itx is considered to be safe on disk; this is so that the itx's callback can be called after the data is committed to stable storage. Since OpenZFS doesn't have this itx callback mechanism, it's able to destroy the itx structure immediately after the itx is committed to an lwb (before the lwb is written to disk). To support this difference, and to ensure the itx's callbacks can still be called after the itx's data is on disk, a few changes had to be made: * A list of itxs was added to the lwb structure. This list contains all of the itxs that have been committed to the lwb, such that the callbacks for these itxs can be called from zil_lwb_flush_vdevs_done(), after the data for the itxs is committed to disk. * A list of itxs was added on the stack of the zil_process_commit_list() function; the "nolwb_itxs" list. In some circumstances, an itx may not be committed to an lwb (e.g. if allocating the "next" ZIL block on disk fails), so this list is used to keep track of which itxs fall into this state, such that their callbacks can be called after the ZIL's writer pipeline is "stalled". * The logic to actually call the itx's callback was moved into the zil_itx_destroy() function. Since all consumers of zil_itx_destroy() were effectively performing the same logic (i.e. if callback is non-null, call the callback), it seemed like useful code cleanup to consolidate this logic into a single function. Additionally, the existing Linux tracepoint infrastructure dealing with the ZIL's probes and structures had to be updated to reflect these code changes. Specifically: * The "zil__cw1" and "zil__cw2" probes were removed, so they had to be removed from "trace_zil.h" as well. * Some of the zilog structure's fields were removed, which affected the tracepoint definitions of the structure. * New tracepoints had to be added for the following 3 new probes: * zil__process__commit__itx * zil__process__normal__itx * zil__commit__io__error OpenZFS-issue: https://www.illumos.org/issues/8585 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5d95a3a Closes #6566
* Fix NFS sticky bit permission denied errorBrian Behlendorf2017-12-041-3/+2
| | | | | | | | | | | | | | | | | | | | | | When zfs_sticky_remove_access() was originally adapted for Linux a typo was made which altered the intended behavior. As described in the block comment, the intended behavior is that permission should be granted when the entry is a regular file and you have write access. That is, S_ISREG should have been used instead of S_ISDIR. Restricting permission to regular files made good sense for older systems where setting the bit on executable files would instruct the system to save the program's text segment on the swap device. On modern systems this behavior has been replaced by the sticky bit acting as a restricted deletion flag and the plain file restriction has been relaxed. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6889 Closes #6910
* Add /usr/bin/env to COPY_EXEC_LIST initramfs hookJKDingwall2017-12-041-0/+1
| | | | | | | | | | | | | 5dc1ff29 changed the user space program to mount a zfs snapshot from /bin/sh to /usr/bin/env. If the executable is not present in the initramfs then snapshots cannot be automounted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: James Dingwall <[email protected]> Closes #5360 Closes #6913
* Fix 'zpool create|add' replication level checkBrian Behlendorf2017-12-042-12/+61
| | | | | | | | | | | | | When the pool configuration contains a hole due to a previous device removal ignore this top level vdev. Failure to do so will result in the current configuration being assessed to have a non-uniform replication level and the expected warning will be disabled. The zpool_add_010_pos test case was extended to cover this scenario. Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6907 Closes #6911
* Preserve itx alloc size for zio_data_buf_free()Brian Behlendorf2017-12-042-2/+6
| | | | | | | | | | | | | | Using zio_data_buf_alloc() to allocate the itx's may be unsafe because the itx->itx_lr.lrc_reclen field is not constant from allocation to free. Using a different itx->itx_lr.lrc_reclen size in zio_data_buf_free() can result in the allocation being returned to the wrong kmem cache. This issue can be avoided entirely by storing the allocation size in itx->itx_size and using that for zio_data_buf_free(). Reviewed by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6912
* Unbreak the scan status ABITom Caputi2017-11-302-6/+6
| | | | | | | | | | When d4a72f23 was merged, pss_pass_issued was incorrectly added to the middle of the pool_scan_stat_t structure instead of the end. This patch simply corrects this issue. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6909
* Fix 'zfs get {user|group}objused@' functionalityLOLi2017-11-293-7/+19
| | | | | | | | | | | | | | Fix a regression accidentally introduced in 1b81ab4 that prevents 'zfs get {user|group}objused@' from correctly reporting the requested value. Update "userspace_003_pos.ksh" and "groupspace_003_pos.ksh" to verify this functionality. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6908
* Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCTMark Wright2017-11-2812-124/+126
| | | | | | | | | | | | | | | Fix build errors with gcc 7.2.0 on Gentoo with kernel 4.14 built with CONFIG_GCC_PLUGIN_RANDSTRUCT=y such as: module/nvpair/nvpair.c:2810:2:error: positional initialization of field in ?struct? declared with 'designated_init' attribute [-Werror=designated-init] nvs_native_nvlist, ^~~~~~~~~~~~~~~~~ Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mark Wright <[email protected]> Closes #5390 Closes #6903
* initramfs: Honor canmount=offRichard Laager2017-11-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | The initramfs script was not honoring canmount=off. With this change, it does. If the administrator has asked that a filesystem not be mounted, that should be honored. As an exception, the initramfs script ignores canmount=off on the rootfs. The rootfs should not have canmount=off set either. However, mounting it anyway seems harmless because it is being asked for explicitly. The point of this exception is to avoid the risk of breaking existing systems, just in case someone has canmount=off set on their rootfs. The initramfs still mounts filesystems with canmount=noauto. This is necessary because it is typical to set that on the rootfs so that it can be cloned. Without canmount=noauto, the clones' duplicate mountpoints would conflict. This is the remainder of the fix for: https://github.com/zfsonlinux/pkg-zfs/issues/221 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #6897
* initramfs: Honor mountpoint=none/legacyRichard Laager2017-11-281-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For filesystems that are children of the rootfs, when mountpoint=none or mountpoint=legacy, the initrafms script would assume a mountpoint based on the dataset path. Given that the rootfs should have mountpoint=/ and mountpoint inheritance is is the default behavior of ZFS, this behavior seems unnecessary. In any event, it turns mountpoint=none into a no-op. That removes this option from the administrator, and if someone uses it, it does not work as expected. Worse yet, if the mountpoint directory does not exist (which is the typical case for mountpoint=none), the mounting and thus the boot process will fail. For the case of mountpoint=legacy, the assumed mountpoint may not be the correct value set in /etc/fstab. This change makes the initramfs script not mount the filesystem in either case. For mountpoint=none, this means we are correctly honoring the setting. For mountpoint=legacy, there are two scenarios: If canmount=on, the filesystem will be mounted by the normal mechanisms later in the boot process. If canmount=noauto, the filesystem will not be mounted at all, unless the administrator has done something special. If they're not doing something special and they want it mounted by the initramfs, they can simply not set mountpoint=legacy. This is part of the fix for: https://github.com/zfsonlinux/pkg-zfs/issues/221 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #6897
* zpool(8): Fix "zpool import -t"DeHackEd2017-11-281-1/+1
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: DHE <[email protected]> Closes #6894
* Update for cppcheck v1.80Brian Behlendorf2017-11-189-50/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve new warnings and errors from cppcheck v1.80. * [lib/libshare/libshare.c:543]: (warning) Possible null pointer dereference: protocol * [lib/libzfs/libzfs_dataset.c:2323]: (warning) Possible null pointer dereference: srctype * [lib/libzfs/libzfs_import.c:318]: (error) Uninitialized variable: link * [module/zfs/abd.c:353]: (error) Uninitialized variable: sg * [module/zfs/abd.c:353]: (error) Uninitialized variable: i * [module/zfs/abd.c:385]: (error) Uninitialized variable: sg * [module/zfs/abd.c:385]: (error) Uninitialized variable: i * [module/zfs/abd.c:553]: (error) Uninitialized variable: i * [module/zfs/abd.c:553]: (error) Uninitialized variable: sg * [module/zfs/abd.c:763]: (error) Uninitialized variable: i * [module/zfs/abd.c:763]: (error) Uninitialized variable: sg * [module/zfs/abd.c:305]: (error) Uninitialized variable: tmp_page * [module/zfs/zpl_xattr.c:342]: (warning) Possible null pointer dereference: value * [module/zfs/zvol.c:208]: (error) Uninitialized variable: p Convert the following suppression to inline. * [module/zfs/zfs_vnops.c:840]: (error) Possible null pointer dereference: aiov Exclude HAVE_UIO_ZEROCOPY and HAVE_DNLC from analysis since these macro's will never be defined until this functionality is implemented. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6879
* Fix data on evict_skips in arc_summary.pyScot W. Stevenson2017-11-181-2/+3
| | | | | | | | | | | | Display correct data from kstat arcstats for evict_skips, which is currently repeating the data from mutex_misses. Fixes #6882 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Scot W. Stevenson <[email protected]> Closes #6882 Closes #6883
* Fix ARC pointer overrunDeHackEd2017-11-171-9/+11
| | | | | | | | | | | Only access the `b_crypt_hdr` field of an ARC header if the content is encrypted. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: DHE <[email protected]> Closes #6877
* Sequential scrub and resilversTom Caputi2017-11-1537-808/+3028
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, scrubs and resilvers can take an extremely long time to complete. This is largely due to the fact that zfs scans process pools in logical order, as determined by each block's bookmark. This makes sense from a simplicity perspective, but blocks in zfs are often scattered randomly across disks, particularly due to zfs's copy-on-write mechanisms. This patch improves performance by splitting scrubs and resilvers into a metadata scanning phase and an IO issuing phase. The metadata scan reads through the structure of the pool and gathers an in-memory queue of I/Os, sorted by size and offset on disk. The issuing phase will then issue the scrub I/Os as sequentially as possible, greatly improving performance. This patch also updates and cleans up some of the scan code which has not been updated in several years. Reviewed-by: Brian Behlendorf <[email protected]> Authored-by: Saso Kiselkov <[email protected]> Authored-by: Alek Pinchuk <[email protected]> Authored-by: Tom Caputi <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #3625 Closes #6256
* Minor code cleanups in arc_python.pyScot W. Stevenson2017-11-151-5/+7
| | | | | | | | | | | Remove unused library re and associated variable kstat_pobj. Add note to documentation at start of program about required support for old versions of Python. Change variable "format" (which is a built-in function) to "fmt". Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Scot W. Stevenson <[email protected]> Closes #6869
* Fix dirty check in dmu_offset_next()Brian Behlendorf2017-11-151-6/+4
| | | | | | | | | | | | The correct way to determine if a dnode is dirty is to check if any of the dn->dn_dirty_link's are active. Relying solely on the dn->dn_dirtyctx can result in the dnode being mistakenly reported as clean. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3125 Closes #6867
* Disable automatic dependencies in zfs-test packageBrian Behlendorf2017-11-151-5/+6
| | | | | | | | | | | | All of the ZTS test scripts specify /bin/ksh as the interpreter. Unfortunately, as of Fedora 27 only /usr/bin/ksh is provided by the package manager. Rather than change all the scripts to accommodate the latest Fedora disable automatic dependencies for the zfs-test package. Functionally this will not cause any problems since /bin is a symlink to /usr/bin. Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6868
* Disable zvol_ENOSPC_001_pos on 32-bit systemsBrian Behlendorf2017-11-131-0/+5
| | | | | | | | | | | Occasionally observed failure of zvol_ENOSPC_001_pos due to the test case taking too long to complete. Disable the test case until it can be improved. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5848 Closes #6862
* Fix truncate(2) mtime and ctime handlingLOLi2017-11-137-4/+221
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Linux, ftruncate(2) always changes the file timestamps, even if the file size is not changed. However, in case of a successfull truncate(2), the timestamps are updated only if the file size changes. This translates to the VFS calling the ZFS Posix Layer "setattr" function (zpl_setattr) with ATTR_MTIME and ATTR_CTIME unconditionally set on the iattr mask only when doing a ftruncate(2), while the truncate(2) is left to the filesystem implementation to be dealt with. This behaviour is consistent with POSIX:2004/SUSv3 specifications where there's no explicit requirement for file size changes to update the timestamps only for ftruncate(2): http://pubs.opengroup.org/onlinepubs/009695399/functions/truncate.html http://pubs.opengroup.org/onlinepubs/009695399/functions/ftruncate.html This has been later updated in POSIX:2008/SUSv4 where, for both truncate(2)/ftruncate(2), there's no mention of this size change requirement: http://austingroupbugs.net/view.php?id=489 http://pubs.opengroup.org/onlinepubs/9699919799/functions/truncate.html http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html Unfortunately the Linux VFS is still calling into the ZPL without ATTR_MTIME/ATTR_CTIME set in the truncate(2) case: we fix this by explicitly updating the timestamps when detecting the ATTR_SIZE bit, which is always set in do_truncate(), on the iattr mask. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6811 Closes #6819
* Add .travis.ymlGeorge Melikov2017-11-131-0/+38
| | | | | | | | | | Travis builders have maximum work time ~49 minutes, so we have to use 5 builders and spread the ZTS over them using test group tags. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #6829
* Fix arc_summary.py -d crash with Python3Scot W. Stevenson2017-11-111-8/+16
| | | | | | | | | | Prevents arc_summary.py crashing when called with parameter -d or long form --description with Python3. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Scot W. Stevenson <[email protected]> Closes #6849 Closes #6850
* OpenZFS 7531 - Assign correct flags to prefetched buffersbenrubson2017-11-112-0/+18
| | | | | | | | | | | Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Authored by: abraunegg <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7531 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/468008cb
* Long hold the dataset during upgradeArkadiusz Bubała2017-11-103-9/+24
| | | | | | | | | | | | | | | If the receive or rollback is performed while filesystem is upgrading the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This will lead to NULL pointer dereference when upgrade tries to access evicted objset. This commit adds long hold of dataset during whole upgrade process. The receive and rollback will return an EBUSY error until the upgrade is not finished. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Arkadiusz Bubała <[email protected]> Closes #5295 Closes #6837
* Fix encryption root hierarchy issueTom Caputi2017-11-081-1/+2
| | | | | | | | | | | | | | After doing a recursive raw receive, zfs userspace performs a final pass to adjust the encryption root hierarchy as needed. Unfortunately, the FORCE_INHERIT ioctl had a bug which caused the encryption root to always be assigned to the direct parent instead of the inheriting parent. This patch simply fixes this issue. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6847 Closes #6848
* Handle compressed buffers in __dbuf_hold_impl()Tim Chase2017-11-082-13/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | In __dbuf_hold_impl(), if a buffer is currently syncing and is still referenced from db_data, a copy is made in case it is dirtied again in the txg. Previously, the buffer for the copy was simply allocated with arc_alloc_buf() which doesn't handle compressed or encrypted buffers (which are a special case of a compressed buffer). The result was typically an invalid memory access because the newly-allocated buffer was of the uncompressed size. This commit fixes the problem by handling the 2 compressed cases, encrypted and unencrypted, respectively, with arc_alloc_raw_buf() and arc_alloc_compressed_buf(). Although using the proper allocation functions fixes the invalid memory access by allocating a buffer of the compressed size, another unrelated issue made it impossible to properly detect compressed buffers in the first place. The header's compression flag was set to ZIO_COMPRESS_OFF in arc_write() when it was possible that an attached buffer was actually compressed. This commit adds logic to only set ZIO_COMPRESS_OFF in the non-ZIO_RAW case which wil handle both cases of compressed buffers (encrypted or unencrypted). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #5742 Closes #6797
* Refresh TEST file to include new variablesGiuseppe Di Natale2017-11-081-1/+8
| | | | | | | Add any missing variables for CI control to TEST. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6846
* Cleanup systemd dependenciesAntonio Russo2017-11-087-7/+4
| | | | | | | | | | Some redundancy is present in the systemd dependencies, as noticed in PR#6764. Existing setups might rely on these quirks, so these cleanups have been moved to the development branch. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #6822
* Fix undefined %{systemd_svcs} in RPM scriptletsLOLi2017-11-081-1/+1
| | | | | | | | | | | This allows RPM-based systems to properly control package installation and removal when using systemd. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6838 Closes #6841
* OpenZFS 8607 - variable set but not usedBrian Behlendorf2017-11-082-13/+11
| | | | | | | | | | | | | | Reviewed by: Yuri Pankov <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> Authored by: Toomas Soome <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8607 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b852c2f5 Closes #6842
* Provide tags in perf-regression.runGiuseppe Di Natale2017-11-071-0/+2
| | | | | | | | | | | A prior commit changed test-runner to enable tagging of testgroups within a test suite runfile. They must be specified in each runfile. Update the runfile for performance regressions so it is properly tagged. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6830
* Fix zfs-tests.sh single test functionalityLOLi2017-11-071-2/+11
| | | | | | | | | | | Without any tag specified into the runtime-generated runfile the test-runner will not execute the test provided from the command line: fix this by adding tag information to the custom runfile. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6826