aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* OpenZFS 9076 - Adjust perf test concurrency settingsStephen Blinick2018-03-155-10/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ZFS Performance test concurrency should be lowered for better latency Work by Stephen Blinick. Nightly performance runs typically consist of two levels of concurrency; and both are fairly high. Since the IO runs are to a ZFS filesystem, within a zpool, which is based on some variable number of vdev's, the amount of IO driven to each device is variable. Additionally, different device types (HDD vs SSD, etc) can generally handle a different amount of concurrent IO before saturating. Nevertheless, in practice, it appears that most tests are well past the concurrency saturation point and therefore both perform with the same throughput, the maximum of the device. Because the queuedepth to the device(s) is so high however, the latency is much higher than the best possible at that throughput, and increases linearly with the increase in concurrency. This means that changes in code that impact latency during normal operation (before saturation) may not be apparent when a large component of the measured latency is from the IO sitting in a queue to be serviced. Therefore, changing the concurrency settings is recommended Authored by: Stephen Blinick <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: John Wren Kennedy <[email protected]> Ported-by: John Wren Kennedy <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9076 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/562 Upstream bug: DLPX-45477 Closes #7302
* receive_spill does not byte swap spill contentsPaul Zuchowski2018-03-151-0/+9
| | | | | | | | | In zfs receive, the function receive_spill should account for spill block endian conversion as a defensive measure. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #7300
* OpenZFS 9188 - increase size of dbuf cache to reduce indirect block ↵Brian Behlendorf2018-03-132-11/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | decompression With compressed ARC (bug 6950) we use up to 25% of our CPU to decompress indirect blocks, under a workload of random cached reads. To reduce this decompression cost, we would like to increase the size of the dbuf cache so that more indirect blocks can be stored uncompressed. If we are caching entire large files of recordsize=8K, the indirect blocks use 1/64th as much memory as the data blocks (assuming they have the same compression ratio). We suggest making the dbuf cache be 1/32nd of all memory, so that in this scenario we should be able to keep all the indirect blocks decompressed in the dbuf cache. (We want it to be more than the 1/64th that the indirect blocks would use because we need to cache other stuff in the dbuf cache as well.) In real world workloads, this won't help as dramatically as the example above, but we think it's still worth it because the risk of decreasing performance is low. The potential negative performance impact is that we will be slightly reducing the size of the ARC (by ~3%). Porting Notes: * Added modules options to zfs-module-parameters.5 man page. * Preserved scaling based on target ARC size rather than max ARC size. Authored by: George Wilson <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Prashanth Sreenivasa <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9188 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/564 Upstream bug: DLPX-46942 Closes #7273
* Add kernel module auto-loadingBrian Behlendorf2018-03-1310-17/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically a dynamic misc minor number was registered for the /dev/zfs device in order to prevent minor number collisions. This was fine but it prevented us from being able to use the kernel module auto-loaded which requires a known reserved value. Resolve this issue by adding a configure test to find an available misc minor number which can then be used in MODULE_ALIAS_MISCDEV at build time. By adding this alias the zfs kmod is added to the list of known static-nodes and the systemd-tmpfiles-setup-dev service will create a /dev/zfs character device at boot time. This in turn allows us to update the 90-zfs.rules file to make it aware this is a static node. The upshot of this is that whenever a process (zpool, zfs, zed) opens the /dev/zfs the kmods will be automatic loaded. This even works for unprivileged users so there is no longer a need to manually load the modules at boot time. As an additional bonus the zed now no longer needs to start after the zfs-import.service since it will trigger the module load. In the unlikely event the minor number we selected conflicts with another out of tree unregistered minor number the code falls back to dynamically allocating it. In this case the modules again must be manually loaded. Note that due to the change in the method of registering the minor number the zimport.sh test case may incorrectly fail when the static node for the installed packages is created instead of the dynamic one. This issue will only transiently impact zimport.sh for this single commit when we transition and are mixing and matching methods. Reviewed-by: Fabian Grünbichler <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> TEST_ZIMPORT_SKIP="yes" Closes #7287
* Add zfs_scan_ignore_errors tunableTim Chase2018-03-132-0/+30
| | | | | | | | | When it's set, a DTL range will be cleared even if its scan/scrub had errors. This allows to work around resilver/scrub upon import when the pool has errors. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #7293
* Destroy makes full snap list before destroyingPaul Zuchowski2018-03-122-7/+18
| | | | | | | | | | Change zfs destroy logic so destroying begins before the entire list of snapshots is built. Reviewed-by: loli10K <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #7271
* Fix race in trace point in zrl_add_implChunwei Chen2018-03-122-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We hit an illegal memory access in the zrlock trace point. The problem is that zrl->zr_owner and zrl->zr_caller are assigned locklessly. And if zrl->zr_owner got assigned a longer string between when __string() calculate the strlen, and when __assign_str() does strcpy. The copy will overflow the buffer. == For example: Initial condition: zrl->zr_owner = A zrl->zr_caller = "abc" Thread A Thread B ------------------------------------------------- if (zrl->zr_owner == A) { DTRACE_PROBE2() { __string() { strlen(zrl->zr_caller) -> 3 allocate buf[4] } zrl->zr_owner = B zrl->zr_caller = "abcd" __assign_str() { strcpy(buf, zrl->zr_caller) <- buffer overflow == Dereferencing zrl->zr_owner->pid may also be problematic, in that the zrl->zr_owner got changed to other task, and that task exits, freeing the task_struct. This should be very unlikely, as the other task need to zrl_remove and exit between the dereferencing zr->zr_owner and zr->zr_owner->pid. Nevertheless, we'll deal with it as well. To fix the zrl->zr_caller issue, instead of copy the string content, we just copy the pointer, this is safe because it always points to __func__, which is static. As for the zrl->zr_owner issue, we pass in curthread instead of using zrl->zr_owner. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7291
* Fix MMP write frequency for large poolsBrian Behlendorf2018-03-121-3/+3
| | | | | | | | | | | | | | | | | | | When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7205 Closes #7289
* Hold SCL_VDEV when counting leavesOlaf Faaland2018-03-091-1/+7
| | | | | | | | | | | | A config lock should be held while vdev_count_leaves() walks the tree, otherwise the pointers reference may become invalid during the walk. SCL_VDEV is a minimal lock provided for such uses cases. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7286
* Handle zio_resume and mmp => offOlaf Faaland2018-03-091-4/+10
| | | | | | | | | | | | | | | When multihost is disabled on a pool, and the pool is resumed via zpool clear, within a single cycle of the mmp thread's loop (e.g. while it's in the cv_timedwait call), both mmp_last_write and mmp_delay should be updated. The original code mistakenly treated the two cases as if they could not occur at the same time. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7286
* Document allowed pool namesTomohiro Kusumi2018-03-091-2/+6
| | | | | | | | | | | | PR #7208 was a patch to allow non-reserved pool names which begin with mirror, raidz, spare (but do not equal), however we'd rather document it in the man page for compatibility with other OpenZFS implementations, to avoid pool names that may not work on non-Linux platforms. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #7216
* Fix zfs-kmod builds when using rpm >= 4.14LOLi2018-03-091-0/+1
| | | | | | | | | | | With rpm-software-management/rpm@5e94633 a package version containing invalid characters (most commonly a double '-') causes the kmod package generation to terminate with an error. This change takes advantage of the newly introduced rpm macro "_wrong_version_format_terminate_build" to allow kmod packages to be built. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7284
* Change functions which return literals to return `const char*`Tomohiro Kusumi2018-03-094-5/+5
| | | | | | | | | | | | | | get_format_prompt_string() and zpool_state_to_name() return a string literal which is read-only, thus they should return `const char*`. zpool_get_prop_string() returns a non-const string after successful nv-lookup, and returns a string literal otherwise. Since this function is designed to be used for read-only purpose, the return type should also be `const char*`. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #7285
* QAT support for AES-GCMTom Caputi2018-03-0910-232/+758
| | | | | | | | | | This patch adds support for acceleration of AES-GCM encryption with Intel Quick Assist Technology. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chengfeix Zhu <[email protected]> Signed-off-by: Weigang Li <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7282
* zdb and inuse tests don't pass with real disksPaul Zuchowski2018-03-078-15/+59
| | | | | | | | | | Due to zpool create auto-partioning in Linux (i.e. sdb1), certain utilities need to use the parition (sdb1) while others use the whole disk name (sdb). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #6939 Closes #7261
* Take user namespaces into account in policy checksWolfgang Bumiller2018-03-0717-7/+521
| | | | | | | | | | | | | | | | | | | | | | | | | Change file related checks to use user namespaces and make sure involved uids/gids are mappable in the current namespace. Note that checks without file ownership information will still not take user namespaces into account, as some of these should be handled via 'zfs allow' (otherwise root in a user namespace could issue commands such as `zpool export`). This also adds an initial user namespace regression test for the setgid bit loss, with a user_ns_exec helper usable in further tests. Additionally, configure checks for the required user namespace related features are added for: * ns_capable * kuid/kgid_has_mapping() * user_ns in cred_t Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Wolfgang Bumiller <[email protected]> Closes #6800 Closes #7270
* ZTS: fix send-c_stream_size_estimateBrian Behlendorf2018-03-071-0/+1
| | | | | | | | | | | | The test could fail when attempting to write to a newly created volume which was missing its device node. Resolve the issue by calling block_device_wait() which blocks until udev creates the needed entry. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7276 Closes #7277
* Fix dbufstats_001_posGiuseppe Di Natale2018-03-072-4/+28
| | | | | | | | | | | | | | Implement a new helper within_tolerance to test if a value is within range of a target. Because the dbufstats and dbufs kstat file are being read at slightly different times, it is possible for stats to be slightly off. Use within_tolerance to determine if the value is "close enough" to the target. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7239 Closes #7266
* Allow to limit zed's syslog chattinessTony Hutter2018-03-069-7/+178
| | | | | | | | | | | | | | | | | | Some usage patterns like send/recv of replication streams can produce a large number of events. In such a case, the current all-syslog.sh zedlet will hold up to its name, and flood the logs with mostly redundant information. Two mitigate this situation, this changeset introduces to new variables ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE to zed.rc that give more control over which event classes end up in the syslog. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Signed-off-by: Daniel Kobras <[email protected]> Closes #6886 Closes #7260
* Record skipped MMP writes in multihost_historyOlaf Faaland2018-03-0611-50/+263
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7212
* Detect long config lock acquisition in mmpOlaf Faaland2018-03-061-0/+6
| | | | | | | | | | | | | | If something holds the config lock as a writer for too long, MMP will fail to issue MMP writes in a timely manner. This will result either in the pool being suspended, or in an extreme case, in the pool not being protected. If the time to acquire the config lock exceeds 1/10 of the minimum zfs_multihost_interval, report it in the zfs debug log. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7212
* Introduce a destroy_dataset helperGiuseppe Di Natale2018-03-064-24/+47
| | | | | | | | | | | | | Datasets can be busy when calling zfs destroy. Introduce a helper function to destroy datasets and use it to destroy datasets in zfs_allow_004_pos, zfs_promote_008_pos, and zfs_destroy_002_pos. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7224 Closes #7246 Closes #7249 Closes #7267
* Misc fixes and cleanup for project quotaNasf-Fan2018-03-054-6/+9
| | | | | | | | | | | | | | | | | | | | | | | 1) The Coverity Scan reports some issues for the project quota patch, including: 1.1) zfs_prop_get_userquota() directly uses the const quota type value as the condition check by wrong. 1.2) dmu_objset_userquota_get_ids() may cause dnode::dn_newgid to be overwritten by dnode::dn->dn_oldprojid. 2) This patch fixes related issues. It also enhances the logic for zfs_project_item_alloc() to avoid buffer overflow. 3) Skip project quota ability check if does not change project quota related things (id or flag). Otherwise, it will cause chattr (for other non project quota flags) operation failed if project quota disabled. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Fan Yong <[email protected]> Closes #7251 Closes #7265
* Linux 4.16 compat: get_disk_and_module()Giuseppe Di Natale2018-03-054-1/+29
| | | | | | | | | | As of https://github.com/torvalds/linux/commit/fb6d47a, get_disk() is now get_disk_and_module(). Add a configure check to determine if we need to use get_disk_and_module(). Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7264
* Change checksum & IO delay ratelimit valuesTony Hutter2018-03-046-15/+56
| | | | | | | | | | | | | | | | Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec. This allows zed to actually trigger if a bunch of these events arrive in a short period of time (zed has a threshold of 10 events in 10 sec). Previously, if you had, say, 100 checksum errors in 1 sec, it would get ratelimited to 5/sec which wouldn't trigger zed to fault the drive. Also, convert the checksum and IO delay thresholds to module params for easy testing. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7252
* Increment zil_itx_needcopy_bytes properlychrisrd2018-03-021-2/+1
| | | | | | | | | | | | | | | | | In zil_lwb_commit() with TX_WRITE, we copy the log write record (lrw) into the log write block (lwb) and send it off using zil_lwb_add_txg(). If we also have WR_NEED_COPY, we additionally copy the lwr's data into the lwb to be sent off. If the lwr + data doesn't fit into the lwb, we send the lrw and as much data as will fit (dnow bytes), then go back and do the same with the remaining data. Each time through this loop we're sending dnow data bytes. I.e. zil_itx_needcopy_bytes should be incremented by dnow. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #6988 Closes #7176
* ZTS: fix spurious failures in mv_fileschrisrd2018-03-021-14/+3
| | | | | | | | | | | | | | | The test could fail because of a race condition between the files being generated in the background and attempting to move the files. Wait for all file generation to complete before trying to move the files around. Also, clean up the waiting: the 'wait' command without arguments waits for all child pids. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7220 Closes #7242 Closes #7258
* Add ZFS perf test for dbuf cacheJohn Wren Kennedy2018-02-287-5/+118
| | | | | | | | | | | | | This change adds a test for sequential reads out of the dbuf cache. It's essentially a copy of sequential_reads_cached, using a smaller data set. The sequential read tests are renamed to differentiate them. Authored by: Dan Kimmel <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: John Wren Kennedy <[email protected]> Closes #7225
* Fix some typosJohn Eismeier2018-02-2816-30/+30
| | | | | | | Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: John Eismeier <[email protected]> Closes #7237
* Fix zpool(8) list example to match actual formatTomohiro Kusumi2018-02-281-10/+10
| | | | | | | | | | | | | a05dfd00 (Illumos 5147) has swapped FRAG and EXPANDSZ, so it's natural to modify these examples. # zpool list | head -1 NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT ^^^^^^^^^^^^^^^ Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #7244
* Add Python 3 rewrite of arc_summary.pyScot W. Stevenson2018-02-286-2/+918
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new script arc_summary3.py as a complete rewrite of the arc_summary.py tool (see issue #6873) Add new options: -g/--graph - Display crude graphic representation of ARC status and quit -r/--raw - Print all available information as minimally formatted list (for grep) -s/--section - Print a single section. This replaces -p/--page, which is kept for backwards use but marked as depreciated Add new sections with information on ZIL and SPL. Notify user if sections L2ARC and VDEV are skipped instead of failing silently. Add warning that -p/--page option is depreciated. Developed for Python 3.5. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Scot W. Stevenson <[email protected]> Closes #6873 Closes #6892
* Add SMART self-test results to zpool status -cTony Hutter2018-02-277-18/+133
| | | | | | | | | | | | | Add in SMART self-test results to zpool status|iostat -c. This works for both SAS and SATA drives. Also, add plumbing to allow the 'smart' script to take smartctl output from a directory of output text files instead of running it against the vdevs. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7178
* Raw DRR_OBJECT records must write raw dataTom Caputi2018-02-274-55/+72
| | | | | | | | | | | | | | | | | | | | | | b1d21733 made it possible for empty metadnode blocks to be compressed to a hole, fixing a bug that would cause invalid metadnode MACs when a send stream attempted to free objects and allowing the blocks to be reclaimed when they were no longer needed. However, this patch also introduced a race condition; if a txg sync occurred after a DRR_OBJECT_RANGE record was received but before any objects were added, the metadnode block would be compressed to a hole and lose all of its encryption parameters. This would cause subsequent DRR_OBJECT records to fail when they attempted to write their data into an unencrypted block. This patch defers the DRR_OBJECT_RANGE handling to receive_object() so that the encryption parameters are set with each object that is written into that block. Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7215 Closes #7236
* Incorrect maximum DVA value in DDE_GET_NDVAS()Tim Chase2018-02-261-1/+1
| | | | | | | | | | The conditional was reversed which caused garbage values to be used when calculating dds_ref_dsize. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #7234
* Fix segfault in zfs_do_bookmark()LOLi2018-02-262-2/+31
| | | | | | | | | | | | | | When invoked with wrong parameters 'zfs bookmark' fails to gracefully validate user input and crashes. This is a regression accidentally introduced in 587e228; this commit adds additional tests to the ZFS Test Suite to exercise this codepath. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: KireinaHoro <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7228 Closes #7229
* ZTS: Fix zfs_share_* test case failuresBrian Behlendorf2018-02-241-1/+7
| | | | | | | | | | | | Prevent false positives when running the zfs_share_* test cases due to leftover stale /var/lib/nfs/etab entries. When starting the test group re-synchronize the /var/lib/nfs/etab file with /etc/exports. At this point in the testing there will be no additional `zfs share` entries to add. Reviewed by: George Melikov <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7226
* Shellcheck cleanup for initrd scriptsKash Pande2018-02-236-101/+98
| | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Kash Pande <[email protected]> Co-authored-by: Matthew Thode <[email protected]> Signed-off-by: Kash Pande <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7214
* Enable booting from nested encrypted datasetsKash Pande2018-02-233-33/+82
| | | | | | | | | | | | - enable booting from nested encrypted datasets - fix plymouth boot splash passphrase entry - optimize unlock process Co-authored-by: Kash Pande <[email protected]> Co-authored-by: Matthew Thode <[email protected]> Signed-off-by: Kash Pande <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7214
* Add scrub after resilver zed scriptTony Hutter2018-02-2316-21/+187
| | | | | | | | | | | | | | | | | | | | * Add a zed script to kick off a scrub after a resilver. The script is disabled by default. * Add a optional $PATH (-P) option to zed to allow it to use a custom $PATH for its zedlets. This is needed when you're running zed under the ZTS in a local workspace. * Update test scripts to not copy in all-debug.sh and all-syslog.sh by default. They can be optionally copied in as part of zed_setup(). These scripts slow down zed considerably under heavy events loads and can cause events to be dropped or their delivery delayed. This was causing some sporadic failures in the 'fault' tests. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #4662 Closes #7086
* Fix free memory calculation on v3.14+chrisrd2018-02-238-41/+255
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e409 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae0 Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e6 Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7170
* Report duration and error in mmp_history entriesOlaf Faaland2018-02-225-13/+60
| | | | | | | | | | | | | | | | | | | | | | After an MMP write completes, update the relevant mmp_history entry with the time between submission and completion, and the error status of the write. [faaland1@toss3a zfs]$ cat /proc/spl/kstat/zfs/pool/multihost 39 0 0x01 100 8800 69147946270893 72723903122926 id txg timestamp error duration mmp_delay vdev_guid 10607 1166 1518985089 0 138301 637785455 4882... 10608 1166 1518985089 0 136154 635407747 1151... 10609 1166 1518985089 0 803618560 633048078 9740... 10610 1166 1518985090 0 144826 633048078 4882... 10611 1166 1518985090 0 164527 666187671 1151... Where duration = gethrtime_in_done_fn - gethrtime_at_submission, and error = zio->io_error. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7190
* Do not initiate MMP writes while pool is suspendedOlaf Faaland2018-02-221-1/+1
| | | | | | | | | | | While the pool is suspended on host A, it may be imported on host B. If host A continued to write MMP blocks, it would be blindly overwriting MMP blocks written by host B, and the blocks written by host A would have outdated txg information. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7182
* Linux 4.16 compat: use correct *_dec_and_test()Tony Hutter2018-02-223-1/+26
| | | | | | | | | | Use refcount_dec_and_test() on 4.16+ kernels, atomic_dec_and_test() on older kernels. https://lwn.net/Articles/714974/ Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes: #7179 Closes: #7211
* Fix bounds check in zio_crypt_do_objset_hmacsTom Caputi2018-02-221-4/+8
| | | | | | | | | | | | The current bounds check in zio_crypt_do_objset_hmacs() does not properly handle the possible sizes of the objset_phys_t and can therefore read outside the buffer's memory. If that memory happened to match what the check was actually looking for, the objset would fail to be owned, complaining that the MAC was invalid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7210
* OpenZFS 9035 - zfs: this statement may fall throughToomas Soome2018-02-214-2/+4
| | | | | | | | | | | | | Authored by: Toomas Soome <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Reviewed by: Andy Fiddaman <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9035 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/46ac8fdfc5 Closes #7206
* Allow modprobe to fail when called within systemdMatthew Thode2018-02-212-2/+2
| | | | | | | | | | This allows for systems with zfs built into the kernel manually to run these services. Otherwise the service will fail to start. Reviewed-by: loli10K <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7174
* Add SMART attributes for SSD and NVMebunder20152018-02-213-2/+24
| | | | | | | | | | | | This adds the SMART attributes required to probe Samsung SSD and NVMe (and possibly others) disks when using the "zpool status -c" command. Reviewed-by: loli10K <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #7183 Closes #7193
* Allow make checkstyle and paxscript in build dirchrisrd2018-02-211-6/+7
| | | | | | | Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7202
* Want 'zfs send -b'LOLi2018-02-219-120/+257
| | | | | | | | | | | | | | | This change implements 'zfs send -b' which can be used to send only received property values whether or not they are overridden by local settings. This can be very useful during "restore" operations from a backup pool because it allows to send only the property values originally sent from the backup source, even though they were later modified on the destination either by a 'zfs set' operation, explicit 'zfs inherit' or overridden during the receive process via 'zfs receive -o|-x'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7156
* Raw receive should change key atomicallyTom Caputi2018-02-215-225/+312
| | | | | | | | | | | | | | | | | Currently, raw zfs sends transfer the encrypted master keys and objset_phys_t encryption parameters in the DRR_BEGIN payload of each send file. Both of these are processed as soon as they are read in dmu_recv_stream(), meaning that the new keys are set before the new snapshot is received. In addition to the fact that this changes the user's keys for the dataset earlier than they might expect, the keys were never reset to what they originally were in the event that the receive failed. This patch splits the processing into objset handling and key handling, the later of which is moved to dmu_recv_end() so that they key change can be done atomically. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7200