Commit message | Author | Age | Files | Lines
...
* Extend zloop.sh for automated testing | Brian Behlendorf | 2018-01-25 | 1 | -8/+32
    In order to debug issues encountered by ztest during automated
    testing it's important that as much debugging information as
    possible be dumped at the time of the failure. The following
    changes extend the zloop.sh script in order to make it easier to
    integrate with buildbot.

    * Add the `-m <maximum cores>` option to zloop.sh to place a limit
      on the number of core dumps generated. By default, the existing
      behavior is maintained and no limit is set.

    * Add the `-l` option to create a 'ztest.core.N' symlink in the
      current directory to the core directory. This functionality is
      provided primarily for buildbot which expects log files to have
      well known names.

    * Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
      additional basic information on failure for later analysis.

    Reviewed-by: Tim Chase <[email protected]>
    Reviewed by: Thomas Caputi <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #6999
* Prevent zdb(8) from occasionally hanging on I/O | Brian Behlendorf | 2018-01-25 | 1 | -0/+10
    The zdb(8) command may not terminate in the case where the pool
    gets suspended and there is a caller in zio_wait() blocking on an
    outstanding read I/O that will never complete. This can in turn
    cause ztest(1) to block indefinitely despite the deadman.

    Resolve the issue by setting the default failure mode for zdb(8)
    to panic. In user space we always want the command to terminate
    when forward progress is no longer possible.

    Reviewed-by: Tim Chase <[email protected]>
    Reviewed by: Thomas Caputi <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #6999
* Extend deadman logic | Brian Behlendorf | 2018-01-25 | 21 | -57/+582
    The intent of this patch is to extend the existing deadman code
    such that it's flexible enough to be used by both ztest and on
    production systems. The proposed changes include:

    * Added a new `zfs_deadman_failmode` module option which is used
      to dynamically control the behavior of the deadman. It's loosely
      modeled after, but independent from, the pool failmode property.
      It can be set to wait, continue, or panic.

        * wait     - Wait for the "hung" I/O (default)
        * continue - Attempt to recover from a "hung" I/O
        * panic    - Panic the system

    * Added a new `zfs_deadman_ziotime_ms` module option which is
      analogous to `zfs_deadman_synctime_ms` except instead of
      applying to a pool TXG sync it applies to zio_wait(). A default
      value of 300s is used to define a "hung" zio.

    * The ztest deadman thread has been re-enabled by default, aligned
      with the upstream OpenZFS code, and then extended to terminate
      the process when it takes significantly longer to complete than
      expected.

    * The -G option was added to ztest to print the internal debug log
      when a fatal error is encountered. This same option was
      previously added to zdb in commit fa603f82. Update zloop.sh to
      unconditionally pass -G to obtain additional debugging.

    * The FM_EREPORT_ZFS_DELAY event which was previously posted when
      the deadman detected a "hung" pool has been replaced by a new
      dedicated FM_EREPORT_ZFS_DEADMAN event.

    * The proposed recovery logic attempts to restart a "hung" zio by
      calling zio_interrupt() on any outstanding leaf zios. We may
      want to further restrict this to zios in either the
      ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
      Calling zio_interrupt() is expected to only be useful for cases
      when an IO has been submitted to the physical device but for
      some reason the completion callback hasn't been called by the
      lower layers. This shouldn't be possible but has been observed
      and may be caused by kernel/driver bugs.

    * The 'zfs_deadman_synctime_ms' default value was reduced from
      1000s to 600s.

    * Depending on how ztest fails there may be no cache file to move.
      This should not be considered fatal, collect the logs which are
      available and carry on.

    * Add deadman test cases for spa_deadman() and zio_wait().

    * Increase default zfs_deadman_checktime_ms to 60s.

    Reviewed-by: Tim Chase <[email protected]>
    Reviewed by: Thomas Caputi <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #6999
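The three failmode values described above can be pictured as a small dispatch on elapsed time. The following is a minimal sketch only, not the kernel implementation: the helper name `deadman_action`, the returned action strings, and the choice to treat an I/O as "hung" exactly at the limit are assumptions made for illustration. Only the option values (wait, continue, panic) and the 300s and 600s defaults come from the commit message.

```python
from enum import Enum

# Defaults quoted in the commit message, converted to milliseconds.
ZFS_DEADMAN_ZIOTIME_MS = 300 * 1000   # zio_wait() limit
ZFS_DEADMAN_SYNCTIME_MS = 600 * 1000  # pool TXG sync limit


class FailMode(Enum):
    WAIT = "wait"
    CONTINUE = "continue"
    PANIC = "panic"


def deadman_action(elapsed_ms, limit_ms, failmode):
    """Hypothetical helper: pick what a deadman check would do.

    Anything that has been outstanding for at least limit_ms is
    considered "hung" (the boundary choice is an assumption).
    """
    if elapsed_ms < limit_ms:
        return "ok"
    if failmode is FailMode.WAIT:
        return "log-and-wait"   # keep waiting for the "hung" I/O
    if failmode is FailMode.CONTINUE:
        return "restart-zio"    # attempt recovery
    return "panic"
```

For example, `deadman_action(400_000, ZFS_DEADMAN_ZIOTIME_MS, FailMode.CONTINUE)` selects the recovery path, while the same elapsed time measured against the 600s sync limit is still considered healthy.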
* OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large blocks | Andriy Gapon | 2018-01-25 | 1 | -6/+6
    Authored by: Andriy Gapon <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed-by: Chunwei Chen <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Ported-by: Giuseppe Di Natale <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8731
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4c08500788
    Closes #7079
* Revert "Remove wrong ASSERT in annotate_ecksum" | Giuseppe Di Natale | 2018-01-25 | 1 | -2/+2
    This reverts commit 093911f1945b5c164a45bb077103283dafdcae0c.

    Reviewed-by: Chunwei Chen <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Signed-off-by: Giuseppe Di Natale <[email protected]>
    Closes #7079
* OpenZFS 8972 - zfs holds: In scripted mode, do not pad columns with spaces | Allan Jude | 2018-01-19 | 1 | -4/+7
    Authored by: Allan Jude <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: George Melikov <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8972
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3aace5c077
    Closes #7063
* OpenZFS 8835 - Speculative prefetch in ZFS not working for misaligned reads | Alexander Motin | 2018-01-19 | 1 | -5/+19
    In case of misaligned I/O, sequential requests are not detected as
    such due to overlaps in the logical block sequence:

        dmu_zfetch(fffff80198dd0ae0, 27347, 9, 1)
        dmu_zfetch(fffff80198dd0ae0, 27355, 9, 1)
        dmu_zfetch(fffff80198dd0ae0, 27363, 9, 1)
        dmu_zfetch(fffff80198dd0ae0, 27371, 9, 1)
        dmu_zfetch(fffff80198dd0ae0, 27379, 9, 1)
        dmu_zfetch(fffff80198dd0ae0, 27387, 9, 1)

    This patch makes a single-block overlap count as a stream hit,
    improving performance by up to several times.

    Authored by: Alexander Motin <[email protected]>
    Approved by: Gordon Ross <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Allan Jude <[email protected]>
    Reviewed by: Gvozden Neskovic <[email protected]>
    Reviewed by: George Melikov <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8835
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aab6dd482a
    Closes #7062
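The effect of counting a single-block overlap as a stream hit can be checked against the trace quoted above. This is a simplified sketch, not the dmu_zfetch() code: it assumes the second and third arguments of each traced call are the starting block and the block count, tracks a single stream, and uses the hypothetical helper names `is_stream_hit` and `count_hits`.

```python
# (start block, block count) pairs taken from the trace in the commit message.
TRACE = [(27347, 9), (27355, 9), (27363, 9),
         (27371, 9), (27379, 9), (27387, 9)]


def is_stream_hit(next_blkid, blkid, allow_overlap):
    """next_blkid is the block the stream expects next (one past the last read)."""
    if blkid == next_blkid:
        return True
    # With the relaxed rule, a request that starts one block early
    # (overlapping the previous request by a single block) also counts.
    return allow_overlap and blkid + 1 == next_blkid


def count_hits(requests, allow_overlap):
    hits = 0
    next_blkid = None
    for blkid, nblks in requests:
        if next_blkid is not None and is_stream_hit(next_blkid, blkid, allow_overlap):
            hits += 1
        next_blkid = blkid + nblks
    return hits
```

Each request in the trace starts 8 blocks after the previous one but covers 9 blocks, so every request begins one block before the expected position. Under strict matching none of the five follow-up requests is recognized as sequential; with the overlap rule all five are.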
* OpenZFS 8652 - Tautological comparisons with ZPROP_INVAL | Brian Behlendorf | 2018-01-19 | 4 | -16/+14
    usr/src/uts/common/sys/fs/zfs.h
        Change ZPROP_INVAL and ZPROP_CONT from macros to enum values.
        Clang and GCC both prefer to use unsigned ints to store enums.
        That was causing tautological comparison warnings (and likely
        eliminating error handling code at compile time) whenever a
        zfs_prop_t or zpool_prop_t was compared to ZPROP_INVAL or
        ZPROP_CONT. Making the error flags be explicitly enum values
        forces the enum types to be signed.

        ZPROP_INVAL was also compared against two different enum
        types. I had to change its name to ZPOOL_PROP_INVAL whenever
        it's compared to a zpool_prop_t. There are still some places
        where ZPROP_INVAL or ZPROP_CONT is compared to a plain int, in
        code that doesn't know whether the int is storing a zfs_prop_t
        or a zpool_prop_t.

    usr/src/uts/common/fs/zfs/spa.c
        s/ZPROP_INVAL/ZPOOL_PROP_INVAL/

    Authored by: Alan Somers <[email protected]>
    Approved by: Gordon Ross <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Igor Kozhukhov <[email protected]>
    Reviewed by: George Melikov <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8652
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c2de80dc74
    Closes #7061
* OpenZFS 8641 - "zpool clear" and "zinject" don't work on "spare" or "replacing" vdevs | Brian Behlendorf | 2018-01-19 | 1 | -5/+6
    Add "spare" and "replacing" to the list of interior vdev types in
    zpool_vdev_is_interior(), alongside the existing "mirror" and
    "raidz". This fixes running "zinject -d" and "zpool clear" on
    spare and replacing vdevs.

    Authored by: Alan Somers <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: George Melikov <[email protected]>
    Approved by: Gordon Ross <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8641
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9a36801382
    Closes #7060
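The change above amounts to widening a list of type names. A rough sketch of the idea follows; it is not the libzfs function itself. Matching on a name prefix, and the Python names `INTERIOR_VDEV_TYPES` and `vdev_is_interior`, are assumptions for illustration. Only the four type names come from the commit message.

```python
# Interior (non-leaf) vdev type names. "spare" and "replacing" are the
# two entries this commit adds next to "mirror" and "raidz".
INTERIOR_VDEV_TYPES = ("mirror", "raidz", "spare", "replacing")


def vdev_is_interior(name):
    """Hypothetical check: does a vdev name such as 'spare-0' denote an interior vdev?"""
    # str.startswith() accepts a tuple and returns True if any prefix matches.
    return name.startswith(INTERIOR_VDEV_TYPES)
```

With the two new entries, names like `spare-0` and `replacing-0` are classified the same way as `mirror-0` or `raidz1-0`, while a device path such as `/var/tmp/safe-dev` is still treated as a leaf.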
* Run zfs load-key if needed in dracut | Matthew Thode | 2018-01-18 | 8 | -11/+92
    'zfs load-key -a' will only be called if needed. If a dataset not
    needed for boot does not have its key loaded (home directories for
    example) boot can still continue.

    zfs:AUTO was not working via dracut, so we still need the
    generator script to do its thing.

    Reviewed-by: Richard Yao <[email protected]>
    Reviewed-by: Manuel Amador (Rudd-O) <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: loli10K <[email protected]>
    Signed-off-by: Matthew Thode <[email protected]>
    Closes #6982
    Closes #7004
* Fix Debian packaging on ARMv7/ARM64 | LOLi | 2018-01-18 | 1 | -3/+6
    When building packages on Debian-based systems specify the target
    architecture used by 'alien' to convert .rpm packages into .deb:
    this avoids detecting an incorrect value which results in the
    following errors:

        <package>.aarch64.rpm is for architecture aarch64 ; the package cannot be built on this system
        <package>.armv7l.rpm is for architecture armel ; the package cannot be built on this system

    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Signed-off-by: loli10K <[email protected]>
    Closes #7046
    Closes #7058
* Emit an error message before MMP suspends pool | John L. Hammond | 2018-01-17 | 1 | -0/+5
    In mmp_thread(), emit an MMP-specific error message before calling
    zio_suspend() so that the administrator will understand why the
    pool is being suspended.

    Reviewed-by: Olaf Faaland <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: John L. Hammond <[email protected]>
    Closes #7048
* OpenZFS 8959 - Add notifications when a scrub is paused or resumed | Sean Eric Fagan | 2018-01-17 | 3 | -2/+36
    Authored by: Sean Eric Fagan <[email protected]>
    Reviewed by: Alek Pinchuk <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Approved by: Gordon Ross <[email protected]>
    Ported-by: Giuseppe Di Natale <[email protected]>

    Porting Notes:
    - Brought #defines in eventdefs.h in line with ZFS on Linux format.
    - Updated zfs-events.5 with the new events.

    OpenZFS-issue: https://www.illumos.org/issues/8959
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c862b93eea
    Closes #7049
* Fix shellcheck v0.4.6 warnings | Brian Behlendorf | 2018-01-17 | 6 | -17/+20
    Resolve new warnings reported after upgrading to shellcheck
    version 0.4.6. This patch contains no functional changes.

    * egrep is non-standard and deprecated. Use grep -E instead.
      [SC2196]
    * Check exit code directly with e.g. 'if mycmd;', not indirectly
      with $?. [SC2181] Suppressed.

    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #7040
* Remove l2arc_nocompress from zfs-module-parameters(5) | DeHackEd | 2018-01-16 | 1 | -11/+0
    Parameter was removed in d3c2ae1c0806 (OpenZFS 6950 - ARC should
    cache compressed data)

    Reviewed by: Brian Behlendorf <[email protected]>
    Signed-off-by: DHE <[email protected]>
    Closes #7043
* Fix copy-builtin to work with ASAN patch | Matthew Thode | 2018-01-12 | 1 | -2/+4
    Commit fed90353 didn't fully update the copy-builtin script as
    needed to perform in-kernel builds. Add the missing options and
    flags.

    Reviewed by: Brian Behlendorf <[email protected]>
    Signed-off-by: Matthew Thode <[email protected]>
    Closes #7033
    Closes #7037
* Force ztest to always use /dev/urandom | Brian Behlendorf | 2018-01-12 | 5 | -16/+12
    For ztest, which is solely for testing, using a pseudo-random
    source is entirely reasonable. Using /dev/urandom ensures the
    system entropy pool doesn't get depleted, which would stall the
    testing. This is a particular problem when testing in VMs.

    Reviewed-by: Tim Chase <[email protected]>
    Reviewed by: Thomas Caputi <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #7017
    Closes #7036
* OpenZFS 8899 - zpool list property documentation doesn't match actual behaviour | Yuri Pankov | 2018-01-11 | 1 | -9/+5
    Authored by: Yuri Pankov <[email protected]>
    Reviewed by: Alexander Pyhalov <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8899
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b0e142e57d
    Closes #7032
* OpenZFS 8898 - creating fs with checksum=skein on the boot pools fails ungracefully | Yuri Pankov | 2018-01-11 | 2 | -2/+10
    Authored by: Yuri Pankov <[email protected]>
    Reviewed by: Toomas Soome <[email protected]>
    Reviewed by: Andy Stormont <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8898
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9fa2266d9a
    Closes #7031
* OpenZFS 8897 - zpool online -e fails assertion when run on non-leaf vdevs | Yuri Pankov | 2018-01-11 | 1 | -2/+4
    Authored by: Yuri Pankov <[email protected]>
    Reviewed by: Toomas Soome <[email protected]>
    Reviewed by: Igor Kozhukhov <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8897
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9a551dd645
    Closes #7030
* OpenZFS 8930 - zfs_zinactive: do not remove the node if the filesystem is readonly | Andriy Gapon | 2018-01-11 | 1 | -7/+26
    Authored by: Andriy Gapon <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Approved by: Gordon Ross <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/8930
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/93c618e0f4
    Closes #7029
* Support -fsanitize=address with --enable-asan | Brian Behlendorf | 2018-01-10 | 40 | -222/+337
    When --enable-asan is provided to configure then build all user
    space components with -fsanitize=address. For kernel support use
    the Linux KASAN feature instead.

    https://github.com/google/sanitizers/wiki/AddressSanitizer

    When using gcc version 4.8 any test case which intentionally
    generates a core dump will fail when using --enable-asan. The
    default behavior is to disable core dumps and only newer versions
    allow this behavior to be controlled at run time with the
    ASAN_OPTIONS environment variable.

    Additionally, this patch includes some build system cleanup.

    * Rules.am updated to set the minimum AM_CFLAGS, AM_CPPFLAGS, and
      AM_LDFLAGS. Any additional flags should be added on a
      per-Makefile basis. The --enable-debug and --enable-asan options
      apply to all user space binaries and libraries.

    * Compiler checks consolidated in always-compiler-options.m4 and
      renamed for consistency.

    * -fstack-check compiler flag was removed, this functionality is
      provided by asan when configured with --enable-asan.

    * Split DEBUG_CFLAGS into DEBUG_CFLAGS, DEBUG_CPPFLAGS, and
      DEBUG_LDFLAGS.

    * Moved default kernel build flags into module/Makefile.in and
      split into ZFS_MODULE_CFLAGS and ZFS_MODULE_CPPFLAGS. These
      flags are set with the standard ccflags-y kbuild mechanism.

    * -Wframe-larger-than checks applied only to binaries or libraries
      which include source files which are built in both user space
      and kernel space. This restriction is relaxed for user space
      only utilities.

    * -Wno-unused-but-set-variable applied only to libzfs and
      libzpool. The remaining warnings are the result of an ASSERT
      using a variable which is always declared.

    * -D_POSIX_PTHREAD_SEMANTICS and -D__EXTENSIONS__ dropped because
      they are Solaris specific and thus not needed.

    * Ensure $GDB is defined as gdb by default in zloop.sh.

    Signed-off-by: DHE <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #7027
* Disable history_004_pos | Brian Behlendorf | 2018-01-10 | 1 | -0/+5
    Occasionally observed failure of history_004_pos due to the test
    case not being 100% reliable. In order to prevent false positives
    disable this test case until it can be made reliable.

    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #7026
    Closes #7028
* Fix incompatibility with Reiser4 patched kernels | Richard Yao | 2018-01-09 | 1 | -3/+11
    In ZFSOnLinux, our sources and build system are self contained
    such that we do not need to make changes to the Linux kernel
    sources. Reiser4 on the other hand exists solely as a kernel tree
    patch and opts to make changes to the kernel rather than adapt to
    it. After Linux 4.1 made a VFS change that replaced new_sync_read
    with do_sync_read, Reiser4's maintainer decided to modify the
    kernel VFS to export the old function. This caused our autotools
    check to misidentify the kernel API as predating Linux 4.1 on
    kernels that have been patched with Reiser4 support, which breaks
    our build.

    Reiser4 really should be patched to stop doing this, but let's
    modify our check to be more strict to help the affected users of
    both filesystems.

    Also, we were not checking the types of arguments and return value
    of new_sync_read() and new_sync_write(). Let's fix that too.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Signed-off-by: Richard Yao <[email protected]>
    Closes #6241
    Closes #7021
* Use zap_count instead of cached z_size for unlink | Alex Zhuravlev | 2018-01-09 | 1 | -1/+15
    As a performance optimization Lustre does not strictly update the
    SA_ZPL_SIZE when adding/removing from non-directory entries. This
    results in entries which cannot be removed through the ZPL layer
    even though the ZAP is empty and safe to remove.

    Resolve this issue by checking the zap_count() directly instead of
    relying on the cached SA_ZPL_SIZE. Micro-benchmarks show no
    significant performance impact due to the additional overhead of
    using zap_count().

    Reviewed-by: Olaf Faaland <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Alex Zhuravlev <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #7019
* Revert raidz_map and _col structure types | Nathaniel Wesley Filardo | 2018-01-09 | 1 | -17/+17
    As part of the refactoring of
    ab9f4b0b824ab4cc64a4fa382c037f4154de12d6, several uint64_t-s and
    uint8_t-s were changed to other types. This caused ZoL github
    issue #6981, an overflow of a size_t on a 32-bit ARM machine. In
    the absence of any strong motivation for the type changes, this
    simply puts them back, modulo the changes accumulated for ABD.

    Compile-tested on amd64 and run-tested on armhf.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: Gvozden Neskovic <[email protected]>
    Signed-off-by: Nathaniel Wesley Filardo <[email protected]>
    Closes #6981
    Closes #7023
* Fix unused variable warnings | Brian Behlendorf | 2018-01-09 | 2 | -7/+6
    Resolved unused variable warnings observed after restricting
    -Wno-unused-but-set-variable to only libzfs and libzpool.

    Reviewed-by: DHE <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #6941
* Fix ztest_verify_dnode_bt() test case | Brian Behlendorf | 2018-01-09 | 1 | -1/+5
    In ztest_verify_dnode_bt the ztest_object_lock must be held in
    order to safely verify the unused bonus space.

    Reviewed-by: DHE <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #6941
* Fix -fsanitize=address memory leak | DHE | 2018-01-09 | 1 | -2/+4
    kmem_alloc(0, ...) in userspace returns a leakable pointer.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: DHE <[email protected]>
    Issue #6941
* Fix percentage styling in zfs-module-parameters.5 | George Amanakis | 2018-01-09 | 1 | -17/+17
    Replace "percent" with "%", add bold to default values.

    Reviewed-by: bunder2015 <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: George Amanakis <[email protected]>
    Closes #7018
* Reduce codecov PR comments | Brian Behlendorf | 2018-01-09 | 1 | -1/+2
    Attempt to reduce the number of comments posted by codecov to pull
    requests. Based on the codecov documentation, setting
    "require_changes=yes" and "behavior=once" should result in a
    single comment under most circumstances.

    https://docs.codecov.io/v4.3.6/docs/pull-request-comments

    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #7022
    Closes #7025
* zhack: fix getopt return type | Nathaniel Wesley Filardo | 2018-01-09 | 1 | -3/+3
    This fixes zhack's command processing on ARM. On ARM char is
    unsigned, and so, in promotion to an int, it will never compare
    equal to -1.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Nathaniel Wesley Filardo <[email protected]>
    Closes #7016
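The promotion problem described above can be reproduced without a C compiler by using `ctypes` fixed-width integers as stand-ins for the two flavors of C `char`. This is an illustration only: the helper `store_in_char` is hypothetical, and `c_byte` / `c_ubyte` are used here to model signed and unsigned `char` respectively.

```python
import ctypes

GETOPT_DONE = -1  # getopt(3) returns -1 once all options have been parsed


def store_in_char(value, char_is_signed):
    """Model assigning an int to a C char, then promoting it back to int."""
    ctype = ctypes.c_byte if char_is_signed else ctypes.c_ubyte
    # ctypes integer types wrap silently on out-of-range values,
    # mirroring the narrowing conversion in C.
    return ctype(value).value
```

Where `char` is signed, the stored -1 survives the round trip and the loop's `!= -1` test eventually ends the loop. Where `char` is unsigned, the stored value reads back as 255, which never equals -1, so the terminating condition is never met. Declaring the variable as `int`, which is what getopt(3) returns, avoids the narrowing entirely.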
* Fix ARC hit rate | Brian Behlendorf | 2018-01-08 | 3 | -2/+55
    When the compressed ARC feature was added in commit d3c2ae1 the
    method of reference counting in the ARC was modified. As part of
    this accounting change the arc_buf_add_ref() function was removed
    entirely.

    This would have been fine but the arc_buf_add_ref() function
    served a second undocumented purpose of updating the ARC access
    information when taking a hold on a dbuf. Without this logic in
    place a cached dbuf would not migrate its associated arc_buf_hdr_t
    to the MFU list. This would negatively impact the ARC hit rate,
    particularly on systems with a small ARC.

    This change reinstates the missing call to arc_access() from
    dbuf_hold() by implementing a new arc_buf_access() function.

    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Tim Chase <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #6171
    Closes #6852
    Closes #6989
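Why a missing access update keeps buffers off the MFU list can be seen in a toy two-list cache. This is a heavily simplified model and not the ARC: there are no ghost lists, sizes, or eviction, and the names `TinyArc` and `hold` are hypothetical. It only shows that a hold which skips the access bookkeeping leaves a repeatedly used entry on the MRU list.

```python
class TinyArc:
    """Toy model: first access puts a key on MRU, a repeat access moves it to MFU."""

    def __init__(self):
        self.mru = []
        self.mfu = []

    def access(self, key):
        if key in self.mfu:
            self.mfu.remove(key)
            self.mfu.append(key)      # refresh position on the MFU list
        elif key in self.mru:
            self.mru.remove(key)
            self.mfu.append(key)      # second access: promote to MFU
        else:
            self.mru.append(key)      # first access: start on MRU


def hold(arc, key, update_access):
    """Stand-in for taking a hold on an already cached buffer."""
    if update_access:
        arc.access(key)
    return key
```

In this model, calling `hold(..., update_access=False)` any number of times leaves the key on the MRU list, which mirrors the regression described above. With `update_access=True` the first repeat hold promotes it.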
* Fix 'zpool add' handling of nested interior VDEVs | LOLi | 2017-12-28 | 4 | -4/+118
    When replacing a faulted device which was previously handled by a
    spare multiple levels of nested interior VDEVs will be present in
    the pool configuration; the following example illustrates one of
    the possible situations:

       NAME                          STATE     READ WRITE CKSUM
       testpool                      DEGRADED     0     0     0
         raidz1-0                    DEGRADED     0     0     0
           spare-0                   DEGRADED     0     0     0
             replacing-0             DEGRADED     0     0     0
               /var/tmp/fault-dev    UNAVAIL      0     0     0  cannot open
               /var/tmp/replace-dev  ONLINE       0     0     0
             /var/tmp/spare-dev1     ONLINE       0     0     0
           /var/tmp/safe-dev         ONLINE       0     0     0
       spares
         /var/tmp/spare-dev1         INUSE     currently in use

    This is safe and allowed, but get_replication() needs to handle
    this situation gracefully to let zpool add new devices to the
    pool.

    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: loli10K <[email protected]>
    Closes #6678
    Closes #6996
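Handling arbitrarily nested interior vdevs generally means descending until leaf devices are reached instead of assuming a fixed depth. The sketch below is not the get_replication() logic; it only walks a hypothetical dictionary representation of the example configuration above, and the names `EXAMPLE_TOP_LEVEL` and `leaf_vdevs` are made up for illustration.

```python
# Hypothetical in-memory form of the raidz1-0 top-level vdev shown above.
EXAMPLE_TOP_LEVEL = {
    "name": "raidz1-0",
    "children": [
        {"name": "spare-0", "children": [
            {"name": "replacing-0", "children": [
                {"name": "/var/tmp/fault-dev"},
                {"name": "/var/tmp/replace-dev"},
            ]},
            {"name": "/var/tmp/spare-dev1"},
        ]},
        {"name": "/var/tmp/safe-dev"},
    ],
}


def leaf_vdevs(vdev):
    """Return leaf device names, descending through any depth of interior vdevs."""
    children = vdev.get("children", [])
    if not children:
        return [vdev["name"]]
    leaves = []
    for child in children:
        leaves.extend(leaf_vdevs(child))
    return leaves
```

A walker that only looked one level below `raidz1-0` would stop at `spare-0`; the recursive form reaches the devices inside `replacing-0` as well.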
* OpenZFS 8909 - 8585 can cause a use-after-free kernel panic | Prakash Surya | 2017-12-28 | 9 | -110/+254
    Authored by: Prakash Surya <[email protected]>
    Reviewed by: John Kennedy <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Brad Lewis <[email protected]>
    Reviewed by: Igor Kozhukhov <[email protected]>
    Reviewed by: Brian Behlendorf <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>
    Ported-by: Prakash Surya <[email protected]>

    PROBLEM
    =======

    There's a race condition that exists if `zil_free_lwb` races with
    either `zil_commit_waiter_timeout` and/or
    `zil_lwb_flush_vdevs_done`.

    Here's an example panic due to this bug:

        > ::status
        debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40
        operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc)
        image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513
        panic message:
        BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in module "zfs" due to a NULL pointer dereference
        dump content: kernel pages only

        > $c
        zio_shrink+0x12()
        zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20)
        zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8)
        zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8)
        zil_commit+0x80(ffffff03dcd15cc0, 9a9)
        zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
        fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
        write+0x250(42, fffffd7ff4832000, 2000)
        sys_syscall+0x177()

    If there's an outstanding lwb that's in
    `zil_commit_waiter_timeout` waiting to timeout, waiting on its
    waiter's CV, we must be sure not to call `zil_free_lwb`. If we end
    up calling `zil_free_lwb`, then that LWB may be freed and can
    result in a use-after-free situation where the stale lwb pointer
    stored in the `zil_commit_waiter_t` structure of the thread
    waiting on the waiter's CV is used.

    A similar situation can occur if an lwb is issued to disk, and
    thus in the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called
    while the disk is servicing that lwb. In this situation, the lwb
    will be freed by `zil_free_lwb`, which will result in a
    use-after-free situation when the lwb's zio completes, and
    `zil_lwb_flush_vdevs_done` is called.

    This race condition is prevented in `zil_close` by calling
    `zil_commit` before `zil_free_lwb` is called, which will ensure
    all outstanding lwb's (i.e. all lwb's in the `LWB_STATE_OPEN`
    and/or `LWB_STATE_ISSUED` states) reach the `LWB_STATE_DONE` state
    before the lwb's are freed (`zil_commit` will not return until all
    the lwb's are `LWB_STATE_DONE`).

    Further, this race condition is prevented in `zil_sync` by only
    calling `zil_free_lwb` for lwb's that do not have their `lwb_buf`
    pointer set. All lwb's not in the `LWB_STATE_DONE` state will have
    a non-null value for this pointer; the pointer is only cleared in
    `zil_lwb_flush_vdevs_done`, at which point the lwb's state will be
    changed to `LWB_STATE_DONE`.

    This race *is* present in `zil_suspend`, leading to this bug.

    At first glance, it would appear as though this would not be true
    because `zil_suspend` will call `zil_commit`, just like
    `zil_close`, but the problem is that `zil_suspend` will set the
    zilog's `zl_suspend` field prior to calling `zil_commit`. Further,
    in `zil_commit`, if `zl_suspend` is set, `zil_commit` will take a
    special branch of logic and use `txg_wait_synced` instead of
    performing the normal `zil_commit` logic.

    This call to `txg_wait_synced` might be good enough for the data
    to reach disk safely before it returns, but it does not ensure
    that all outstanding lwb's reach the `LWB_STATE_DONE` state before
    it returns. This is because, if there's an lwb "stuck" in
    `zil_commit_waiter_timeout`, waiting for its lwb to timeout, it
    will maintain a non-null value for its `lwb_buf` field and thus
    `zil_sync` will not free that lwb. Thus, even though the lwb's
    data is already on disk, the lwb will be left lingering, waiting
    on the CV, and will eventually timeout and be issued to disk even
    though the write is unnecessary.

    So, after `zil_commit` is called from `zil_suspend`, we
    incorrectly assume that there are no outstanding lwb's, and
    proceed to free all lwb's found on the zilog's lwb list. As a
    result, we free the lwb that will later be used by
    `zil_commit_waiter_timeout`.

    SOLUTION
    ========

    The solution to this is to ensure all outstanding lwb's complete
    before calling `zil_free_lwb` via `zil_destroy` in `zil_suspend`.
    This patch accomplishes this goal by forcing the normal
    `zil_commit` logic when called from `zil_suspend`.

    Now, `zil_suspend` will call `zil_commit_impl` which will always
    use the normal logic of waiting/issuing lwb's to disk before it
    returns. As a result, any lwb's outstanding when `zil_commit_impl`
    is called will be guaranteed to reach the `LWB_STATE_DONE` state
    by the time it returns.

    Further, no new lwb's will be created via `zil_commit` since the
    zilog's `zl_suspend` flag will be set. This will force all new
    callers of `zil_commit` to use `txg_wait_synced` instead of
    creating and issuing new lwb's.

    Thus, all lwb's left on the zilog's lwb list when `zil_destroy` is
    called will be in the `LWB_STATE_DONE` state, and we'll avoid this
    race condition.

    OpenZFS-issue: https://www.illumos.org/issues/8909
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ece62b6f8d
    Closes #6940
* Call commit callbacks from the tail of the list | lidongyang | 2017-12-22 | 3 | -5/+6
    Our zfs backed Lustre MDT had soft lockups while under heavy
    metadata workloads while handling transaction callbacks from
    osd_zfs.

    The problem is zfs is not taking advantage of the fast path in
    Lustre's trans callback handling, where Lustre will skip the calls
    to ptlrpc_commit_replies() when it already saw a higher
    transaction number.

    This patch corrects this. It also has a positive impact on
    metadata performance on Lustre with osd_zfs, plus some cleanup in
    the headers.

    A similar issue for ext4/ldiskfs is described at:
    https://jira.hpdd.intel.com/browse/LU-6527

    Reviewed-by: Olaf Faaland <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Li Dongyang <[email protected]>
    Closes #6986
* Remove empty files accidentally added by a8b2e306 | Tom Caputi | 2017-12-22 | 2 | -0/+0
    This patch simply removes 2 empty files that were accidentally
    added as part of the scrub priority patch.

    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Signed-off-by: Tom Caputi <[email protected]>
    Closes #6990
* Support re-prioritizing asynchronous prefetches | Tom Caputi | 2017-12-21 | 11 | -43/+124
    When sequential scrubs were merged, all calls to arc_read()
    (including prefetch IOs) were given ZIO_PRIORITY_ASYNC_READ.
    Unfortunately, this behaves badly with an existing issue where
    prefetch IOs cannot be re-prioritized after they are issued. The
    result is that synchronous reads end up in the same vdev_queue as
    the scrub IOs and can have (in some workloads) multiple seconds of
    latency.

    This patch incorporates 2 changes. The first ensures that all
    scrub IOs are given ZIO_PRIORITY_SCRUB to allow the vdev_queue
    code to differentiate between these I/Os and user prefetches.
    Second, this patch introduces zio_change_priority() to provide the
    missing capability to upgrade a zio's priority.

    Reviewed by: George Wilson <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tom Caputi <[email protected]>
    Closes #6921
    Closes #6926
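The value of being able to change a queued I/O's priority can be shown with a toy queue. This sketch is not the vdev_queue code: the numeric priority values, the strict lowest-number-first ordering, and the names `ToyQueue`, `add`, `change_priority`, and `peek` are all assumptions for illustration. Only the three priority class names echo the commit message.

```python
# Hypothetical numbering: a lower number is served first.
PRIORITY_SYNC_READ = 0
PRIORITY_ASYNC_READ = 2
PRIORITY_SCRUB = 4


class ToyQueue:
    def __init__(self):
        self._pending = {}  # io id -> (priority, arrival order)
        self._arrivals = 0

    def add(self, io_id, priority):
        self._pending[io_id] = (priority, self._arrivals)
        self._arrivals += 1

    def change_priority(self, io_id, priority):
        """Re-prioritize an I/O that is already queued, keeping its arrival order."""
        _, arrival = self._pending[io_id]
        self._pending[io_id] = (priority, arrival)

    def peek(self):
        """Return the id of the I/O that would be served next."""
        return min(self._pending, key=self._pending.get)
```

If a prefetch is queued behind other asynchronous work and a synchronous reader then needs the same data, upgrading the existing entry lets it jump ahead rather than waiting its turn at the original priority.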
* vdev_id: new slot type sesSimon Guest2017-12-202-1/+24
| | | | | | | | | | | | | | | This extends vdev_id to support a new slot type, ses, for SCSI Enclosure Services. With slot type ses, the disk slot numbers are determined by using the device slot number reported by sg_ses for the device with matching SAS address, found by querying all available enclosures. This is primarily of use on systems with a deficient driver omitting support for bay_identifier in /sys/devices. In my testing, I found that the existing slot types of port and id were not stable across disk replacement, so an alternative was required. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Simon Guest <[email protected]> Closes #6956
* Handle broken pipes in arc_summaryGiuseppe Di Natale2017-12-192-0/+16
| | | | | | | | | | | Using a command similar to 'arc_summary.py | head' causes a broken pipe exception. Gracefully exit in the case of a broken pipe in arc_summary.py. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6965 Closes #6969
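A minimal sketch of handling a broken pipe gracefully in Python follows. It is not the code from arc_summary.py. The function `emit_lines` and the helper class `ClosedPipe` are hypothetical names used only for illustration, and `ClosedPipe` merely simulates a reader such as `head` that has already exited.

```python
import errno
import io


def emit_lines(lines, out):
    """Write lines to out; return False if the reader closed the pipe early."""
    try:
        for line in lines:
            out.write(line + "\n")
        out.flush()
    except BrokenPipeError:
        # The consumer went away; stop writing instead of raising a traceback.
        return False
    return True


class ClosedPipe(io.StringIO):
    """Hypothetical stand-in for a pipe whose reading end is already closed."""

    def write(self, text):
        raise BrokenPipeError(errno.EPIPE, "Broken pipe")
```

A script built this way would typically call `emit_lines(report, sys.stdout)` and exit quietly when it returns False.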
* Handle invalid options in arc_summaryLOLi2017-12-194-6/+51
| | | | | | | | | | If an invalid option is provided to arc_summary.py we handle any error thrown from the getopt Python module and print the usage help message. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6983
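The approach described here can be sketched as follows: catch the error raised by the getopt module and return the usage text instead of a traceback. This is an illustration only. The option letters, the `USAGE` string, and the function `parse_args` are assumptions and are not taken from arc_summary.py.

```python
import getopt

# Hypothetical usage text; the real script's options may differ.
USAGE = "usage: tool.py [-h] [-a] [-p PAGE]"


def parse_args(argv):
    """Return (options, None) on success or (None, message) on a bad option."""
    try:
        opts, _args = getopt.getopt(argv, "hap:", ["help", "alternate", "page="])
    except getopt.GetoptError as err:
        # Unknown option or missing argument: report it along with the usage help.
        return None, "%s\n%s" % (err, USAGE)
    return dict(opts), None
```

A caller would print the returned message to stderr and exit with a non-zero status when the first element is None.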
* OpenZFS 8794 - cstyle generates warnings with recent perlDominik Hassler2017-12-191-18/+18
| | | | | | | | | | | | | | Authored by: Dominik Hassler <[email protected]> Reviewed by: Andy Fiddaman <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Toomas Soome <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8794 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/578f67364c Closes #6973
* ZTS: Fix create-o_ashift test caseLOLi2017-12-192-34/+26
| | | | | | | | | | | | The function that fills the uberblock ring buffer on every device label has been reworked to avoid occasional failures caused by a race condition that prevents 'zpool sync' from writing some uberblocks sequentially: this happens when the pool sync ioctl dispatch code calls txg_wait_synced() while we're already waiting for a TXG to sync. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6924 Closes #6977
* Fix multihost stale cache file importBrian Behlendorf2017-12-184-9/+26
| | | | | | | | | | | | | | | | | | | | | When the multihost property is enabled it should be impossible to import an active pool even using the force (-f) option. This patch prevents a forced import from succeeding when importing with a stale cache file. The root cause of the problem is that the kernel modules trusted the hostid provided in configuration. This is always correct when the configuration is generated by scanning for the pool. However, when using an existing cache file the hostid could be stale which would result in the activity check being skipped. Resolve the issue by always using the hostid read from the label configuration where the best uberblock was found. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6933 Closes #6971
* Honor --with-mounthelperdir where applicableLOLi2017-12-175-6/+8
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6962
* Fix --with-systemd on Debian-based distributions (#6963)LOLi2017-12-173-6/+21
| | | | | | | | | | These changes propagate the "--with-systemd" configure option to the RPM spec file, allowing Debian-based distributions to package systemd-related files. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6591 Closes #6963
* Remove lib/libspl/include/sys/frame.hBrian Behlendorf2017-12-172-132/+0
| | | | | | | | | | | The functionality provided by this header is not required by any of the ZFS user space code. Minimal functionality was provided in commit c28a677 which added include/sys/frame.h. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6960 Closes #6972
* Enable QAT support in zfs-dkms RPMDavid Qian2017-12-141-0/+6
| | | | | | | | | Enable QAT accelerated gzip compression in the zfs-dkms RPM package when the environment variable ICP_ROOT is set to the QAT driver source code folder and QAT hardware is present. Otherwise, use default gzip compression. Reviewed-by: George Melikov <[email protected]> Reviewed-by: David Qian <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6932
* Add zfs-import.target services in spec fileLalufu2017-12-131-1/+1
| | | | | | | | | | | Add missing zfs-import.target to list of systemd services in zfs RPM spec file. Reviewed-by: Niklas Wagner <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ralf Ertzinger <[email protected]> Issue #6953 Closes #6955
* Various ZED fixesLOLi2017-12-0819-101/+368
| | | | | | | | | | | | | | | | | | | | | | | | | * Teach ZED to handle spares using the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropriate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait for the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #2562 Closes #6858