summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* OpenZFS 8677 - Open-Context Channel ProgramsSerapheim Dimitropoulos2018-02-0826-133/+422
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Chris Williamson <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> We want to be able to run channel programs outside of synching context. This would greatly improve performance for channel programs that just gather information, as they won't have to wait for synching context anymore. === What is implemented? This feature introduces the following: - A new command line flag in "zfs program" to specify our intention to run in open context. (The -n option) - A new flag/option within the channel program ioctl which selects the context. - Appropriate error handling whenever we try a channel program in open-context that contains zfs.sync* expressions. - Documentation for the new feature in the manual pages. === How do we handle zfs.sync functions in open context? When such a function is found by the interpreter and we are running in open context we abort the script and we spit out a descriptive runtime error. For example, given the script below ... arg = ... fs = arg["argv"][1] err = zfs.sync.destroy(fs) msg = "destroying " .. fs .. " err=" .. err return msg if we run it in open context, we will get back the following error: Channel program execution failed: [string "channel program"]:3: running functions from the zfs.sync submodule requires passing sync=TRUE to lzc_channel_program() (i.e. do not specify the "-n" command line argument) stack traceback: [C]: in function 'destroy' [string "channel program"]:3: in main chunk === What about testing? We've introduced new wrappers for all channel program tests that run each channel program as both (startard & open-context) and expect the appropriate behavior depending on the program using the zfs.sync module. OpenZFS-issue: https://www.illumos.org/issues/8677 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/17a49e15 Closes #6558
* OpenZFS 8604 - Simplify snapshots unmounting codeSerapheim Dimitropoulos2018-02-083-30/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Andy Stormont <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> Every time we want to unmount a snapshot (happens during snapshot deletion or renaming) we unnecessarily iterate through all the mountpoints in the VFS layer (see zfs_get_vfs). The current patch completely gets rid of that code and changes the approach while keeping the behavior of that code path the same. Specifically, it puts a hold on the dataset/snapshot and gets its vfs resource reference directly, instead of linearly searching for it. If that reference exists we attempt to amount it. With the above change, it became obvious that the nvlist manipulations that we do (add_boolean and add_nvlist) take a significant amount of time ensuring uniqueness of every new element even though they don't have too. Thus, we updated the patch so those nvlists are not trying to enforce the uniqueness of their elements. A more complete analysis of the problem solved by this patch can be found below: https://sdimitro.github.io/post/snap-unmount-perf/ OpenZFS-issue: https://www.illumos.org/issues/8604 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/126118fb
* Increase code coverage for Lua librariesDon Brady2018-02-0817-696/+1372
| | | | | | | Add test coverage for lua libraries Remove dead code in Lua implementation Signed-off-by: Don Brady <[email protected]>
* Add basic functional tests for zcp user propertiesDon Brady2018-02-083-4/+103
| | | | Signed-off-by: Don Brady <[email protected]>
* OpenZFS 8600 - ZFS channel programs - snapshotChris Williamson2018-02-0822-33/+500
| | | | | | | | | | | | | | | | | Authored by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Brad Lewis <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> ZFS channel programs should be able to create snapshots. In addition to the base snapshot functionality, this entails extra logic to handle edge cases which were formerly not possible, such as creating then destroying a snapshot in the same transaction sync. OpenZFS-issue: https://www.illumos.org/issues/8600 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68089b8b
* OpenZFS 8592 - ZFS channel programs - rollbackBrad Lewis2018-02-088-16/+177
| | | | | | | | | | | | | Authored by: Brad Lewis <[email protected]> Reviewed by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> ZFS channel programs should be able to perform a rollback. OpenZFS-issue: https://www.illumos.org/issues/8592 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d46b5ed6
* OpenZFS 8605 - zfs channel programs fix zfs.existsChris Williamson2018-02-086-3/+88
| | | | | | | | | | | | | | | | | Authored by: Chris Williamson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> zfs.exists() in channel programs doesn't return any result, and should have a man page entry. This patch corrects zfs.exists so that it returns a value indicating if the dataset exists or not. It also adds documentation about it in the man page. OpenZFS-issue: https://www.illumos.org/issues/8605 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1e85e111
* OpenZFS 7431 - ZFS Channel ProgramsChris Williamson2018-02-08179-272/+27055
| | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported-by: Don Brady <[email protected]> Ported-by: John Kennedy <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7431 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dfc11533 Porting Notes: * The CLI long option arguments for '-t' and '-m' don't parse on linux * Switched from kmem_alloc to vmem_alloc in zcp_lua_alloc * Lua implementation is built as its own module (zlua.ko) * Lua headers consumed directly by zfs code moved to 'include/sys/lua/' * There is no native setjmp/longjump available in stock Linux kernel. Brought over implementations from illumos and FreeBSD * The get_temporary_prop() was adapted due to VFS platform differences * Use of inline functions in lua parser to reduce stack usage per C call * Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow
* OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a ↵WHR2018-02-081-1/+1
| | | | | | | | | | | | | | | | use after end of the lifetime of a local variable Authored by: WHR <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8966 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95549fcdc Closes #7141
* Only run pre-mount hook zfs-load-key on systemdMatthew Thode2018-02-071-0/+3
| | | | | | | | Reviewed-by: Kash Pande <[email protected]> Reviewed-by: bunder2015 <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7136 Closes #7140
* Verify ZFS Test Suite scripts executabilityLOLi2018-02-077-1/+9
| | | | | | | | | | | | | | | | This change adds a make target 'testscheck' which verifies all ksh test scripts, part of the ZFS Test Suite, have their executable bit set: this should avoid adding new test scripts that cannot be executed by the test-runner.py which would result in the following warning message: Warning: Test removed from TestGroup because it failed verification. This also verifies both *.cfg and *.kshlib files used in the ZFS Test Suite are not executable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7131
* Remove deprecated zfs_arc_p_aggressive_disableRichard Elling2018-02-072-15/+0
| | | | | | | | | | zfs_arc_p_aggressive_disable is no more. This PR removes docs and module parameters for zfs_arc_p_aggressive_disable. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Richard Elling <[email protected]> Closes #7135
* Add zfs-load-key.sh to .gitignoreBrian Behlendorf2018-02-062-53/+2
| | | | | | | | | | | The generated zfs-load-key.sh file should have been added to the .gitignore file as part of commit 7da8f8d8. And the generated file should not be included in the repo. Reviewed-by: Matthew Thode <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7134
* Fix `zpool status` overflow on fast scrubsDeHackEd2018-02-061-1/+1
| | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7133
* Fix default libdir for Debian/UbuntuBrian Behlendorf2018-02-051-0/+13
| | | | | | | | | | | | | The distribution provided architecture specific RPM macro files for x86_64 and other architectures on Debian/Ubuntu specify the wrong default libdir install location. When building deb packages override _lib with the correct location. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7083 Closes #7101
* Adjust ARC prefetch tunables to match docsTom Caputi2018-02-052-6/+6
| | | | | | | | | | | | | | Currently, the ARC exposes 2 tunables (zfs_arc_min_prefetch_ms and zfs_arc_min_prescient_prefetch_ms) which are documented to be specified in milliseconds. However, the code actually uses the values as though they were in seconds. This patch adjusts the code to match the names and documentation of the tunables. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7126
* Set persistent ztest failure modeBrian Behlendorf2018-02-051-17/+19
| | | | | | | | | | | | | In order to reliably detect deadlocks in the create and import path ztest should set the failure mode property. This ensures that the pool is always using the correct failmode behavior. Removed insidious use of local variable in MAXFAULTS macro. Converted VERIFY() to VERIFY0() where appropriate. Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7111
* Bug fix in qat_compress.c for vmalloc addr checkwli52018-02-051-4/+0
| | | | | | | | | Remove the unused vmalloc address check, and function mem_to_page will handle the non-vmalloc address when map it to a physical address. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Weigang Li <[email protected]> Closes #7125
* Fix hash_lock / keystore.sk_dk_lock lock inversionBrian Behlendorf2018-02-041-48/+39
| | | | | | | | | | | | | The keystore.sk_dk_lock should not be held while performing I/O. Drop the lock when reading from disk and update the code so they the first successful caller adds the key. Improve error handling in spa_keystore_create_mapping_impl(). Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: RageLtMan <rageltman@sempervictus> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7112 Closes #7115
* Fix systemd_ RPM macros usage on Debian-based distributionsLOLi2018-02-021-1/+20
| | | | | | | | | | | | | | | | | | | Debian-based distributions do not seem to provide RPM macros for dealing with systemd pre- and post- (un)install actions: this results in errors when installing or upgrading .deb packages because the resulting control scripts contain the following unresolved macros: * %systemd_post * %systemd_preun * %systemd_postun Fix this by providing default values for postinstall, preuninstall and postuninstall scripts when these macros are not defined. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7074 Closes #7100
* Change os->os_next_write_raw to work per txgTom Caputi2018-02-026-7/+8
| | | | | | | | | | | | | | | | | | | Currently, os_next_write_raw is a single boolean used for determining whether or not the next call to dmu_objset_sync() should write out the objset_phys_t as a raw buffer. Since the boolean is not associated with a txg, the work simply happens during the next txg, which is not necessarily the correct one. In the current implementation this issue was misdiagnosed, resulting in a small hack in dmu_objset_sync() which seemed to resolve the problem. This patch changes os_next_write_raw to be an array of booleans, one for each txg in TXG_OFF and removes the hack. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6864
* Raw sends must be able to decrease nlevelsTom Caputi2018-02-029-16/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when a raw zfs send file includes a DRR_OBJECT record that would decrease the number of levels of an existing object, the object is reallocated with dmu_object_reclaim() which creates the new dnode using the old object's nlevels. For non-raw sends this doesn't really matter, but raw sends require that nlevels on the receive side match that of the send side so that the checksum-of-MAC tree can be properly maintained. This patch corrects the issue by freeing the object completely before allocating it again in this case. This patch also corrects several issues with dnode_hold_impl() and related functions that prevented dnodes (particularly multi-slot dnodes) from being reallocated properly due to the fact that existing dnodes were not being fully cleaned up when they were freed. This patch adds a test to make sure that zfs recv functions properly with incremental streams containing dnodes of different sizes. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6821 Closes #6864
* Fix recovery import (-F) with encrypted poolTom Caputi2018-02-021-0/+2
| | | | | | | | | | | | | | | | | | | | When performing zil_claim() at pool import time, it is important that encrypted datasets set os_next_write_raw before writing to the zil_header_t. This prevents the code from attempting to re-authenticate the objset_phys_t when it writes it out, which is unnecessary because the zil_header_t is not protected by either objset MAC and impossible since the keys aren't loaded yet. Unfortunately, one of the code paths did not set this flag, which causes failed ASSERTs during 'zpool import -F'. This patch corrects this issue. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6864 Closes #6916
* Encryption Stability and On-Disk Format FixesTom Caputi2018-02-0231-187/+787
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The on-disk format for encrypted datasets protects not only the encrypted and authenticated blocks themselves, but also the order and interpretation of these blocks. In order to make this work while maintaining the ability to do raw sends, the indirect bps maintain a secure checksum of all the MACs in the block below it along with a few other fields that determine how the data is interpreted. Unfortunately, the current on-disk format erroneously includes some fields which are not portable and thus cannot support raw sends. It is not possible to easily work around this issue due to a separate and much smaller bug which causes indirect blocks for encrypted dnodes to not be compressed, which conflicts with the previous bug. In addition, the current code generates incompatible on-disk formats on big endian and little endian systems due to an issue with how block pointers are authenticated. Finally, raw send streams do not currently include dn_maxblkid when sending both the metadnode and normal dnodes which are needed in order to ensure that we are correctly maintaining the portable objset MAC. This patch zero's out the offending fields when computing the bp MAC and ensures that these MACs are always calculated in little endian order (regardless of the host system's byte order). This patch also registers an errata for the old on-disk format, which we detect by adding a "version" field to newly created DSL Crypto Keys. We allow datasets without a version (version 0) to only be mounted for read so that they can easily be migrated. We also now include dn_maxblkid in raw send streams to ensure the MAC can be maintained correctly. This patch also contains minor bug fixes and cleanups. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6845 Closes #6864 Closes #7052
* tx_waited -> tx_dirty_delayed in trace_dmu.hDr. András Korn2018-01-311-5/+6
| | | | | | | | | | This change was missed in 0735ecb33485e91a78357a274e47c2782858d8b9. Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: András Korn <[email protected]> Closes #7096
* Change movaps to movups in AES-NI codeTom Caputi2018-01-313-48/+49
| | | | | | | | | | | | | | | | | | | | Currently, the ICP contains accelerated assembly code to be used specifically on CPUs with AES-NI enabled. This code makes heavy use of the movaps instruction which assumes that it will be provided aes keys that are 16 byte aligned. This assumption seems to hold on Illumos, but on Linux some kernel options such as 'slub_debug=P' will violate it. This patch changes all instances of this instruction to movups which is the same except that it can handle unaligned memory. This patch also adds a few flags which were accidentally never given to the assembly compiler, resulting in objtool warnings. Reviewed by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Nathaniel R. Lewis <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7065 Closes #7108
* Fix txg_sync_thread hang in scan_exec_io()Brian Behlendorf2018-01-311-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | When scn->scn_maxinflight_bytes has not been initialized it's possible to hang on the condition variable in scan_exec_io(). This issue was uncovered by ztest and is only possible when deduplication is enabled through the following call path. txg_sync_thread() spa_sync() ddt_sync_table() ddt_sync_entry() dsl_scan_ddt_entry() dsl_scan_scrub_cb() dsl_scan_enqueuei() scan_exec_io() cv_wait() Resolve the issue by always initializing scn_maxinflight_bytes to a reasonable minimum value. This value will be recalculated in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit and the addition/removal of vdevs. Reviewed-by: Tom Caputi <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7098
* remove pools without a bootfs from BOOTFS variableMatthew Thode2018-01-302-2/+2
| | | | | | | | Use the same method used in zfs-load-key. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Llewelyn Trahaearn <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7089
* Fix 'zfs receive -o' when used with '-e|-d'LOLi2018-01-302-2/+39
| | | | | | | | | | | | | When used in conjunction with one of '-e' or '-d' zfs receive options none of the properties requested to be set (-o) are actually applied: this is caused by a wrong assumption made about the toplevel dataset in zfs_receive_one(). Fix this by correctly detecting the toplevel dataset. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7088
* Fix zpool-features(5) large_block inconsistencybunder20152018-01-291-2/+2
| | | | | | | | | | | | | Large_blocks feature activation was not consistent with man page, which erroneously stated that the feature was active when the recordsize was increased past the stock 128KB. It actually becomes active when data is written to the dataset. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #6275 Closes #7093
* Fix style issues in man pages and commands helpLOLi2018-01-296-18/+17
| | | | | | | | | | | | | | * Remove 'zfs snap' from zfs help message (OpenZFS sync) * Update zfs(8) to suggest 'snap' can be used as an alias for 'snapshot' * Enforce 80 columns limit in help messages * Remove zfs_disable_dup_eviction from zfs-module-parameters(5) * Expose zfs_scan_max_ext_gap as a kernel module parameter. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7087
* Add dbuf hash and dbuf cache kstatsGiuseppe Di Natale2018-01-2912-43/+507
| | | | | | | | | | | | | | | | | | | | | | | Introduce kstats about the dbuf hash and dbuf cache to make it easier to inspect state. This should help with debugging and understanding of these portions of the codebase. Correct format of dbuf kstat file. Introduce a dbc column to dbufs kstat to indicate if a dbuf is in the dbuf cache. Introduce field filtering in the dbufstat python script. Introduce a no header option to the dbufstat python script. Introduce a test case to test basic mru->mfu list movement in the ARC. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6906
* OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issuePrakash Surya2018-01-265-62/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM ======= When `dmu_tx_assign` is called from `zil_lwb_write_issue`, it's possible for either `ERESTART` or `EIO` to be returned. If `ERESTART` is returned, this will cause an assertion to fail directly in `zil_lwb_write_issue`, where the code assumes the return value is `EIO` if `dmu_tx_assign` returns a non-zero value. This can occur if the SPA is suspended when `dmu_tx_assign` is called, and most often occurs when running `zloop`. If `EIO` is returned, this can cause assertions to fail elsewhere in the ZIL code. For example, `zil_commit_waiter_timeout` contains the following logic: lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb); ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED); In this case, if `dmu_tx_assign` returned `EIO` from within `zil_lwb_write_issue`, the `lwb` variable passed in will not be issued to disk. Thus, it's `lwb_state` field will remain `LWB_STATE_OPENED` and this assertion will fail. `zil_commit_waiter_timeout` assumes that after it calls `zil_lwb_write_issue`, the `lwb` will be issued to disk, and doesn't handle the case where this is not true; i.e. it doesn't handle the case where `dmu_tx_assign` returns `EIO`. SOLUTION ======== This change modifies the `dmu_tx_assign` function such that `txg_how` is a bitmask, rather than of the `txg_how_t` enum type. Now, the previous `TXG_WAITED` semantics can be used via `TXG_NOTHROTTLE`, along with specifying either `TXG_NOWAIT` or `TXG_WAIT` semantics. Previously, when `TXG_WAITED` was specified, `TXG_NOWAIT` semantics was automatically invoked. This was not ideal when using `TXG_WAITED` within `zil_lwb_write_issued`, leading the problem described above. Rather, we want to achieve the semantics of `TXG_WAIT`, while also preventing the `tx` from being penalized via the dirty delay throttling. With this change, `zil_lwb_write_issued` can acheive the semtantics that it requires by passing in the value `TXG_WAIT | TXG_NOTHROTTLE` to `dmu_tx_assign`. Further, consumers of `dmu_tx_assign` wishing to achieve the old `TXG_WAITED` semantics can pass in the value `TXG_NOWAIT | TXG_NOTHROTTLE`. Authored by: Prakash Surya <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Porting Notes: - Additionally updated `zfs_tmpfile` to use `TXG_NOTHROTTLE` OpenZFS-issue: https://www.illumos.org/issues/8997 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/19ea6cb0f9 Closes #7084
* zpool import -d to specify device pathChunwei Chen2018-01-266-48/+205
| | | | | | | | | | | | | | | | | When we know which devices have the pool we are looking for, sometime it's better if we can directly pass those device paths to zpool import instead of letting it to search through all unrelated stuff, which might take a lot of time if you have hundreds of disks. This patch allows option -d <dev_path> to zpool import. You can have multiple pairs of -d <dev_path>, and zpool import will only search through those devices. For example: zpool import -d /dev/sda -d /dev/sdb Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7077
* Update README.initramfs.markdownBrian Behlendorf2018-01-261-5/+5
| | | | | | | | | Fix several typos and grammar. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Arno van Wyk <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7080
* Extend zloop.sh for automated testingBrian Behlendorf2018-01-251-8/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | In order to debug issues encountered by ztest during automated testing it's important that as much debugging information as possible by dumped at the time of the failure. The following changes extend the zloop.sh script in order to make it easier to integrate with buildbot. * Add the `-m <maximum cores>` option to zloop.sh to place a limit of the number of core dumps generated. By default, the existing behavior is maintained and no limit is set. * Add the `-l` option to create a 'ztest.core.N' symlink in the current directory to the core directory. This functionality is provided primarily for buildbot which expects log files to have well known names. * Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump additional basic information on failure for latter analysis. Reviewed-by: Tim Chase <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6999
* Prevent zdb(8) from occasionally hanging on I/OBrian Behlendorf2018-01-251-0/+10
| | | | | | | | | | | | | | | | | The zdb(8) command may not terminate in the case where the pool gets suspended and there is a caller in zio_wait() blocking on an outstanding read I/O that will never complete. This can in turn cause ztest(1) to block indefinitely despite the deadman. Resolve the issue by setting the default failure mode for zdb(8) to panic. In user space we always want the command to terminate when forward progress is no longer possible. Reviewed-by: Tim Chase <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6999
* Extend deadman logicBrian Behlendorf2018-01-2521-57/+582
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent of this patch is extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include: * Added a new `zfs_deadman_failmode` module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independant from, the pool failmode property. It can be set to wait, continue, or panic. * wait - Wait for the "hung" I/O (default) * continue - Attempt to recover from a "hung" I/O * panic - Panic the system * Added a new `zfs_deadman_ziotime_ms` module option which is analogous to `zfs_deadman_synctime_ms` except instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio. * The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected. * The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit fa603f82. Update zloop.sh to unconditionally pass -G to obtain additional debugging. * The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detect a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event. * The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an IO has been submitted to the physical device but for some reasonable the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs. * The 'zfs_deadman_synctime_ms' default value was reduced from 1000s to 600s. * Depending on how ztest fails there may be no cache file to move. This should not be considered fatal, collect the logs which are available and carry on. * Add deadman test cases for spa_deadman() and zio_wait(). * Increase default zfs_deadman_checktime_ms to 60s. Reviewed-by: Tim Chase <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6999
* OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large blocksAndriy Gapon2018-01-251-6/+6
| | | | | | | | | | | | | | Authored by: Andriy Gapon <[email protected]> Approved by: Dan McDonald <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8731 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4c08500788 Closes #7079
* Revert "Remove wrong ASSERT in annotate_ecksum"Giuseppe Di Natale2018-01-251-2/+2
| | | | | | | | | | This reverts commit 093911f1945b5c164a45bb077103283dafdcae0c. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7079
* OpenZFS 8972 - zfs holds: In scripted mode, do not pad columns with spacesAllan Jude2018-01-191-4/+7
| | | | | | | | | | | | Authored by: Allan Jude <[email protected]> Approved by: Dan McDonald <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Melikov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8972 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3aace5c077 Closes #7063
* OpenZFS 8835 - Speculative prefetch in ZFS not working for misaligned readsAlexander Motin2018-01-191-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | In case of misaligned I/O sequential requests are not detected as such due to overlaps in logical block sequence: dmu_zfetch(fffff80198dd0ae0, 27347, 9, 1) dmu_zfetch(fffff80198dd0ae0, 27355, 9, 1) dmu_zfetch(fffff80198dd0ae0, 27363, 9, 1) dmu_zfetch(fffff80198dd0ae0, 27371, 9, 1) dmu_zfetch(fffff80198dd0ae0, 27379, 9, 1) dmu_zfetch(fffff80198dd0ae0, 27387, 9, 1) This patch makes single block overlap to be counted as a stream hit, improving performance up to several times. Authored by: Alexander Motin <[email protected]> Approved by: Gordon Ross <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Allan Jude <[email protected]> Reviewed by: Gvozden Neskovic <[email protected]> Reviewed by: George Melikov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8835 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aab6dd482a Closes #7062
* OpenZFS 8652 - Tautological comparisons with ZPROP_INVALBrian Behlendorf2018-01-194-16/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | usr/src/uts/common/sys/fs/zfs.h Change ZPROP_INVAL and ZPROP_CONT from macros to enum values. Clang and GCC both prefer to use unsigned ints to store enums. That was causing tautological comparison warnings (and likely eliminating error handling code at compile time) whenever a zfs_prop_t or zpool_prop_t was compared to ZPROP_INVAL or ZPROP_CONT. Making the error flags be explicity enum values forces the enum types to be signed. ZPROP_INVAL was also compared against two different enum types. I had to change its name to ZPOOL_PROP_INVAL whenever its compared to a zpool_prop_t. There are still some places where ZPROP_INVAL or ZPROP_CONT is compared to a plain int, in code that doesn't know whether the int is storing a zfs_prop_t or a zpool_prop_t. usr/src/uts/common/fs/zfs/spa.c s/ZPROP_INVAL/ZPOOL_PROP_INVAL/ Authored by: Alan Somers <[email protected]> Approved by: Gordon Ross <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: George Melikov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8652 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c2de80dc74 Closes #7061
* OpenZFS 8641 - "zpool clear" and "zinject" don't work on "spare" or ↵Brian Behlendorf2018-01-191-5/+6
| | | | | | | | | | | | | | | | | | "replacing" vdevs Add "spare" and "replacing" to the list of interior vdev types in zpool_vdev_is_interior(), alongside the existing "mirror" and "raidz". This fixes running "zinject -d" and "zpool clear" on spare and replacing vdevs. Authored by: Alan Somers <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Melikov <[email protected]> Approved by: Gordon Ross <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8641 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9a36801382 Closes #7060
* Run zfs load-key if needed in dracutMatthew Thode2018-01-188-11/+92
| | | | | | | | | | | | | | | | 'zfs load-key -a' will only be called if needed. If a dataset not needed for boot does not have its key loaded (home directories for example) boot can still continue. zfs:AUTO was not working via dracut, so we still need the generator script to do its thing. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Manuel Amador (Rudd-O) <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #6982 Closes #7004
* Fix Debian packaging on ARMv7/ARM64LOLi2018-01-181-3/+6
| | | | | | | | | | | | | | | When building packages on Debian-based systems specify the target architecture used by 'alien' to convert .rpm packages into .deb: this avoids detecting an incorrect value which results in the following errors: <package>.aarch64.rpm is for architecture aarch64 ; the package cannot be built on this system <package>.armv7l.rpm is for architecture armel ; the package cannot be built on this system Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7046 Closes #7058
* Emit an error message before MMP suspends poolJohn L. Hammond2018-01-171-0/+5
| | | | | | | | | | | In mmp_thread(), emit an MMP specific error message before calling zio_suspend() so that the administrator will understand why the pool is being suspended. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: John L. Hammond <[email protected]> Closes #7048
* OpenZFS 8959 - Add notifications when a scrub is paused or resumedSean Eric Fagan2018-01-173-2/+36
| | | | | | | | | | | | | | | | | | | Authored by: Sean Eric Fagan <[email protected]> Reviewed by: Alek Pinchuk <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Approved by: Gordon Ross <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> Porting Notes: - Brought #defines in eventdefs.h in line with ZFS on Linux format. - Updated zfs-events.5 with the new events. OpenZFS-issue: https://www.illumos.org/issues/8959 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c862b93eea Closes #7049
* Fix shellcheck v0.4.6 warningsBrian Behlendorf2018-01-176-17/+20
| | | | | | | | | | | | | Resolve new warnings reported after upgrading to shellcheck version 0.4.6. This patch contains no functional changes. * egrep is non-standard and deprecated. Use grep -E instead. [SC2196] * Check exit code directly with e.g. 'if mycmd;', not indirectly with $?. [SC2181] Suppressed. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7040
* Remove l2arc_nocompress from zfs-module-parameters(5)DeHackEd2018-01-161-11/+0
| | | | | | | | Parameter was removed in d3c2ae1c0806 (OpenZFS 6950 - ARC should cache compressed data) Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7043