openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Fix config issues: frame size and headers	chrisrd	2018-02-15	11	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. With various (debug and/or tracing?) kernel options enabled it's possible for 'struct inode' and 'struct super_block' to exceed the default frame size, leaving errors like this in config.log: build/conftest.c:116:1: error: the frame size of 1048 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Fix this by removing the frame size warning for config checks 2. Without the correct headers included, it's possible for declarations to be missed, leaving errors like this in the config.log: build/conftest.c:131:14: error: ‘struct nameidata’ declared inside parameter list [-Werror] Fix this by adding appropriate headers. Note: Both these issues can result in silent config failures because the compile failure is taken to mean "this option is not supported by this kernel" rather than "there's something wrong with the config test". This can lead to something merely annoying (compile failures) to something potentially serious (miscompiled or misused kernel primitives or functions). E.g. the fixes included here resulted in these additional defines in zfs_config.h with linux v4.14.19: Also, drive-by whitespace fixes in config/* files which don't mention "GNU" (those ones look to be imported from elsewhere so leave them alone). Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7169
*	Address objtool check failures in lua module	Don Brady	2018-02-15	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	The use of void __attribute__((noreturn)) in kernel builds was causing lots of warnings if CONFIG_STACK_VALIDATION is active. For now we just remove this attribute to achieve clean builds for the Lua module. There was no significant increase in the time to run the full channel_program ZTS tests. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #7173
*	Clarify zinject(8) explanation of -e	Olaf Faaland	2018-02-15	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	Error injection of EIO or ENXIO simply sets the zio's io_error value, rather than preventing the read or write from occurring. This is important information as it affects how the probes must be used. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7172
*	OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio	George Wilson	2018-02-14	2	-20/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PROBLEM ======= It's possible for a parent zio to complete even though it has children which have not completed. This can result in the following panic: > $C ffffff01809128c0 vpanic() ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80) ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80) ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0, ffffff3373370908) ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0) ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0) ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140) ffffff0180912b40 thread_start+8() > ::status debugging crash dump vmcore.2 (64-bit) from batfs0390 operating system: 5.11 joyent_20170911T171900Z (i86pc) image uuid: (not set) panic message: mutex_enter: bad mutex, lp=ffffff597dde7f80 owner=ffffff3c59b39480 thread=ffffff0180912c40 dump content: kernel pages only The problem is that dbuf_prefetch along with l2arc can create a zio tree which confuses the parent zio and allows it to complete with while children still exist. Here's the scenario: zio tree: pio \|--- lio The parent zio, pio, has entered the zio_done stage and begins to check its children to see there are still some that have not completed. In zio_done(), the children are checked in the following order: zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_GANG, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_DDT, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_LOGICAL, ZIO_WAIT_DONE) If pio, finds any child which has not completed then it stops executing and goes to sleep. Each call to zio_wait_for_children() will grab the io_lock while checking the particular child. In this scenario, the pio has completed the first call to zio_wait_for_children() to check for any ZIO_CHILD_VDEV children. Since the only zio in the zio tree right now is the logical zio, lio, then it completes that call and prepares to check the next child type. In the meantime, the lio completes and in its callback creates a child vdev zio, cio. The zio tree looks like this: zio tree: pio \|--- lio \|--- cio The lio then grabs the parent's io_lock and removes itself. zio tree: pio \|--- cio The pio continues to run but has already completed its check for ZIO_CHILD_VDEV and will erroneously complete. When the child zio, cio, completes it will panic the system trying to reference the parent zio which has been destroyed. SOLUTION ======== The fix is to rework the zio_wait_for_children() logic to accept a bitfield for all the children types that it's interested in checking. The io_lock will is held the entire time we check all the children types. Since the function now accepts a bitfield, a simple ZIO_CHILD_BIT() macro is provided to allow for the conversion between a ZIO_CHILD type and the bitfield used by the zio_wiat_for_children logic. Authored by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Youzhong Yang <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8857 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/862ff6d99c Issue #5918 Closes #7168
*	OpenZFS 8940 - Sending an intra-pool resumable send stream may result in EXDEV	loli10K	2018-02-14	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because resuming from a token requires "guid" -> "snapshot" mapping we have to walk the whole dataset hierarchy to find the right snapshot to send; when both source and destination exists, for an incremental resumable stream, libzfs gets confused and picks up the wrong snapshot to send from: this results in attempting to send "destination@snap1 -> source@snap2" instead of "source@snap1 -> source@snap2" which fails with a "Invalid cross-device link" error (EXDEV). Fix this by adjusting the logic behind dataset traversal in zfs_iter_children() to pick the right snapshot to send from. Additionally update dry-run 'zfs send -t' to print its output to stderr: this is consistent with other dry-run commands. Patch Notes: Reconciled differences between OpenZFS and aee1dd4d983c64db3c3155290d48f05243e85709. Authored by: loli10K <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Hans Rosenfeld <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8940 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9f7867c206 Closes #7171
*	Project Quota on ZFS	Nasf-Fan	2018-02-13	82	-280/+4519
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Project quota is a new ZFS system space/object usage accounting and enforcement mechanism. Similar as user/group quota, project quota is another dimension of system quota. It bases on the new object attribute - project ID. Project ID is a numerical value to indicate to which project an object belongs. An object only can belong to one project though you (the object owner or privileged user) can change the object project ID via 'chattr -p' or 'zfs project [-s] -p' explicitly. The object also can inherit the project ID from its parent when created if the parent has the project inherit flag (that can be set via 'chattr +P' or 'zfs project -s [-p]'). By accounting the spaces/objects belong to the same project, we can know how many spaces/objects used by the project. And if we set the upper limit then we can control the spaces/objects that are consumed by such project. It is useful when multiple groups and users cooperate for the same project, or a user/group needs to participate in multiple projects. Support the following commands and functionalities: zfs set projectquota@project zfs set projectobjquota@project zfs get projectquota@project zfs get projectobjquota@project zfs get projectused@project zfs get projectobjused@project zfs projectspace zfs allow projectquota zfs allow projectobjquota zfs allow projectused zfs allow projectobjused zfs unallow projectquota zfs unallow projectobjquota zfs unallow projectused zfs unallow projectobjused chattr +/-P chattr -p project_id lsattr -p This patch also supports tree quota based on the project quota via "zfs project" commands set as following: zfs project [-d\|-r] <file\|directory ...> zfs project -C [-k] [-r] <file\|directory ...> zfs project -c [-0] [-d\|-r] [-p id] <file\|directory ...> zfs project [-p id] [-r] [-s] <file\|directory ...> For "df [-i] $DIR" command, if we set INHERIT (project ID) flag on the $DIR, then the proejct [obj]quota and [obj]used values for the $DIR's project ID will be shown as the total/free (avail) resource. Keep the same behavior as EXT4/XFS does. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by Ned Bass <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Fan Yong <[email protected]> TEST_ZIMPORT_POOLS="zol-0.6.1 zol-0.6.2 master" Change-Id: Ib4f0544602e03fb61fd46a849d7ba51a6005693c Closes #6290
*	'zfs receive' fails with "dataset is busy"	LOLi	2018-02-12	2	-2/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Receiving an incremental stream after an interrupted "zfs receive -s" fails with the message "dataset is busy": this is because we still have the hidden clone ../%recv from the resumable receive. Improve the error message suggesting the existence of a partially complete resumable stream from "zfs receive -s" which can be either aborted ("zfs receive -A") or resumed ("zfs send -t"). Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7129 Closes #7154
*	contrib/initramfs: add missing conf.d/zfs	LOLi	2018-02-12	2	-2/+12
\| \| \| \| \| \| \| \| \| \|	When upgrading from the distribution-provided zfs-initramfs package on root-on-zfs Ubuntu and Debian the system may fail to boot: this change adds the missing initramfs configuration file. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7158
*	mmp should use a fixed tag for spa_config locks	sanjeevbagewadi	2018-02-12	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \|	mmp_write_uberblock() and mmp_write_done() should the same tag for spa_config_locks. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Sanjeev Bagewadi <[email protected]> Closes #6530 Closes #7155
*	OpenZFS 9004 - Some ZFS tests used files removed with 32 bit kernel	John Wren Kennedy	2018-02-09	2	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: John Wren Kennedy <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Reviewed by: George Melikov <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9004 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fafe9b241f Closes #7149
*	OpenZFS 8520 - lzc_rollback	Andriy Gapon	2018-02-09	3	-19/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	8520 lzc_rollback_to should support rolling back to origin 7198 libzfs should gracefully handle EINVAL from lzc_rollback lzc_rollback_to() should support rolling back to a clone's origin. The current checks in zfs_ioc_rollback() would not allow that because the origin snapshot belongs to a different filesystem. The overly restrictive check was in introduced in 7600, but it was not a regression as none of the existing tools provided a way to rollback to the origin. Authored by: Andriy Gapon <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8520 OpenZFS-issue: https://www.illumos.org/issues/7198 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/78a5a1a25a Closes #7150
*	Handle zap_add() failures in mixed case mode	sanjeevbagewadi	2018-02-09	9	-32/+290
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	With "casesensitivity=mixed", zap_add() could fail when the number of files/directories with the same name (varying in case) exceed the capacity of the leaf node of a Fatzap. This results in a ASSERT() failure as zfs_link_create() does not expect zap_add() to fail. The fix is to handle these failures and rollback the transactions. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sanjeev Bagewadi <[email protected]> Closes #7011 Closes #7054
*	Fix zdb -ed on objset for exported pool	Chunwei Chen	2018-02-09	5	-15/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	zdb -ed on objset for exported pool would failed with: failed to own dataset 'qq/fs0': No such file or directory The reason is that zdb pass objset name to spa_import, it uses that name to create a spa. Later, when dmu_objset_own tries to lookup the spa using real pool name, it can't find one. We fix this by make sure we pass pool name rather than objset name to spa_import. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7099 Closes #6464
*	Fix zdb -E segfault	Chunwei Chen	2018-02-09	1	-1/+4
\| \| \| \| \| \| \| \| \|	SPA_MAXBLOCKSIZE is too large for stack. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7099
*	Fix zdb -R decompression	Chunwei Chen	2018-02-09	2	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are some issues in the zdb -R decompression implementation. The first is that ZLE can easily decompress non-ZLE streams. So we add ZDB_NO_ZLE env to make zdb skip ZLE. The second is the random bytes appended to pabd, pbuf2 stuff. This serve no purpose at all, those bytes shouldn't be read during decompression anyway. Instead, we randomize lbuf2, so that we can make sure decompression fill exactly to lsize by bcmp lbuf and lbuf2. The last one is the condition to detect fail is wrong. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7099 Closes #4984
*	Fix racy assignment of zcb.zcb_haderrors	Chunwei Chen	2018-02-09	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	zcb_haderrors will be modified in zdb_blkptr_done, which is asynchronous. So we must move this assignment after zio_wait. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7099
*	Fix zle_decompress out of bound access	Chunwei Chen	2018-02-09	1	-0/+4
\| \| \| \| \| \| \|	Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7099
*	Fix zdb -c traverse stop on damaged objset root	Chunwei Chen	2018-02-09	1	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a corruption happens to be on a root block of an objset, zdb -c will not correctly report the error, and it will not traverse the datasets that come after. This is because traverse_visitbp, which does the callback and reset error for TRAVERSE_HARD, is skipped when traversing zil is failed in traverse_impl. Here's example of what 'zdb -eLcc' command looks like on a pool with damaged objset root: == before patch: Traversing all blocks to verify checksums ... Error counts: errno count block traversal size 379392 != alloc 33987072 (unreachable 33607680) bp count: 172 ganged count: 0 bp logical: 1678336 avg: 9757 bp physical: 130560 avg: 759 compression: 12.85 bp allocated: 379392 avg: 2205 compression: 4.42 bp deduped: 0 ref>1: 0 deduplication: 1.00 SPA allocated: 33987072 used: 0.80% additional, non-pointer bps of type 0: 71 Dittoed blocks on same vdev: 101 == after patch: Traversing all blocks to verify checksums ... zdb_blkptr_cb: Got error 52 reading <54, 0, -1, 0> -- skipping Error counts: errno count 52 1 block traversal size 33963520 != alloc 33987072 (unreachable 23552) bp count: 447 ganged count: 0 bp logical: 36093440 avg: 80745 bp physical: 33699840 avg: 75391 compression: 1.07 bp allocated: 33963520 avg: 75981 compression: 1.06 bp deduped: 0 ref>1: 0 deduplication: 1.00 SPA allocated: 33987072 used: 0.80% additional, non-pointer bps of type 0: 76 Dittoed blocks on same vdev: 115 == Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7099
*	OpenZFS 8965 - zfs_acl_ls_001_pos fails due to no longer supported grep regex	John Wren Kennedy	2018-02-08	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The test used \> to detect the end of a string, but this no longer works, so use $ which works as well since the string ends the line anyway. Authored by: John Wren Kennedy <[email protected]> Reviewed by: Akash Ayare <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8965 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb1204e444 Closes #7145
*	Linux 4.11 compat: avoid refcount_t name conflict	Brian Behlendorf	2018-02-08	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Related to commit 4859fe796, when directly using the kernel's refcount functions in kernel compatibility code do not map refcount_t to zfs_refcount_t. This leads to a type mismatch. Longer term we should consider renaming refcount_t to zfs_refcount_t in the zfs code base. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7148
*	Linux 4.16 compat: inode_set_iversion()	Brian Behlendorf	2018-02-08	5	-5/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A new interface was added to manipulate the version field of an inode. Add a inode_set_iversion() wrapper for older kernels and use the new interface when available. The i_version field was dropped from the trace point due to the switch to an atomic64_t i_version type. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7148
*	OpenZFS 8677 - Open-Context Channel Programs	Serapheim Dimitropoulos	2018-02-08	26	-133/+422
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Chris Williamson <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> We want to be able to run channel programs outside of synching context. This would greatly improve performance for channel programs that just gather information, as they won't have to wait for synching context anymore. === What is implemented? This feature introduces the following: - A new command line flag in "zfs program" to specify our intention to run in open context. (The -n option) - A new flag/option within the channel program ioctl which selects the context. - Appropriate error handling whenever we try a channel program in open-context that contains zfs.sync* expressions. - Documentation for the new feature in the manual pages. === How do we handle zfs.sync functions in open context? When such a function is found by the interpreter and we are running in open context we abort the script and we spit out a descriptive runtime error. For example, given the script below ... arg = ... fs = arg["argv"][1] err = zfs.sync.destroy(fs) msg = "destroying " .. fs .. " err=" .. err return msg if we run it in open context, we will get back the following error: Channel program execution failed: [string "channel program"]:3: running functions from the zfs.sync submodule requires passing sync=TRUE to lzc_channel_program() (i.e. do not specify the "-n" command line argument) stack traceback: [C]: in function 'destroy' [string "channel program"]:3: in main chunk === What about testing? We've introduced new wrappers for all channel program tests that run each channel program as both (startard & open-context) and expect the appropriate behavior depending on the program using the zfs.sync module. OpenZFS-issue: https://www.illumos.org/issues/8677 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/17a49e15 Closes #6558
*	OpenZFS 8604 - Simplify snapshots unmounting code	Serapheim Dimitropoulos	2018-02-08	3	-30/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Andy Stormont <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> Every time we want to unmount a snapshot (happens during snapshot deletion or renaming) we unnecessarily iterate through all the mountpoints in the VFS layer (see zfs_get_vfs). The current patch completely gets rid of that code and changes the approach while keeping the behavior of that code path the same. Specifically, it puts a hold on the dataset/snapshot and gets its vfs resource reference directly, instead of linearly searching for it. If that reference exists we attempt to amount it. With the above change, it became obvious that the nvlist manipulations that we do (add_boolean and add_nvlist) take a significant amount of time ensuring uniqueness of every new element even though they don't have too. Thus, we updated the patch so those nvlists are not trying to enforce the uniqueness of their elements. A more complete analysis of the problem solved by this patch can be found below: https://sdimitro.github.io/post/snap-unmount-perf/ OpenZFS-issue: https://www.illumos.org/issues/8604 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/126118fb
*	Increase code coverage for Lua libraries	Don Brady	2018-02-08	17	-696/+1372
\| \| \| \| \| \| \|	Add test coverage for lua libraries Remove dead code in Lua implementation Signed-off-by: Don Brady <[email protected]>
*	Add basic functional tests for zcp user properties	Don Brady	2018-02-08	3	-4/+103
\| \| \| \|	Signed-off-by: Don Brady <[email protected]>
*	OpenZFS 8600 - ZFS channel programs - snapshot	Chris Williamson	2018-02-08	22	-33/+500
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Brad Lewis <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> ZFS channel programs should be able to create snapshots. In addition to the base snapshot functionality, this entails extra logic to handle edge cases which were formerly not possible, such as creating then destroying a snapshot in the same transaction sync. OpenZFS-issue: https://www.illumos.org/issues/8600 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68089b8b
*	OpenZFS 8592 - ZFS channel programs - rollback	Brad Lewis	2018-02-08	8	-16/+177
\| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Brad Lewis <[email protected]> Reviewed by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> ZFS channel programs should be able to perform a rollback. OpenZFS-issue: https://www.illumos.org/issues/8592 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d46b5ed6
*	OpenZFS 8605 - zfs channel programs fix zfs.exists	Chris Williamson	2018-02-08	6	-3/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Chris Williamson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Don Brady <[email protected]> zfs.exists() in channel programs doesn't return any result, and should have a man page entry. This patch corrects zfs.exists so that it returns a value indicating if the dataset exists or not. It also adds documentation about it in the man page. OpenZFS-issue: https://www.illumos.org/issues/8605 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1e85e111
*	OpenZFS 7431 - ZFS Channel Programs	Chris Williamson	2018-02-08	179	-272/+27055
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authored by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported-by: Don Brady <[email protected]> Ported-by: John Kennedy <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7431 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dfc11533 Porting Notes: * The CLI long option arguments for '-t' and '-m' don't parse on linux * Switched from kmem_alloc to vmem_alloc in zcp_lua_alloc * Lua implementation is built as its own module (zlua.ko) * Lua headers consumed directly by zfs code moved to 'include/sys/lua/' * There is no native setjmp/longjump available in stock Linux kernel. Brought over implementations from illumos and FreeBSD * The get_temporary_prop() was adapted due to VFS platform differences * Use of inline functions in lua parser to reduce stack usage per C call * Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow
*	OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a ↵	WHR	2018-02-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	use after end of the lifetime of a local variable Authored by: WHR <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8966 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95549fcdc Closes #7141
*	Only run pre-mount hook zfs-load-key on systemd	Matthew Thode	2018-02-07	1	-0/+3
\| \| \| \| \| \| \| \|	Reviewed-by: Kash Pande <[email protected]> Reviewed-by: bunder2015 <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7136 Closes #7140
*	Verify ZFS Test Suite scripts executability	LOLi	2018-02-07	7	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change adds a make target 'testscheck' which verifies all ksh test scripts, part of the ZFS Test Suite, have their executable bit set: this should avoid adding new test scripts that cannot be executed by the test-runner.py which would result in the following warning message: Warning: Test removed from TestGroup because it failed verification. This also verifies both .cfg and .kshlib files used in the ZFS Test Suite are not executable. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7131
*	Remove deprecated zfs_arc_p_aggressive_disable	Richard Elling	2018-02-07	2	-15/+0
\| \| \| \| \| \| \| \| \| \|	zfs_arc_p_aggressive_disable is no more. This PR removes docs and module parameters for zfs_arc_p_aggressive_disable. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Richard Elling <[email protected]> Closes #7135
*	Add zfs-load-key.sh to .gitignore	Brian Behlendorf	2018-02-06	2	-53/+2
\| \| \| \| \| \| \| \| \| \| \|	The generated zfs-load-key.sh file should have been added to the .gitignore file as part of commit 7da8f8d8. And the generated file should not be included in the repo. Reviewed-by: Matthew Thode <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7134
*	Fix `zpool status` overflow on fast scrubs	DeHackEd	2018-02-06	1	-1/+1
\| \| \| \| \| \| \| \|	Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: DHE <[email protected]> Closes #7133
*	Fix default libdir for Debian/Ubuntu	Brian Behlendorf	2018-02-05	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	The distribution provided architecture specific RPM macro files for x86_64 and other architectures on Debian/Ubuntu specify the wrong default libdir install location. When building deb packages override _lib with the correct location. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7083 Closes #7101
*	Adjust ARC prefetch tunables to match docs	Tom Caputi	2018-02-05	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the ARC exposes 2 tunables (zfs_arc_min_prefetch_ms and zfs_arc_min_prescient_prefetch_ms) which are documented to be specified in milliseconds. However, the code actually uses the values as though they were in seconds. This patch adjusts the code to match the names and documentation of the tunables. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7126
*	Set persistent ztest failure mode	Brian Behlendorf	2018-02-05	1	-17/+19
\| \| \| \| \| \| \| \| \| \| \| \| \|	In order to reliably detect deadlocks in the create and import path ztest should set the failure mode property. This ensures that the pool is always using the correct failmode behavior. Removed insidious use of local variable in MAXFAULTS macro. Converted VERIFY() to VERIFY0() where appropriate. Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7111
*	Bug fix in qat_compress.c for vmalloc addr check	wli5	2018-02-05	1	-4/+0
\| \| \| \| \| \| \| \| \|	Remove the unused vmalloc address check, and function mem_to_page will handle the non-vmalloc address when map it to a physical address. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Weigang Li <[email protected]> Closes #7125
*	Fix hash_lock / keystore.sk_dk_lock lock inversion	Brian Behlendorf	2018-02-04	1	-48/+39
\| \| \| \| \| \| \| \| \| \| \| \| \|	The keystore.sk_dk_lock should not be held while performing I/O. Drop the lock when reading from disk and update the code so they the first successful caller adds the key. Improve error handling in spa_keystore_create_mapping_impl(). Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: RageLtMan <rageltman@sempervictus> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7112 Closes #7115
*	Fix systemd_ RPM macros usage on Debian-based distributions	LOLi	2018-02-02	1	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Debian-based distributions do not seem to provide RPM macros for dealing with systemd pre- and post- (un)install actions: this results in errors when installing or upgrading .deb packages because the resulting control scripts contain the following unresolved macros: * %systemd_post * %systemd_preun * %systemd_postun Fix this by providing default values for postinstall, preuninstall and postuninstall scripts when these macros are not defined. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7074 Closes #7100
*	Change os->os_next_write_raw to work per txg	Tom Caputi	2018-02-02	6	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, os_next_write_raw is a single boolean used for determining whether or not the next call to dmu_objset_sync() should write out the objset_phys_t as a raw buffer. Since the boolean is not associated with a txg, the work simply happens during the next txg, which is not necessarily the correct one. In the current implementation this issue was misdiagnosed, resulting in a small hack in dmu_objset_sync() which seemed to resolve the problem. This patch changes os_next_write_raw to be an array of booleans, one for each txg in TXG_OFF and removes the hack. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6864
*	Raw sends must be able to decrease nlevels	Tom Caputi	2018-02-02	9	-16/+289
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when a raw zfs send file includes a DRR_OBJECT record that would decrease the number of levels of an existing object, the object is reallocated with dmu_object_reclaim() which creates the new dnode using the old object's nlevels. For non-raw sends this doesn't really matter, but raw sends require that nlevels on the receive side match that of the send side so that the checksum-of-MAC tree can be properly maintained. This patch corrects the issue by freeing the object completely before allocating it again in this case. This patch also corrects several issues with dnode_hold_impl() and related functions that prevented dnodes (particularly multi-slot dnodes) from being reallocated properly due to the fact that existing dnodes were not being fully cleaned up when they were freed. This patch adds a test to make sure that zfs recv functions properly with incremental streams containing dnodes of different sizes. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6821 Closes #6864
*	Fix recovery import (-F) with encrypted pool	Tom Caputi	2018-02-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When performing zil_claim() at pool import time, it is important that encrypted datasets set os_next_write_raw before writing to the zil_header_t. This prevents the code from attempting to re-authenticate the objset_phys_t when it writes it out, which is unnecessary because the zil_header_t is not protected by either objset MAC and impossible since the keys aren't loaded yet. Unfortunately, one of the code paths did not set this flag, which causes failed ASSERTs during 'zpool import -F'. This patch corrects this issue. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6864 Closes #6916
*	Encryption Stability and On-Disk Format Fixes	Tom Caputi	2018-02-02	31	-187/+787
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The on-disk format for encrypted datasets protects not only the encrypted and authenticated blocks themselves, but also the order and interpretation of these blocks. In order to make this work while maintaining the ability to do raw sends, the indirect bps maintain a secure checksum of all the MACs in the block below it along with a few other fields that determine how the data is interpreted. Unfortunately, the current on-disk format erroneously includes some fields which are not portable and thus cannot support raw sends. It is not possible to easily work around this issue due to a separate and much smaller bug which causes indirect blocks for encrypted dnodes to not be compressed, which conflicts with the previous bug. In addition, the current code generates incompatible on-disk formats on big endian and little endian systems due to an issue with how block pointers are authenticated. Finally, raw send streams do not currently include dn_maxblkid when sending both the metadnode and normal dnodes which are needed in order to ensure that we are correctly maintaining the portable objset MAC. This patch zero's out the offending fields when computing the bp MAC and ensures that these MACs are always calculated in little endian order (regardless of the host system's byte order). This patch also registers an errata for the old on-disk format, which we detect by adding a "version" field to newly created DSL Crypto Keys. We allow datasets without a version (version 0) to only be mounted for read so that they can easily be migrated. We also now include dn_maxblkid in raw send streams to ensure the MAC can be maintained correctly. This patch also contains minor bug fixes and cleanups. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6845 Closes #6864 Closes #7052
*	tx_waited -> tx_dirty_delayed in trace_dmu.h	Dr. András Korn	2018-01-31	1	-5/+6
\| \| \| \| \| \| \| \| \| \|	This change was missed in 0735ecb33485e91a78357a274e47c2782858d8b9. Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: András Korn <[email protected]> Closes #7096
*	Change movaps to movups in AES-NI code	Tom Caputi	2018-01-31	3	-48/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the ICP contains accelerated assembly code to be used specifically on CPUs with AES-NI enabled. This code makes heavy use of the movaps instruction which assumes that it will be provided aes keys that are 16 byte aligned. This assumption seems to hold on Illumos, but on Linux some kernel options such as 'slub_debug=P' will violate it. This patch changes all instances of this instruction to movups which is the same except that it can handle unaligned memory. This patch also adds a few flags which were accidentally never given to the assembly compiler, resulting in objtool warnings. Reviewed by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Nathaniel R. Lewis <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7065 Closes #7108
*	Fix txg_sync_thread hang in scan_exec_io()	Brian Behlendorf	2018-01-31	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When scn->scn_maxinflight_bytes has not been initialized it's possible to hang on the condition variable in scan_exec_io(). This issue was uncovered by ztest and is only possible when deduplication is enabled through the following call path. txg_sync_thread() spa_sync() ddt_sync_table() ddt_sync_entry() dsl_scan_ddt_entry() dsl_scan_scrub_cb() dsl_scan_enqueuei() scan_exec_io() cv_wait() Resolve the issue by always initializing scn_maxinflight_bytes to a reasonable minimum value. This value will be recalculated in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit and the addition/removal of vdevs. Reviewed-by: Tom Caputi <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7098
*	remove pools without a bootfs from BOOTFS variable	Matthew Thode	2018-01-30	2	-2/+2
\| \| \| \| \| \| \| \|	Use the same method used in zfs-load-key. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Llewelyn Trahaearn <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7089
*	Fix 'zfs receive -o' when used with '-e\|-d'	LOLi	2018-01-30	2	-2/+39
\| \| \| \| \| \| \| \| \| \| \| \| \|	When used in conjunction with one of '-e' or '-d' zfs receive options none of the properties requested to be set (-o) are actually applied: this is caused by a wrong assumption made about the toplevel dataset in zfs_receive_one(). Fix this by correctly detecting the toplevel dataset. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7088