Commit message | Author | Age | Files | Lines

* Linux 4.1 compat: loop device on ZFS | Chunwei Chen | 2015-08-24 | 5 | -137/+162

Starting with Linux 4.1, an iov_iter backed by bio_vec may be passed into iter_read/iter_write. Notably, the loop device passes bio_vec to the backend filesystem. However, the current ZFS code assumes iovec without any check, so it always crashes when a loop device is used.

With the restructured uio_t, we can safely pass bio_vec in uio_t with UIO_BVEC set. The uio* functions are modified to handle the bio_vec case separately.

The const uio_iov causes some warnings in the xuio-related code, so it is explicitly converted to non-const.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3511
Closes #3640

* Add compatibility layer for {kmap,kunmap}_atomic | Chunwei Chen | 2015-08-24 | 4 | -1/+63

Starting with linux-2.6.37, {kmap,kunmap}_atomic take 1 argument instead of 2. We use zfs_{kmap,kunmap}_atomic as wrappers that always take 2 arguments but ignore the second on newer kernels.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>

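The wrapper idea can be sketched in user space as follows. This is a minimal illustration, not the patch itself: `HAVE_1ARG_KMAP_ATOMIC`, the stand-in `struct page`, the stub `kmap_atomic()`/`kunmap_atomic()` functions, and `first_byte()` are assumptions made so the snippet compiles outside the kernel. Only the `zfs_{kmap,kunmap}_atomic` names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct page; hypothetical, for illustration. */
struct page { char data[64]; };

/* Assumed configure result: defined when the kernel uses the 1-argument API. */
#define HAVE_1ARG_KMAP_ATOMIC 1

#ifdef HAVE_1ARG_KMAP_ATOMIC
/* Stubs mimicking the 1-argument interface (2.6.37 and later). */
static void *kmap_atomic(struct page *page) { return page->data; }
static void kunmap_atomic(void *addr) { (void)addr; }

/* Callers always pass two arguments; the second is dropped. */
#define zfs_kmap_atomic(page, km_type)   kmap_atomic(page)
#define zfs_kunmap_atomic(addr, km_type) kunmap_atomic(addr)
#else
/* Stubs mimicking the older 2-argument interface. */
static void *kmap_atomic(struct page *page, int km_type)
{
	(void)km_type;
	return page->data;
}
static void kunmap_atomic(void *addr, int km_type)
{
	(void)addr;
	(void)km_type;
}

#define zfs_kmap_atomic(page, km_type)   kmap_atomic(page, km_type)
#define zfs_kunmap_atomic(addr, km_type) kunmap_atomic(addr, km_type)
#endif

/* Example caller: the same source works for both kernel generations. */
static char first_byte(struct page *pp)
{
	assert(pp != NULL);
	char *addr = zfs_kmap_atomic(pp, 0 /* km_type, ignored on newer kernels */);
	char c = addr[0];
	zfs_kunmap_atomic(addr, 0);
	return c;
}
```

Keeping the two-argument form at every call site means only the macro definitions need to change when the kernel interface does.
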
* Linux 4.2 compat: vfs_rename() | Brian Behlendorf | 2015-08-19 | 1 | -0/+22

The spa_config_write() function relies on the classic method of making sure updates to the /etc/zfs/zpool.cache file are atomic. It writes out a temporary version of the file and then uses vn_rename() to switch it into place. This way there can never exist a partial version of the file; it's all or nothing.

Conceptually this is a good strategy and it makes good sense for platforms where it's easy to do a rename within the kernel. Unfortunately, Linux is not one of those platforms. Even doing basic I/O to a file system from within the kernel is strongly discouraged.

In order to support this at all the vn_rename() implementation ends up being complex and fragile. So fragile that recent Linux 4.2 changes have broken it. While it is possible to update vn_rename() to work with the latest kernels, a better long-term strategy is to stop using vn_rename() entirely. Then all this complex, fragile code can be removed. Achieving this is straightforward because spa_config_write() is the only consumer of vn_rename().

This patch reworks spa_config_write() to update the cache file in place. The file will be truncated, written out, and then synced to disk. If an error is encountered the file will be unlinked, leaving the system in a consistent state.

This does expose a tiny window in which a system crash at exactly the wrong moment could leave a partially written cache file. However, this is highly unlikely because the cache file is 1) infrequently updated, 2) only a few kilobytes in size, and 3) written with a single vn_rdwr() call.

If this were to somehow happen it poses no risk to the pool. Simply removing the cache file will allow the pool to be imported cleanly. Going forward this will be even less of an issue as we intend to disable the use of a cache file by default.

Bottom line: not using vn_rename() allows us to make ZoL more robust against upstream kernel changes.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3653

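The "truncate, write, sync, unlink on error" sequence described above can be illustrated with a user-space analogue. This is only a sketch under stated assumptions: it uses POSIX calls rather than the in-kernel vn_*() interfaces, and `write_file_in_place()` is a hypothetical name that does not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * User-space analogue of an in-place cache file update.
 * Returns 0 on success or an errno value on failure.
 */
static int write_file_in_place(const char *path, const void *buf, size_t len)
{
	int err = 0;

	assert(path != NULL && buf != NULL);

	/* Open the existing file (or create it) and truncate it to zero. */
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return errno;

	/* Write the whole buffer with a single call, then force it to disk. */
	ssize_t n = write(fd, buf, len);
	if (n < 0)
		err = errno;
	else if ((size_t)n != len)
		err = EIO;	/* treat a short write as a failure */
	else if (fsync(fd) != 0)
		err = errno;

	if (close(fd) != 0 && err == 0)
		err = errno;

	/* On any error remove the file so no partial copy is left behind. */
	if (err != 0)
		(void) unlink(path);

	return err;
}
```

Unlike the rename approach, a crash between the truncate and the sync can still leave a partial file. That is the small window the commit message accepts.
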
* Handle zap_lookup() failure in ddt_object_load() | Brian Behlendorf | 2015-08-19 | 1 | -3/+4

Failing to look up a name in the spa_ddt_stat_object should not result in a panic in ddt_object_load(). The error can be safely returned to the caller for handling, resulting in a useful user error message.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3370

* Fix zvol detection | Richard Yao | 2015-08-19 | 1 | -0/+6

The zpool create subcommand should not return an error on debug builds of the userland tools when given zvols.

Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3595

* Add parenthesis to the ternary operator | tuxoko | 2015-08-19 | 1 | -1/+1

Without the parentheses, this particular ASSERT will evaluate to "(RW_READER == (!zap->zap_ismicro && fatreader)) ? RW_READER : lti".

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3685

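The precedence problem is easy to reproduce in isolation, because `==` binds more tightly than `?:`. The snippet below is a minimal sketch: the numeric lock-type values and the function names are assumptions for illustration, and `cond` stands in for `(!zap->zap_ismicro && fatreader)`.

```c
#include <assert.h>

/* Hypothetical lock-type values; both are non-zero. */
enum { RW_WRITER = 1, RW_READER = 2 };

/* Parsed as (lt == cond) ? RW_READER : lti; the result is a lock type. */
static int check_without_parens(int lt, int cond, int lti)
{
	return (lt == cond ? RW_READER : lti);
}

/* Intended meaning: compare lt against the selected lock type. */
static int check_with_parens(int lt, int cond, int lti)
{
	return (lt == (cond ? RW_READER : lti));
}

/* Mismatch case: RW_READER is held but RW_WRITER was expected. */
static void demo(void)
{
	/* The parenthesized form detects the mismatch. */
	assert(check_with_parens(RW_READER, 0, RW_WRITER) == 0);
	/* The unparenthesized form yields a non-zero lock type and misses it. */
	assert(check_without_parens(RW_READER, 0, RW_WRITER) != 0);
}
```

Since both branches of the unparenthesized ternary are non-zero here, an assertion written that way can never fail.
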
* Fix build failure with Linux 4.1 and FTRACE | Chris Dunlop | 2015-08-18 | 1 | -0/+3

See also #3546, commit c1718e9

Signed-off-by: Chris Dunlop <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3673

* Linux 4.1 compat: configure bdi_setup_and_register() | Chris Dunlop | 2015-08-18 | 1 | -2/+2

Pull struct backing_dev_info off the stack: by linux-4.1 it has grown past our 1024-byte stack frame warning limit, resulting in an incorrect configure result.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Dunlop <[email protected]>
Closes #3671

* ztest: display non-index properties properly at verbose level 6 | Tim Chase | 2015-07-30 | 1 | -3/+10

At verbosity levels of 6 or greater, ztest_dsl_prop_set_uint64() attempts to display the value of all properties as indexed values regardless of whether the property is an indexed value or simply an un-indexed integer. This patch causes the numeric value of the property to be displayed if zfs_prop_index_to_string() fails.

Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3649

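The fallback can be sketched as "try the name lookup, print the number if it fails". In the snippet below `stub_index_to_string()` is only a stand-in for zfs_prop_index_to_string(): its signature and lookup table are invented for this sketch, and `format_prop_value()` is likewise a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Stub lookup: returns 0 and sets *name on success, non-zero when the
 * value has no symbolic name. The table is invented for illustration.
 */
static int stub_index_to_string(uint64_t value, const char **name)
{
	static const char *names[] = { "off", "on" };

	if (value >= sizeof (names) / sizeof (names[0]))
		return (-1);
	*name = names[value];
	return (0);
}

/* Format a property value: indexed name if available, raw number otherwise. */
static void format_prop_value(uint64_t value, char *buf, size_t buflen)
{
	const char *name;

	assert(buf != NULL && buflen > 0);
	if (stub_index_to_string(value, &name) == 0)
		(void) snprintf(buf, buflen, "%s", name);
	else
		(void) snprintf(buf, buflen, "%" PRIu64, value);
}
```
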
* Rework zed_notify_email for configurable PROG/OPTS | Chris Dunlap | 2015-07-30 | 2 | -11/+51

This commit reworks the zed_notify_email() function to allow configuration of the mail executable and command-line arguments.

ZED_EMAIL_PROG specifies the name or path of the executable responsible for sending notifications via email. This variable defaults to "mail".

ZED_EMAIL_OPTS specifies command-line options passed to ZED_EMAIL_PROG. The following keyword substitutions are performed:
  - @ADDRESS@ is replaced with the recipient email address(es)
  - @SUBJECT@ is replaced with the notification subject
This variable defaults to "-s '@SUBJECT@' @ADDRESS@".

ZED_EMAIL_ADDR replaces ZED_EMAIL (although the latter is retained for backward compatibility). This variable can contain multiple addresses as long as they are delimited by whitespace.

Signed-off-by: Chris Dunlap <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3634
Closes #3631

* Fix whitespace in zed_log_err | Chris Dunlap | 2015-07-30 | 1 | -1/+1

This commit fixes the two adjacent spaces that appear in zed_log_err() messages when ZEVENT_EID is undefined.

Signed-off-by: Chris Dunlap <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>

* Update arc_memory_throttle() to check pageout | Brian Behlendorf | 2015-07-30 | 2 | -13/+44

This brings the behavior of arc_memory_throttle() back in sync with illumos. The updated memory throttling policy roughly goes like this:

  * Never throttle if more than 10% of memory is free. This threshold is configurable with the zfs_arc_lotsfree_percent module option.

  * Minimize any throttling of kswapd even when free memory is below the set threshold. Allow it to write out pages as quickly as possible to help alleviate the memory pressure.

  * Delay all other threads when free memory is below the set threshold in order to avoid compounding the memory pressure. Buffers will be evicted from the ARC to reduce the issue.

The Linux-specific zfs_arc_memory_throttle_disable module option has been removed in favor of the existing zfs_arc_lotsfree_percent tuning. Setting zfs_arc_lotsfree_percent=0 will have the same effect as zfs_arc_memory_throttle_disable and it was therefore redundant.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3637

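The three-step policy can be modelled as a small decision function. This is a deliberately simplified sketch, not the arc_memory_throttle() implementation: the enum, the function name, and the parameters are all hypothetical, `lotsfree_percent` plays the role of zfs_arc_lotsfree_percent, and "minimize throttling of kswapd" is reduced to "never throttle kswapd".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decision values for this sketch. */
enum throttle { THROTTLE_NONE, THROTTLE_DELAY };

static enum throttle
memory_throttle(uint64_t free_pages, uint64_t total_pages,
    unsigned lotsfree_percent, int is_kswapd)
{
	assert(total_pages > 0);

	/* Threshold below which memory is considered scarce. */
	uint64_t lotsfree = total_pages * lotsfree_percent / 100;

	/* Plenty of free memory: never throttle. */
	if (free_pages > lotsfree)
		return (THROTTLE_NONE);

	/* Let kswapd keep writing out pages to relieve the pressure. */
	if (is_kswapd)
		return (THROTTLE_NONE);

	/* Everyone else is delayed so the pressure is not compounded. */
	return (THROTTLE_DELAY);
}
```

With `lotsfree_percent` set to 0 the threshold becomes zero, so any non-zero amount of free memory bypasses the throttle. That mirrors why the separate disable option was redundant.
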
* Update arc_available_memory() to check freemem | Brian Behlendorf | 2015-07-30 | 2 | -31/+46

While Linux doesn't provide detailed information about the state of the VM it does provide us total free pages. This information should be incorporated into the arc_available_memory() calculation rather than solely relying on a signal from direct reclaim. Conceptually this brings arc_available_memory() back in sync with illumos.

It is also desirable that the target amount of free memory be tunable on a system. While the default values are expected to work well for most workloads there may be cases where custom values are needed. The zfs_arc_sys_free module option was added for this purpose.

zfs_arc_sys_free - The target number of bytes the ARC should leave as free memory on the system. This value can be checked in /proc/spl/kstat/zfs/arcstats, and setting this module option will override the default value.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3637

* Bound zvol_threads module option | Brian Behlendorf | 2015-07-29 | 2 | -4/+5

The zvol_threads module option should be bounded to a reasonable range. The taskq must have at least 1 thread and shouldn't have more than 1,024. The default value of 32 is reasonable.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3614

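Bounding the option amounts to clamping the requested value into the stated range. A minimal sketch follows; the macro and function names are hypothetical and the patch may express the clamp differently.

```c
#include <assert.h>

/* Hypothetical names for the bounds given in the commit message. */
#define ZVOL_THREADS_MIN 1
#define ZVOL_THREADS_MAX 1024

/* Clamp a user-supplied thread count into [1, 1024]. */
static int bound_zvol_threads(int requested)
{
	int n = requested;

	if (n < ZVOL_THREADS_MIN)
		n = ZVOL_THREADS_MIN;
	if (n > ZVOL_THREADS_MAX)
		n = ZVOL_THREADS_MAX;

	assert(n >= ZVOL_THREADS_MIN && n <= ZVOL_THREADS_MAX);
	return (n);
}
```
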
* Fix "BUG: Bad page state" caused by writeback flag | Chunwei Chen | 2015-07-29 | 1 | -0/+50

Commit d958324 fixed the deadlock between page lock and range lock by unlocking the page lock before acquiring the range lock. However, this created a new issue #3075. The problem is that if we can't set the writeback bit before releasing the page lock, other processes will be unaware that the page is under active writeback. They may therefore truncate the page, invalidate the page, or not honor the sync semantics.

To work around this problem we re-dirty the page before dropping the page lock. While this doesn't prevent the page from being truncated it does ensure it won't be invalidated. Then the range lock and the page lock are reacquired in the correct deadlock-free order.

Once both locks are safely held the page state can be rechecked. If all is well and the page is in the expected state, the dirty bit can be removed, the writeback bit set, and the page removed from the skip count. If not, the page will be handled as appropriate.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3075

* Fix build failure with Linux 4.1 and FTRACE | Frédéric VANNIÈRE | 2015-07-29 | 8 | -0/+24

Signed-off-by: Frédéric VANNIÈRE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3546

* Align thread priority with Linux defaults | Brian Behlendorf | 2015-07-28 | 12 | -20/+29

Under Linux, filesystem threads responsible for handling I/O are normally created with the maximum priority. Non-I/O filesystem processes run with the default priority. ZFS should adopt the same priority scheme under Linux to maintain good performance and so that it will compete fairly when other Linux filesystems are active. The priorities have been updated to the following:

    $ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
    - TS 10743 19 -20 [spl_kmem_cache]
    - TS 10744 19 -20 [spl_system_task]
    - TS 10745 19 -20 [spl_dynamic_tas]
    - TS 10764 19   0 [dbu_evict]
    - TS 10765 19   0 [arc_prune]
    - TS 10766 19   0 [arc_reclaim]
    - TS 10767 19   0 [arc_user_evicts]
    - TS 10768 19   0 [l2arc_feed]
    - TS 10769 39   0 [z_unmount]
    - TS 10770 39 -20 [zvol]
    - TS 11011 39 -20 [z_null_iss]
    - TS 11012 39 -20 [z_null_int]
    - TS 11013 39 -20 [z_rd_iss]
    - TS 11014 39 -20 [z_rd_int_0]
    - TS 11022 38 -19 [z_wr_iss]
    - TS 11023 39 -20 [z_wr_iss_h]
    - TS 11024 39 -20 [z_wr_int_0]
    - TS 11032 39 -20 [z_wr_int_h]
    - TS 11033 39 -20 [z_fr_iss_0]
    - TS 11041 39 -20 [z_fr_int]
    - TS 11042 39 -20 [z_cl_iss]
    - TS 11043 39 -20 [z_cl_int]
    - TS 11044 39 -20 [z_ioctl_iss]
    - TS 11045 39 -20 [z_ioctl_int]
    - TS 11046 39 -20 [metaslab_group_]
    - TS 11050 19   0 [z_iput]
    - TS 11121 38 -19 [z_wr_iss]

Note that under Linux the meaning of a process's priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high values on illumos indicate a _high_ priority.

In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions, their values have been inverted. This way, when changes are merged from upstream illumos, we won't need to remember to invert the macro. Having to do so could also lead to confusion.

This patch depends on https://github.com/zfsonlinux/spl/pull/466.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #3607

* Check for NULL in dmu_free_long_range_impl() | Brian Behlendorf | 2015-07-28 | 1 | -1/+5

A NULL should never be passed as the dnode_t pointer to the function dmu_free_long_range_impl(). Regardless, because we have a reported occurrence of this, let's add some error handling to catch it. Better to report a reasonable error to the caller than panic the system.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3445

* Make sure that POOL_IMPORTED is set, unset and checked where appropriate. | Turbo Fredriksson | 2015-07-28 | 1 | -10/+17

  * If it's unset in find_rootfs(), no pool is imported so there is no point in looking for a rootfs.
  * If find_rootfs() couldn't find a rootfs, the pool is exported. Remember to unset POOL_IMPORTED after doing so.
  * Set POOL_IMPORTED if/when a pool has been imported in import_pool().
  * Improve backup import (the one using the cache file).

Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3636

* Fix some minor issues with the SYSV init and initramfs scripts. | Turbo Fredriksson | 2015-07-24 | 8 | -20/+27

These are some minor fixes to commits 2cac7f5f11756663525a5d4604d9f0a3202d4024 and 2a34db1bdbcecf5019c4a59f2a44c92fe82010f2.

  * Make sure to alien'ate the new initramfs rpm package as well! The rpm package is built correctly, but alien isn't run on it to create the deb.
  * Before copying files from COPY_FILE_LIST, make sure the DESTDIR/dir exists.
  * Include the /lib/udev/vdev_id file in the initrd.
  * Because the initrd needs to use '/sbin/modprobe' instead of 'modprobe', we need to use this in load_module() as well.
  * Make sure that load_module() can be used more globally, instead of calling '/sbin/modprobe' all over the place.
  * Make sure that check_module_loaded() has a parameter: the module to check.

Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3626

* Minor style cleanup | Brian Behlendorf | 2015-07-23 | 1 | -10/+7

Address minor differences in style between upstream and ZoL. This patch contains no functional differences and is solely designed to minimize the delta from upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3533

* Remove double counting HDR_L2ONLY_SIZE | Brian Behlendorf | 2015-07-23 | 1 | -3/+0

Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE accounting. Accounting is now performed only in the constructor and destructor, which is a nice simplification. It should have been removed from the create and destroy functions. This brings us back in sync with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3533

* Add hdr_recl() reclaim callback | Brian Behlendorf | 2015-07-23 | 1 | -2/+18

Originally removed because it wasn't required under Linux. However, there may still be some utility in signaling the arc reclaim thread under Linux via reclaim. This should already have happened by other means, but it's harmless and removes another point of divergence with upstream.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3533

* Reinstate zfs_arc_p_min_shift | Brian Behlendorf | 2015-07-23 | 2 | -2/+26

Commit f521ce1 removed the minimum value for "arc_p", allowing it to drop to zero or grow to "arc_c". This was done to improve a specific workload which constantly dirties new "metadata" but also frequently touches a "small" amount of mfu data (e.g. mkdir's).

This change may still be desirable but it needs to be re-investigated in the context of the recent ARC changes from upstream. Therefore this code is being restored to facilitate benchmarking. By setting "zfs_arc_p_min_shift=64" we can easily compare the performance.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3533

* Illumos 5817 - change type of arcs_size from uint64_t to refcount_t | Prakash Surya | 2015-07-23 | 2 | -28/+113

5817 change type of arcs_size from uint64_t to refcount_t

Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

References:
  https://www.illumos.org/issues/5817
  https://github.com/illumos/illumos-gate/commit/2fd872a

Ported-by: Brian Behlendorf <[email protected]>
Issue #3533

* Illumos 5445 - Add more visibility via arcstats | Prakash Surya | 2015-07-23 | 1 | -40/+151

5445 Add more visibility via arcstats; specifically arc_state_t stats and differentiate between "data" and "metadata"

Reviewed by: Basil Crow <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Robert Mustacchi <[email protected]>

References:
  https://www.illumos.org/issues/5445
  https://github.com/illumos/illumos-gate/commit/4076b1b

Porting Notes:
This patch is an improved version of cc7f677 which was previously merged in ZoL. This patch incorporates the additional improvements which were made upstream.

Ported-by: Brian Behlendorf <[email protected]>
Issue #3533

* Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_grow | Matthew Ahrens | 2015-07-23 | 3 | -157/+374

5376 arc_kmem_reap_now() should not result in clearing arc_no_grow

Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://www.illumos.org/issues/5376
  https://github.com/illumos/illumos-gate/commit/2ec99e3

Porting Notes:
The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solutions weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible.

  * The arc_available_memory() function was removed previously in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to determine a low memory state. This provides the same basic functionality as the illumos version, allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged, so this change isn't as significant as it might first appear.

  * This patch includes the long-standing fix for arc_shrink() which was originally proposed in #2167. Since there were related changes to this function it made sense to include that work.

  * The arc_init() function has been refactored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user-specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation.

Ported-by: Brian Behlendorf <[email protected]>
Closes #3616
Closes #2167

* Set default _initconfdir directory | Brian Behlendorf | 2015-07-21 | 1 | -0/+5

The _initconfdir macro is normally provided by the global rpm macros file for use in the spec file. However, older distributions such as CentOS 6 do not define it. To prevent a build failure in this case the spec file has been updated to use a reasonable default when the value is undefined.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3617

* Add logic to try and recover an inode with an invalid mode | Brian Behlendorf | 2015-07-17 | 1 | -4/+11

When an inode is detected with invalid mode bits the safe thing to do is panic the system. This indicates a problem with the contents of a dnode and it should never be possible. This is the default behavior.

Unfortunately, due to flaws in the system attribute (SA) implementation (on all platforms) it was possible that ZFS could create a damaged dnode. This was a rare issue which only impacted dnodes which used a spill block. Normally only symlinks and files with ACLs would require a spill block. However, if the dataset had the xattr=sa property set and extended attributes were used this problem could occur. As of the 0.6.4 tag the root cause of this issue has been fixed.

For pools which are exhibiting this damage the 'zfs_recover=1' module option may be set. This will cause ZFS to interpret the dnode with invalid mode bits as a normal file. This may allow the files to be accessed for recovery purposes.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3548

* Support parallel build trees (VPATH builds) | Turbo Fredriksson | 2015-07-17 | 42 | -388/+476

Build products from an out-of-tree build should be written relative to the build directory. Sources should be referred to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables for the module Makefile.am, using relative paths to reference source files, and by setting VPATH when source files are not co-located with the Makefile. This enables the following:

    $ mkdir build
    $ cd build
    $ ../configure \
        --with-spl=$HOME/src/git/spl/ \
        --with-spl-obj=$HOME/src/git/spl/build
    $ make -s

This change also has the advantage of resolving the following warning which is generated by modern versions of automake.

    Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
    Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #1082

* Update inode under range lock | Brian Behlendorf | 2015-07-17 | 1 | -1/+1

After a successful write the inode must be updated under the range lock. If it is updated after dropping the lock there exists a race where the znode and inode will disagree about the file size. This could result in a narrow window of time where read(2) is able to access data beyond what fstat(2) reports as the file size.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #3601

* Linux 4.2 compat: follow_link() / put_link() | Brian Behlendorf | 2015-07-17 | 6 | -8/+80

As of Linux 4.2 the kernel has completely retired the nameidata structure. Among the few remaining consumers of this interface were the follow_link() and put_link() callbacks.

This patch adds the required checks to configure to detect the interface change and updates the functions accordingly. Migrating to the simple_follow_link() interface was considered but was decided against, ironically due to the increased complexity.

It should also be noted that the kernel follow_link() and put_link() interfaces changed several times after 4.1 but before 4.2. This means there is a narrow range of kernel commits, which never appear in an official tag of the Linux kernel, for which ZoL will not build.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Issue #3596

* Linux 4.2 compat: remove bio->bi_cnt access | Brian Behlendorf | 2015-07-17 | 1 | -9/+0

Linux 4.2 commit torvalds/linux@dac5621 renamed bio->bi_cnt to bio->__bi_cnt. Because this value is only used once in a block of debug code it is simplest just to remove the PANIC. To my knowledge this debugging has never been hit or proved useful, so this is no great loss.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #3596

* Linux 4.2 compat: bdi_setup_and_register() | Brian Behlendorf | 2015-07-17 | 1 | -0/+1

The vfs_compat.h header should include the linux/backing-dev.h header because it depends on the bdi_* functions defined there. In previous kernels this header was being indirectly included which prevented a build failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #3596

* Illumos 5347 - idle pool may run itself out of space | Matthew Ahrens | 2015-07-14 | 4 | -25/+51

5347 idle pool may run itself out of space

Reviewed by: Alex Reece <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://github.com/illumos/illumos-gate/commit/231aab8
  https://github.com/illumos/illumos-gate/commit/4a92375 3642
  https://www.illumos.org/issues/5347
  https://github.com/zfsonlinux/zfs/commit/89b1cd6 (partial commit & fix)
  https://github.com/zfsonlinux/zfs/commit/fbeddd6 Illumos 4390
  https://github.com/zfsonlinux/zfs/commit/2696dfa Illumos 3642, 3643

Porting notes:
This is completing the partial fix from FreeBSD

Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3586

* Illumos 5764 - "zfs send -nv" directs output to stderr | Manoj Joseph | 2015-07-14 | 2 | -10/+18

5764 "zfs send -nv" directs output to stderr

Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Basil Crow <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://github.com/illumos/illumos-gate/commit/dc5f28a
  https://www.illumos.org/issues/5764

Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3585

* Illumos 5610 - zfs clone from different source and target pools produces coredump | Alexander Eremin | 2015-07-14 | 1 | -11/+2

5610 zfs clone from different source and target pools produces coredump

Reviewed by: Josef 'Jeff' Sipek <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://github.com/illumos/illumos-gate/commit/03b1c29
  https://www.illumos.org/issues/5610
  https://www.illumos.org/issues/5824
  https://github.com/zfsonlinux/zfs/issues/2911
  https://github.com/zfsonlinux/zfs/commit/9063f65

Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3584

* Illumos 1765 - assert triggered in libzfs_import.c | Prasad Joshi | 2015-07-14 | 1 | -1/+4

1765 assert triggered in libzfs_import.c trying to import pool name beginning with a number

Reviewed-by: Garrett D'Amore <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Approved by: Robert Mustacchi <[email protected]>

References:
  https://github.com/illumos/illumos-gate/commit/9edf9eb
  https://www.illumos.org/issues/1765

Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3562

* Failure of userland copy should return EFAULT | Richard Yao | 2015-07-14 | 1 | -1/+1

Many key internal functions pass system return codes that are safe to return to userland. In the case of ddi_copyin(9F), an error passes -1 and the documentation states very clearly that drivers should pass EFAULT to userland when this happens.

http://illumos.org/man/9F/ddi_copyin

This does not happen in the ZFS source code. I believe it should be changed to pass EFAULT. I caught this when writing man pages for the libzfs_core API.

Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3575

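The translation from -1 to EFAULT can be sketched as below. This is illustrative only: `stub_copyin()` merely mimics the "0 on success, -1 on failure" convention of ddi_copyin(9F) with a simplified, invented signature, and `fetch_user_arg()` is a hypothetical caller that is not taken from the ZFS source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stub for a copy-in primitive: returns 0 on success and -1 on failure.
 * A NULL source pointer simulates a faulting user address.
 */
static int stub_copyin(const void *src, void *dst, size_t len)
{
	if (src == NULL)
		return (-1);
	memcpy(dst, src, len);
	return (0);
}

/* Hypothetical caller: report EFAULT rather than the raw -1. */
static int fetch_user_arg(const void *uaddr, void *kbuf, size_t len)
{
	assert(kbuf != NULL);

	if (stub_copyin(uaddr, kbuf, len) != 0)
		return (EFAULT);
	return (0);
}
```

Returning the raw -1 would reach userland as a value that is not a valid errno, which is why the documented convention asks for EFAULT.
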
* Translate sync zio to sync bio | Boris Protopopov | 2015-07-13 | 1 | -2/+8

Translate zio requests with ZIO_PRIORITY_SYNC_READ and ZIO_PRIORITY_SYNC_WRITE into synchronous bio requests by setting READ_SYNC and WRITE_SYNC flags. Specifically, the WRITE_SYNC flag turns out to have a pronounced effect when writing to an SSD-based SLOG. When WRITE_SYNC is not set (WRITE is set instead), the block trace for a SLOG device looks as follows:

    ...
    130,96   0    3  0.008968390     0  C  W 830464 + 136 [0]
    130,96   0    4  0.011999161     0  C  W 830720 + 136 [0]
    130,96   0    5  0.023955549     0  C  W 831744 + 136 [0]
    130,96   0    6  0.024337663 19775  A  W 832000 + 136 <- (130,97) 829952
    130,96   0    7  0.024338823 19775  Q  W 832000 + 136 [z_wr_iss/6]
    130,96   0    8  0.024340523 19775  G  W 832000 + 136 [z_wr_iss/6]
    130,96   0    9  0.024343187 19775  P  N [z_wr_iss/6]
    130,96   0   10  0.024344120 19775  I  W 832000 + 136 [z_wr_iss/6]
    130,96   0   11  0.026784405     0 UT  N [swapper] 1
    130,96   0   12  0.026805339   202  U  N [kblockd/0] 1
    130,96   0   13  0.026807199   202  D  W 832000 + 136 [kblockd/0]
    130,96   0   14  0.026966948     0  C  W 832000 + 136 [0]
    130,96   3    1  0.000449358 19788  A  W 829952 + 136 <- (130,97) 827904
    130,96   3    2  0.000450951 19788  Q  W 829952 + 136 [z_wr_iss/19]
    130,96   3    3  0.000453212 19788  G  W 829952 + 136 [z_wr_iss/19]
    130,96   3    4  0.000455956 19788  P  N [z_wr_iss/19]
    130,96   3    5  0.000457076 19788  I  W 829952 + 136 [z_wr_iss/19]
    130,96   3    6  0.002786349     0 UT  N [swapper] 1
    ...

Here 130,97 is the partition created on the log device when adding it to the pool, whereas the base device is 130,96. As one can see, the writes to the SLOG are not marked synchronous (the S is missing next to W), and the queue unplugs occur based on the timer (UT event), resulting in slightly over 2 msec latency of writes. This results in sub-par performance of single-stream synchronous writes (limited by the latency of the SLOG).

When WRITE_SYNC is set, a similar trace looks as follows:

    ...
    130,96   4    1  0.000000000 70714  A WS 4280576 + 136 <- (130,97) 4278528
    130,96   4    2  0.000000832 70714  Q WS 4280576 + 136 [(null)]
    130,96   4    3  0.000002109 70714  G WS 4280576 + 136 [(null)]
    130,96   4    4  0.000003394 70714  P  N [(null)]
    130,96   4    5  0.000003846 70714  I WS 4280576 + 136 [(null)]
    130,96   4    6  0.000004854 70714  D WS 4280576 + 136 [(null)]
    130,96   5    1  0.000354487 70713  A WS 4280832 + 136 <- (130,97) 4278784
    130,96   5    2  0.000355072 70713  Q WS 4280832 + 136 [(null)]
    130,96   5    3  0.000356383 70713  G WS 4280832 + 136 [(null)]
    130,96   5    4  0.000357635 70713  P  N [(null)]
    130,96   5    5  0.000358088 70713  I WS 4280832 + 136 [(null)]
    130,96   5    6  0.000359191 70713  D WS 4280832 + 136 [(null)]
    130,96   0   76  0.000159539     0  C WS 4280576 + 136 [0]
    130,96  16   85  0.000742108 70718  A WS 4281088 + 136 <- (130,97) 4279040
    130,96  16   86  0.000743197 70718  Q WS 4281088 + 136 [z_wr_iss/15]
    130,96  16   87  0.000744450 70718  G WS 4281088 + 136 [z_wr_iss/15]
    130,96  16   88  0.000745817 70718  P  N [z_wr_iss/15]
    130,96  16   89  0.000746705 70718  I WS 4281088 + 136 [z_wr_iss/15]
    130,96  16   90  0.000747848 70718  D WS 4281088 + 136 [z_wr_iss/15]
    130,96   0   77  0.000604063     0  C WS 4280832 + 136 [0]
    130,96   0   78  0.000899858     0  C WS 4281088 + 136 [0]

As one can see, all the writes are synchronous (WS), and I/O completions (e.g. from issue I to completion C) take 160-250 usec, or about 10x faster.

Since WRITE_SYNC or READ_SYNC flags are among several factors that are considered when processing bio requests, it seems prudent to mark all the zio requests of synchronous priority with the READ/WRITE_SYNC flags to make them eligible for consideration as such by the Linux block I/O layer.

Signed-off-by: Boris Protopopov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3529

* Fix switch-bool warning (Brian Behlendorf, 2015-07-13, 1 file, -1/+1)

  As of gcc version 5.1.1 a new warning has been added to detect the
  use of a boolean in a switch statement (-Wswitch-bool). Resolve the
  warning by explicitly casting the value to an integer type.

    zfs-0.6.4/module/zfs/zvol.c: In function 'zvol_request':
    error: switch condition has boolean value [-Werror=switch-bool]

  Signed-off-by: Brian Behlendorf <[email protected]>
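  A minimal sketch of the cast described above. The helper names and
  direction codes are hypothetical and do not come from zvol.c; only
  the idea of casting the switch condition to an integer type is taken
  from the commit message.

```c
#include <stdbool.h>

/* Hypothetical direction codes, used only in this sketch. */
enum { SKETCH_DIR_READ = 0, SKETCH_DIR_WRITE = 1 };

/*
 * Hypothetical helper that yields a boolean, standing in for whatever
 * expression the real code switches on.
 */
bool
sketch_is_write(unsigned long flags)
{
	return ((flags & 0x1UL) != 0);
}

const char *
sketch_dir_name(unsigned long flags)
{
	/*
	 * Switching on the bool directly is what -Wswitch-bool reports.
	 * The explicit cast gives the switch an integer condition.
	 */
	switch ((int)sketch_is_write(flags)) {
	case SKETCH_DIR_READ:
		return ("read");
	case SKETCH_DIR_WRITE:
		return ("write");
	default:
		return ("unknown");
	}
}
```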
* Disable gcc bool-compare warning (Brian Behlendorf, 2015-07-13, 4 files, -0/+30)

  As of gcc version 5.1.1 a new boolean comparison warning has been
  introduced. This warning is harmless but is triggered in several
  places in the ZFS code base. Because warnings are promoted to errors
  when building with debugging enabled, it is necessary to disable the
  warning when using versions of gcc which automatically enable this
  check.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Use truncate instead of fallocate in ziltest.sh (Brian Behlendorf, 2015-07-13, 1 file, -2/+2)

  For the purposes of creating sparse files the truncate command is
  preferable to fallocate because generic sparse files are more widely
  supported by older platforms. Specifically, Debian Wheezy, which is
  based on a 2.6.32 kernel, used ext3 by default, which at the time
  did not support fallocate.

  Signed-off-by: Brian Behlendorf <[email protected]>
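  For illustration, the truncate approach corresponds to extending a
  file with ftruncate(2) rather than preallocating blocks. The sketch
  below is not part of ziltest.sh (which is a shell script); both
  function names are hypothetical. On filesystems that support holes
  the resulting file occupies little or no disk space.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) `path` and extend it to `size` bytes without
 * writing any data. Returns 0 on success, -1 on failure.
 */
int
sketch_make_sparse(const char *path, off_t size)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	int rc;

	if (fd < 0)
		return (-1);

	/* ftruncate() only changes the length; no blocks are written. */
	rc = ftruncate(fd, size);
	if (close(fd) != 0)
		rc = -1;
	return (rc);
}

/* Return the apparent size of `path`, or -1 on error. */
off_t
sketch_file_size(const char *path)
{
	struct stat st;

	return ((stat(path, &st) == 0) ? st.st_size : (off_t)-1);
}
```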
* Fix Xen Virtual Block Device detection (Richard Yao, 2015-07-10, 2 files, -2/+11)

  We fail to make partitions on xvd (Xen Virtual Block) devices. This
  also causes debug builds of zpool create to return an error when
  given Xen virtual block devices. These devices should be given the
  same treatment as vd (KVM Virtual Block) devices, so we adjust the
  relevant code paths.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3576
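  The message does not show the changed code paths. As a rough,
  hypothetical illustration of giving xvd names the same treatment as
  vd names, a prefix check could look like the sketch below;
  sketch_is_virtual_blkdev() is an invented name, not a libzfs
  function.

```c
#include <string.h>

/*
 * Return 1 if the last path component of `name` starts with "xvd" or
 * "vd" (for example /dev/xvda or /dev/vdb), otherwise 0.
 */
int
sketch_is_virtual_blkdev(const char *name)
{
	const char *base = strrchr(name, '/');

	/* Skip the directory part, if there is one. */
	base = (base != NULL) ? base + 1 : name;

	return (strncmp(base, "xvd", 3) == 0 ||
	    strncmp(base, "vd", 2) == 0);
}
```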
* Illumos 5813 - zfs_setprop_error(): Handle errno value E2BIG. (Will Andrews, 2015-07-10, 1 file, -0/+6)

  5813 zfs_setprop_error(): Handle errno value E2BIG.

  Reviewed by: Paul Dagnelie <[email protected]>
  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: Prakash Surya <[email protected]>
  Reviewed by: Richard Elling <[email protected]>
  Approved by: Garrett D'Amore <[email protected]>

  References:
    https://github.com/illumos/illumos-gate/commit/6fdcb3d
    https://www.illumos.org/issues/5813

  Ported-by: kernelOfTruth [email protected]
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3572
* Illumos 5661 - ZFS: "compression = on" should use lz4 if feature is enabled (Justin T. Gibbs, 2015-07-10, 5 files, -31/+61)

  5661 ZFS: "compression = on" should use lz4 if feature is enabled

  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: Josef 'Jeff' Sipek <[email protected]>
  Reviewed by: Xin LI <[email protected]>
  Approved by: Robert Mustacchi <[email protected]>

  References:
    https://github.com/illumos/illumos-gate/commit/db1741f
    https://www.illumos.org/issues/5661

  Ported-by: kernelOfTruth [email protected]
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3571
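  The behaviour named in the subject line can be pictured as a small
  resolution step. The sketch below is hypothetical: the type and
  function names are invented, and falling back to lzjb when the lz4
  feature is not enabled is an assumption that the message does not
  state.

```c
#include <stdbool.h>

/* Stand-in compression settings for this sketch only. */
typedef enum {
	SKETCH_COMPRESS_OFF,
	SKETCH_COMPRESS_ON,
	SKETCH_COMPRESS_LZJB,
	SKETCH_COMPRESS_LZ4
} sketch_compress_t;

/*
 * Resolve the generic "on" setting to a concrete algorithm. Explicit
 * settings pass through unchanged.
 */
sketch_compress_t
sketch_resolve_compress(sketch_compress_t requested, bool lz4_enabled)
{
	if (requested != SKETCH_COMPRESS_ON)
		return (requested);

	/* Assumed fallback: lzjb when lz4 is unavailable. */
	return (lz4_enabled ? SKETCH_COMPRESS_LZ4 : SKETCH_COMPRESS_LZJB);
}
```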
* Illumos 5427 - memory leak in libzfs when doing rollback (Jan Kryl, 2015-07-10, 1 file, -4/+1)

  5427 memory leak in libzfs when doing rollback

  Reviewed by: Michael Tsymbalyuk <[email protected]>
  Reviewed by: Steven Hartland <[email protected]>
  Approved by: Dan McDonald <[email protected]>

  References:
    https://github.com/illumos/illumos-gate/commit/b7070b7
    https://www.illumos.org/issues/5427

  Ported-by: kernelOfTruth [email protected]
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3569
* Illumos 5118 - When verifying or creating a storage pool, error messages only show one device (Basil Crow, 2015-07-10, 1 file, -16/+18)

  5118 When verifying or creating a storage pool, error messages only
  show one device

  Reviewed by: Adam Leventhal <[email protected]>
  Reviewed by: Dan Kimmel <[email protected]>
  Reviewed by: Matt Ahrens <[email protected]>
  Reviewed by: Boris Protopopov <[email protected]>
  Approved by: Dan McDonald <[email protected]>

  References:
    https://github.com/illumos/illumos-gate/commit/75fbdf9
    https://www.illumos.org/issues/5118

  Ported-by: kernelOfTruth [email protected]
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3567
* Illumos 4966 - zpool list iterator does not update output (George Wilson, 2015-07-10, 1 file, -10/+10)

  4966 zpool list iterator does not update output

  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: Christopher Siden <[email protected]>
  Reviewed by: Dan McDonald <[email protected]>
  Approved by: Garrett D'Amore <[email protected]>

  References:
    https://github.com/illumos/illumos-gate/commit/cd67d23
    https://www.illumos.org/issues/4966

  Ported-by: kernelOfTruth [email protected]
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3566
* Illumos 4745 - fix AVL code misspellings (Josef 'Jeff' Sipek, 2015-07-10, 2 files, -7/+8)

  4745 fix AVL code misspellings

  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: Richard Lowe <[email protected]>
  Approved by: Robert Mustacchi <[email protected]>

  References:
    https://github.com/illumos/illumos-gate/commit/6907ca4
    https://www.illumos.org/issues/4745

  Ported-by: kernelOfTruth [email protected]
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3565