path: root/lib
Commit message | Author | Age | Files | Lines
* Honor xattr=sa dataset property | Ned Bass | 2015-09-19 | 1 | -2/+0

    ZFS incorrectly uses directory-based extended attributes even when xattr=sa is specified as a dataset property or mount option. Support to honor temporary mount options including "xattr" was added in commit 0282c4137e7409e6d85289f4955adf07fac834f5. There are two issues with the mount option handling:

    * Libzfs has historically included "xattr" in its list of default mount options. This overrides the dataset property, so the dataset is always configured to use directory-based xattrs even when the xattr dataset property is set to off or sa. Address this by removing "xattr" from the set of default mount options in libzfs.

    * There was no way to enable system attribute-based extended attributes using temporary mount options. Add the mount options "saxattr" and "dirxattr" which enable the xattr behavior their names suggest. This approach has the advantages of mirroring the valid xattr dataset property values and following existing conventions for mount option names.

    Signed-off-by: Ned Bass <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3787
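
The option handling described in this commit can be pictured as a small lookup from a temporary mount option to an xattr mode. The sketch below is illustrative only: the enum and the function name `xattr_mode_from_option` are hypothetical, and it is not the libzfs or kernel implementation. Unrelated options return "unset", so the dataset property stays in control.

```c
#include <string.h>

/* Hypothetical modes mirroring the xattr dataset property values. */
typedef enum {
	XATTR_UNSET = -1,	/* no temporary option: use the dataset property */
	XATTR_OFF,
	XATTR_DIR,
	XATTR_SA
} xattr_mode_t;

/*
 * Map one mount option token to an xattr mode.  Returns XATTR_UNSET
 * for unrelated tokens so the dataset property is not overridden.
 */
xattr_mode_t
xattr_mode_from_option(const char *opt)
{
	if (strcmp(opt, "noxattr") == 0)
		return (XATTR_OFF);
	if (strcmp(opt, "xattr") == 0 || strcmp(opt, "dirxattr") == 0)
		return (XATTR_DIR);
	if (strcmp(opt, "saxattr") == 0)
		return (XATTR_SA);
	return (XATTR_UNSET);
}
```

With "xattr" no longer in the default option list, a mount without any of these tokens yields XATTR_UNSET for every option, which is the case where xattr=sa on the dataset takes effect.
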
* Add temporary mount options | Brian Behlendorf | 2015-09-03 | 5 | -107/+3

    Add the required kernel side infrastructure to parse arbitrary mount options. This enables us to support temporary mount options in largely the same way they are handled on other platforms.

    See the 'Temporary Mount Point Properties' section of zfs(8) for complete details.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #985
    Closes #3351
* Check large block feature flag on volumes | Brian Behlendorf | 2015-08-28 | 1 | -2/+8

    Since ZoL allows large blocks to be used by volumes, unlike upstream illumos, the feature flag must be checked prior to volume creation. This is critical because unlike filesystems, volumes will create an object which uses large blocks as part of the create. Therefore, it cannot be safely checked in zfs_check_settable() after the dataset has been created.

    In addition this patch updates the relevant error messages to use zfs_nicenum() to print the maximum blocksize.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3591
* Fix zvol detection | Richard Yao | 2015-08-19 | 1 | -0/+6

    The zpool create subcommand should not return an error on debug builds of the userland tools when given zvols.

    Signed-off-by: Richard Yao <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3595
* Align thread priority with Linux defaults | Brian Behlendorf | 2015-07-28 | 2 | -2/+4

    Under Linux filesystem threads responsible for handling I/O are normally created with the maximum priority. Non-I/O filesystem processes run with the default priority. ZFS should adopt the same priority scheme under Linux to maintain good performance and so that it will compete fairly when other Linux filesystems are active.

    The priorities have been updated to the following:

        $ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
             -  TS 10743  19 -20 [spl_kmem_cache]
             -  TS 10744  19 -20 [spl_system_task]
             -  TS 10745  19 -20 [spl_dynamic_tas]
             -  TS 10764  19   0 [dbu_evict]
             -  TS 10765  19   0 [arc_prune]
             -  TS 10766  19   0 [arc_reclaim]
             -  TS 10767  19   0 [arc_user_evicts]
             -  TS 10768  19   0 [l2arc_feed]
             -  TS 10769  39   0 [z_unmount]
             -  TS 10770  39 -20 [zvol]
             -  TS 11011  39 -20 [z_null_iss]
             -  TS 11012  39 -20 [z_null_int]
             -  TS 11013  39 -20 [z_rd_iss]
             -  TS 11014  39 -20 [z_rd_int_0]
             -  TS 11022  38 -19 [z_wr_iss]
             -  TS 11023  39 -20 [z_wr_iss_h]
             -  TS 11024  39 -20 [z_wr_int_0]
             -  TS 11032  39 -20 [z_wr_int_h]
             -  TS 11033  39 -20 [z_fr_iss_0]
             -  TS 11041  39 -20 [z_fr_int]
             -  TS 11042  39 -20 [z_cl_iss]
             -  TS 11043  39 -20 [z_cl_int]
             -  TS 11044  39 -20 [z_ioctl_iss]
             -  TS 11045  39 -20 [z_ioctl_int]
             -  TS 11046  39 -20 [metaslab_group_]
             -  TS 11050  19   0 [z_iput]
             -  TS 11121  38 -19 [z_wr_iss]

    Note that under Linux the meaning of a process's priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high values on illumos indicate a _high_ priority.

    In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way, when changes are merged from upstream illumos, we won't need to remember to invert the macro, which could also lead to confusion.

    This patch depends on https://github.com/zfsonlinux/spl/pull/466.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Ned Bass <[email protected]>
    Closes #3607
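
The inversion described in this commit can be illustrated with a few constants. This is a sketch under stated assumptions: the `SKETCH_` values are the usual Linux scheduler constants reproduced here by hand, and the `sketch_*` names are hypothetical stand-ins for the SPL macros, whose actual definitions live in the SPL pull request referenced above.

```c
/* Assumed Linux scheduler constants, reproduced for illustration. */
#define	SKETCH_MAX_RT_PRIO	100	/* kernel priority matching nice -20 */
#define	SKETCH_DEFAULT_PRIO	120	/* kernel priority matching nice 0 */
#define	SKETCH_MAX_PRIO		140	/* one past the priority for nice 19 */

/* illumos-style names: "max" is the most favored, "min" the least. */
#define	sketch_maxclsyspri	SKETCH_MAX_RT_PRIO
#define	sketch_defclsyspri	SKETCH_DEFAULT_PRIO
#define	sketch_minclsyspri	(SKETCH_MAX_PRIO - 1)

/* Convert a kernel priority to the nice value reported by ps. */
int
sketch_prio_to_nice(int prio)
{
	return (prio - SKETCH_DEFAULT_PRIO);
}
```

Under these assumptions the "maximum" priority is the numerically smallest value, so callers written against the illumos meaning of the names still get the most favored scheduling without per-call-site inversion.
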
* Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_grow | Matthew Ahrens | 2015-07-23 | 1 | -0/+1

    5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
    Reviewed by: Christopher Siden <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Steven Hartland <[email protected]>
    Reviewed by: Richard Elling <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://www.illumos.org/issues/5376
      https://github.com/illumos/illumos-gate/commit/2ec99e3

    Porting Notes:

    The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solutions weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible.

    * The arc_available_memory() function was removed previously in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to detect a low memory state. This provides the same basic functionality as the illumos version allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged so this change isn't as significant as it might first appear.

    * This patch includes the long standing fix for arc_shrink() which was originally proposed in #2167. Since there were related changes to this function it made sense to include that work.

    * The arc_init() function has been re-factored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation.

    Ported-by: Brian Behlendorf <[email protected]>
    Closes #3616
    Closes #2167
* Support parallel build trees (VPATH builds) | Turbo Fredriksson | 2015-07-17 | 13 | -188/+256

    Build products from an out of tree build should be written relative to the build directory. Sources should be referred to by their locations in the source directory.

    This is accomplished by adding the 'src' and 'obj' variables for the module Makefile.am, using relative paths to reference source files, and by setting VPATH when source files are not co-located with the Makefile. This enables the following:

        $ mkdir build
        $ cd build
        $ ../configure \
          --with-spl=$HOME/src/git/spl/ \
          --with-spl-obj=$HOME/src/git/spl/build
        $ make -s

    This change also has the advantage of resolving the following warning which is generated by modern versions of automake.

        Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
        Makefile.am:00: but option 'subdir-objects' is disabled

    Signed-off-by: Turbo Fredriksson <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #1082
* Illumos 5764 - "zfs send -nv" directs output to stderr | Manoj Joseph | 2015-07-14 | 1 | -9/+15

    5764 "zfs send -nv" directs output to stderr
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Paul Dagnelie <[email protected]>
    Reviewed by: Basil Crow <[email protected]>
    Reviewed by: Steven Hartland <[email protected]>
    Reviewed by: Bayard Bell <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://github.com/illumos/illumos-gate/commit/dc5f28a
      https://www.illumos.org/issues/5764

    Ported-by: kernelOfTruth [email protected]
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3585
* Fix Xen Virtual Block Device detection | Richard Yao | 2015-07-10 | 2 | -2/+11

    We fail to make partitions on xvd (Xen Virtual Block) devices. This also causes debug builds of zpool create to return an error when given Xen virtual block devices. These devices should be given the same treatment as vd (KVM Virtual Block) devices, so we adjust the relevant code paths.

    Signed-off-by: Richard Yao <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3576
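
One way to picture "the same treatment" for both device families is a prefix check on the device path. The sketch below is an assumption for illustration: the enum, both function names, and the choice of path prefixes are hypothetical and do not reproduce the code paths the commit actually touched.

```c
#include <string.h>

/* Hypothetical device classes, used only in this example. */
typedef enum {
	DEV_OTHER,
	DEV_KVM_VIRTUAL,	/* /dev/vd* */
	DEV_XEN_VIRTUAL		/* /dev/xvd* */
} dev_class_t;

/* Classify a device path by its name prefix. */
dev_class_t
classify_virtual_disk(const char *path)
{
	if (strncmp(path, "/dev/xvd", 8) == 0)
		return (DEV_XEN_VIRTUAL);
	if (strncmp(path, "/dev/vd", 7) == 0)
		return (DEV_KVM_VIRTUAL);
	return (DEV_OTHER);
}

/* Both virtual block device classes share one partitioning path. */
int
uses_virtual_disk_path(const char *path)
{
	return (classify_virtual_disk(path) != DEV_OTHER);
}
```
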
* Illumos 5813 - zfs_setprop_error(): Handle errno value E2BIG. | Will Andrews | 2015-07-10 | 1 | -0/+6

    5813 zfs_setprop_error(): Handle errno value E2BIG.
    Reviewed by: Paul Dagnelie <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Prakash Surya <[email protected]>
    Reviewed by: Richard Elling <[email protected]>
    Approved by: Garrett D'Amore <[email protected]>

    References:
      https://github.com/illumos/illumos-gate/commit/6fdcb3d
      https://www.illumos.org/issues/5813

    Ported-by: kernelOfTruth [email protected]
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3572
* Illumos 5427 - memory leak in libzfs when doing rollback | Jan Kryl | 2015-07-10 | 1 | -4/+1

    5427 memory leak in libzfs when doing rollback
    Reviewed by: Michael Tsymbalyuk <[email protected]>
    Reviewed by: Steven Hartland <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://github.com/illumos/illumos-gate/commit/b7070b7
      https://www.illumos.org/issues/5427

    Ported-by: kernelOfTruth [email protected]
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3569
* Illumos 4626 - libzfs memleak in zpool_in_use() | Josef 'Jeff' Sipek | 2015-07-10 | 1 | -4/+11

    4626 libzfs memleak in zpool_in_use()
    Reviewed by: Tony Nguyen <[email protected]>
    Reviewed by: Saso Kiselkov <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://github.com/illumos/illumos-gate/commit/fb13f48
      https://www.illumos.org/issues/4626

    Ported-by: kernelOfTruth [email protected]
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3563
* Illumos 5163 - arc should reap range_seg_cache | George Wilson | 2015-06-25 | 1 | -0/+5

    5163 arc should reap range_seg_cache
    Reviewed by: Christopher Siden <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Richard Elling <[email protected]>
    Reviewed by: Saso Kiselkov <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://www.illumos.org/issues/5163
      https://github.com/illumos/illumos-gate/commit/83803b5

    Porting Notes:
    Added umem_cache_reap_now() wrapper to suppress unused variable warning for user space build in arc_kmem_reap_now().

    Ported-by: Brian Behlendorf <[email protected]>
* Update all default taskq settings | Brian Behlendorf | 2015-06-25 | 1 | -1/+0

    Over the years the default values for the taskqs used on Linux have differed slightly from illumos. In the vast majority of cases this was done to avoid creating an obnoxious number of idle threads which would pollute the process listing.

    With the addition of support for dynamic taskqs all multi-threaded queues should be created as dynamic taskqs. This allows us to get the best of both worlds.

    * The illumos default values for the I/O pipeline can be restored. These values are known to work well for most workloads. The only exception is the zio write interrupt taskq which is changed to ZTI_P(12, 8). At least under Linux more threads have been shown to improve performance, see commit 7e55f4e.

    * Reduces the number of idle threads on the system when it's not under heavy load. The maximum number of threads will only be created when they are required.

    * Remove the vdev_file_taskq and rely on the system_taskq instead which is now dynamic and may have up to 64 threads. Again this brings us back in line with upstream.

    * Tasks dispatched with taskq_dispatch_ent() are allowed to use dynamic taskqs. The Linux taskq implementation supports this.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tim Chase <[email protected]>
    Closes #3507
* Add IMPLY() and EQUIV() macros | Brian Behlendorf | 2015-06-24 | 1 | -1/+17

    Added for upstream compatibility, they are of the form:

    * IMPLY(a, b) - if (a) then (b)
    * EQUIV(a, b) - if (a) then (b) *AND* if (b) then (a)

    Signed-off-by: Brian Behlendorf <[email protected]>
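
The two logical forms reduce to simple boolean expressions. The sketch below gives truth-valued versions under hypothetical `SKETCH_` names; the macros added by the commit are assertions, so they would fail loudly where these helpers merely evaluate to false.

```c
#include <stdbool.h>
#include <stddef.h>

/* if (a) then (b): false only when a holds and b does not. */
#define	SKETCH_IMPLY(a, b)	(!(a) || (b))

/* a and b imply each other: both hold or neither does. */
#define	SKETCH_EQUIV(a, b)	(!!(a) == !!(b))

/* Example invariant: a non-zero length implies a non-NULL buffer. */
bool
buffer_args_ok(const void *buf, size_t len)
{
	return (SKETCH_IMPLY(len != 0, buf != NULL));
}
```

The double negation in SKETCH_EQUIV normalizes both operands to 0 or 1 first, so two different non-zero values still count as equivalent.
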
* Add taskq_wait_outstanding() function | Brian Behlendorf | 2015-06-11 | 1 | -0/+6

    SPL commit behlendorf/spl@9cef1b5 adds the taskq_wait_outstanding() interface. See the commit log for the full justification for this addition. This patch adds the required user space counterpart.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tim Chase <[email protected]>
* Illumos 5497 - lock contention on arcs_mtx | Prakash Surya | 2015-06-11 | 1 | -0/+1

    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Richard Elling <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    Porting notes and other significant code changes:

    The illumos 5368 patch (ARC should cache more metadata), which was never picked up by ZoL, is mostly reverted by this patch.

    Since ZoL relies on the kernel asynchronously calling the shrinker to actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every time it runs.

    The arc_adapt_thread() function no longer calls arc_do_user_evicts() since the newly-added arc_user_evicts_thread() calls it periodically.

    Notable ZoL commits which conflicted with this patch or whose effects are either duplicated or un-done by this patch:

        302f753 - Integrate ARC more tightly with Linux
        39e055c - Adjust arc_p based on "bytes" in arc_shrink
        f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c"
        77765b5 - Remove "arc_meta_used" from arc_adjust calculation
        94520ca - Prune metadata from ghost lists in arc_adjust_meta

    Trace support for multilist_insert() and multilist_remove() has been added and produces the following output:

        fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 }
        fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 }

    The following arcstats have been removed:

        recycle_miss - Used by arcstat.py and arc_summary.py, both of which have been updated appropriately.
        l2_writes_hdr_miss

    The following arcstats have been added:

        evict_not_enough - Number of times arc_evict_state() was unable to evict enough buffers to reach its target amount.
        evict_l2_skip - Number of times arc_evict_hdr() skipped eviction because it was being written to the l2arc.
        l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times l2arc_write_done() failed to acquire hash_lock (and re-tries).
        arc_meta_min - Shows the value of the zfs_arc_meta_min module parameter (see below).

    The "index" column of the "dbuf" kstat has been removed since it doesn't have a direct analog in the new multilist scheme. Additional multilist-related stats could be added in the future but would likely require extensions to the multilist API.

    The following module parameters have been added:

        zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list before moving on to the next sub-list.
        zfs_arc_meta_min - Enforce a floor on the amount of metadata in the ARC.
        zfs_arc_num_sublists_per_state - Number of multilist sub-lists per ARC state.
        zfs_arc_overflow_shift - Controls amount by which the ARC must exceed the target size to be considered "overflowing".

    Ported-by: Tim Chase <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
* Add libzfs_error_init() function | Brian Behlendorf | 2015-05-22 | 1 | -16/+30

    All fprintf() error messages are moved out of the libzfs_init() library function where they never belonged in the first place. A libzfs_error_init() function is added to provide useful error messages for the most common causes of failure.

    Additionally, in libzfs_run_process() the 'rc' variable was renamed to 'error' for consistency with the rest of the code base.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Chris Dunlap <[email protected]>
    Signed-off-by: Richard Yao <[email protected]>
* Wait in libzfs_init() for the /dev/zfs device | Brian Behlendorf | 2015-05-22 | 1 | -4/+65

    While module loading itself is synchronous the creation of the /dev/zfs device is not. This is because /dev/zfs is typically created by a udev rule after the module is registered and presented to user space through sysfs. This small window between module loading and device creation can result in spurious failures of libzfs_init().

    This patch closes that race by extending libzfs_init() so it can detect that the modules are loaded and only if required wait for the /dev/zfs device to be created. This allows scripts to reliably use the following shell construct without the need for additional error handling.

        $ /sbin/modprobe zfs && /sbin/zpool import -a

    To minimize the potential time waiting in libzfs_init() a strategy similar to adaptive mutexes is employed. The function will busy-wait for up to 10ms based on the expectation that the modules were just loaded and therefore the /dev/zfs will be created imminently. If it takes longer than this it will fall back to polling for up to 10 seconds.

    This behavior can be customized to some degree by setting the following new environment variables. This functionality is provided for backwards compatibility with existing scripts which depend on the module auto-load behavior. By default module auto-loading is now disabled.

    * ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
    * ZFS_MODULE_TIMEOUT="<seconds>" - Seconds to wait for /dev/zfs

    The zfs-import-* systemd service files have been updated to call '/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading behavior.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Chris Dunlap <[email protected]>
    Signed-off-by: Richard Yao <[email protected]>
    Closes #2556
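
The busy-wait-then-poll strategy described in this commit can be sketched as a single loop with two phases. This is a minimal illustration, not the libzfs_init() code: the function name `wait_for_device`, its parameters, and the use of stat() as the existence check are assumptions made for the example.

```c
#define	_POSIX_C_SOURCE 200809L

#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Milliseconds elapsed on the monotonic clock. */
static long long
now_ms(void)
{
	struct timespec ts;

	(void) clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

/*
 * Wait for 'path' to appear: retry immediately for the first 'busy_ms'
 * milliseconds, then sleep 'poll_ms' between checks until 'timeout_ms'
 * has elapsed in total.  Returns 0 on success or ENOENT on timeout.
 */
int
wait_for_device(const char *path, long busy_ms, long poll_ms, long timeout_ms)
{
	struct stat st;
	long long start = now_ms();

	for (;;) {
		if (stat(path, &st) == 0)
			return (0);

		long long elapsed = now_ms() - start;
		if (elapsed >= timeout_ms)
			return (ENOENT);

		if (elapsed >= busy_ms) {
			struct timespec nap = {
				.tv_sec = poll_ms / 1000,
				.tv_nsec = (poll_ms % 1000) * 1000000L
			};
			(void) nanosleep(&nap, NULL);
		}
		/* Otherwise still in the busy-wait window: retry at once. */
	}
}
```

A call such as `wait_for_device("/dev/zfs", 10, 10, 10000)` would match the 10ms busy-wait and 10 second limit quoted above; the 10ms polling interval in that call is a guess, since the commit message does not state one.
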
* Change 3-digit octal escapes to 4-digit ones | Hajo Möller | 2015-05-18 | 1 | -2/+2

    Prefixing an octal value with a leading zero is the standard way to disambiguate it. This change only impacts the `zfs diff` output and is therefore very limited in scope.

    Signed-off-by: Hajo Möller <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3417
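
The effect of widening the escape can be shown with a small name-escaping routine. The sketch below is an assumption about the general shape of such code, with a hypothetical function name and escaping rule (printable non-space ASCII passes through, everything else and the backslash are escaped); only the "\\%04o" format reflects the change described above.

```c
#include <stdio.h>

/*
 * Copy 'in' to 'out', replacing the backslash and any byte outside the
 * printable, non-space ASCII range with a 4-digit octal escape.
 * Returns 0 on success, -1 if 'out' is too small.
 */
int
escape_name(const char *in, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return (-1);

	for (; *in != '\0'; in++) {
		unsigned char c = (unsigned char)*in;
		int n;

		if (c > ' ' && c != '\\' && c < 0177)
			n = snprintf(out + used, outlen - used, "%c", c);
		else
			n = snprintf(out + used, outlen - used, "\\%04o",
			    (unsigned int)c);

		/* snprintf reports the length it needed; detect truncation. */
		if (n < 0 || (size_t)n >= outlen - used)
			return (-1);
		used += (size_t)n;
	}
	out[used] = '\0';
	return (0);
}
```

With this rule a space (octal 40) is emitted as a backslash followed by 0040, where a 3-digit format would have produced 040.
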
* Illumos 5847 - libzfs_diff should check zfs_prop_get() return | Alexander Eremin | 2015-05-15 | 1 | -3/+7

    5847 libzfs_diff should check zfs_prop_get() return
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Albert Lee <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://www.illumos.org/issues/5847
      https://github.com/illumos/illumos-gate/commit/8430278

    Ported-by: Brian Behlendorf <[email protected]>
    Closes #3412
* Illumos 5765 - add support for estimating send stream size with lzc_send_space when source is a bookmark | Max Grossman | 2015-05-13 | 1 | -5/+17

    5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Christopher Siden <[email protected]>
    Reviewed by: Steven Hartland <[email protected]>
    Reviewed by: Bayard Bell <[email protected]>
    Approved by: Albert Lee <[email protected]>

    References:
      https://www.illumos.org/issues/5765
      https://github.com/illumos/illumos-gate/commit/643da460

    Porting notes:
    * Unused variable 'recordsize' in dmu_send_estimate() dropped

    Ported-by: DHE <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3397
* Illumos 5027 - zfs large block support | Matthew Ahrens | 2015-05-11 | 3 | -18/+33

    5027 zfs large block support
    Reviewed by: Alek Pinchuk <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Josef 'Jeff' Sipek <[email protected]>
    Reviewed by: Richard Elling <[email protected]>
    Reviewed by: Saso Kiselkov <[email protected]>
    Reviewed by: Brian Behlendorf <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://www.illumos.org/issues/5027
      https://github.com/illumos/illumos-gate/commit/b515258

    Porting Notes:

    * Included in this patch is a tiny ISP2() cleanup in zio_init() from Illumos 5255.

    * Unlike the upstream Illumos commit this patch does not impose an arbitrary 128K block size limit on volumes. Volumes, like filesystems, are limited by the zfs_max_recordsize=1M module option.

    * By default the maximum record size is limited to 1M by the module option zfs_max_recordsize. This value may be safely increased up to 16M which is the largest block size supported by the on-disk format. At the moment, 1M blocks clearly offer a significant performance improvement but the benefits of going beyond this for the majority of workloads are less clear.

    * The illumos version of this patch increased DMU_MAX_ACCESS to 32M. This was determined not to be large enough when using 16M blocks because the zfs_make_xattrdir() function will fail (EFBIG) when assigning a TX. This was immediately observed under Linux because all newly created files must have a security xattr created and that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M.

    * On 32-bit platforms a hard limit of 1M is set for blocks due to the limited virtual address space. We should be able to relax this once the ABD patches are merged.

    Ported-by: Brian Behlendorf <[email protected]>
    Closes #354
* Fix type mismatch on 32-bit systems | Brian Behlendorf | 2015-05-11 | 2 | -5/+6

    The umem_alloc_aligned() function should not assume that a 'void *' type is 64-bit. It will not be on 32-bit platforms. Rather than complicating the ASSERT to handle this it is simply removed.

    Additionally, the '%lu' format specifier should not be assumed to imply a 64-bit value. Fix this by using the 'llu' format specifier, which will always be at least 64-bit, and explicitly casting the variable to a u_longlong_t. This issue is handled the same way in many other parts of the code.

    Signed-off-by: Brian Behlendorf <[email protected]>
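
The format specifier fix follows a common portable pattern, shown below in a minimal form. The function name `format_u64` is hypothetical, and the sketch casts to `unsigned long long` directly, assuming that the `u_longlong_t` named in the commit is a typedef for that type.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value portably.  'unsigned long' is only 32 bits on
 * typical 32-bit platforms, so "%lu" is not safe for a uint64_t;
 * 'unsigned long long' is guaranteed to be at least 64 bits wide.
 */
int
format_u64(char *buf, size_t buflen, uint64_t value)
{
	return (snprintf(buf, buflen, "%llu", (unsigned long long)value));
}
```

The explicit cast matters because a variadic function receives whatever type the caller passes; the specifier and the argument type have to agree on every platform, not only on the one being tested.
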
* Illumos #5244 - zio pipeline callers should explicitly invoke next stage | George Wilson | 2015-04-30 | 1 | -0/+4

    5244 zio pipeline callers should explicitly invoke next stage
    Reviewed by: Adam Leventhal <[email protected]>
    Reviewed by: Alex Reece <[email protected]>
    Reviewed by: Christopher Siden <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Richard Elling <[email protected]>
    Reviewed by: Dan McDonald <[email protected]>
    Reviewed by: Steven Hartland <[email protected]>
    Approved by: Gordon Ross <[email protected]>

    References:
      https://www.illumos.org/issues/5244
      https://github.com/illumos/illumos-gate/commit/738f37b

    Porting Notes:

    1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c.
    2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c.

    Ported-by: Richard Yao <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #2828
* Illumos 3897 - zfs filesystem and snapshot limits | Jerry Jelinek | 2015-04-28 | 4 | -0/+67

    3897 zfs filesystem and snapshot limits
    Author: Jerry Jelinek <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Approved by: Christopher Siden <[email protected]>

    References:
      https://www.illumos.org/issues/3897
      https://github.com/illumos/illumos-gate/commit/a2afb61

    Porting Notes:
    dsl_dataset_snapshot_check(): reduce stack usage using kmem_alloc().

    Ported-by: Chris Dunlop <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5134 - if ZFS_DEBUG or debug= is set, libzpool should enable debug prints | Matthew Ahrens | 2015-04-28 | 1 | -0/+3

    5134 if ZFS_DEBUG or debug= is set, libzpool should enable debug prints
    Reviewed by: Adam Leventhal <[email protected]>
    Reviewed by: Christopher Siden <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Saso Kiselkov <[email protected]>
    Approved by: Dan McDonald <[email protected]>

    References:
      https://www.illumos.org/projects/illumos-gate/issues/5134
      https://github.com/illumos/illumos-gate/commit/7fa49ea

    Porting notes:
    Added dprintf_setup() to main in zfs_main.c and zpool_main.c.

    Ported by: Turbo Fredriksson <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #2669
* Support the "version" property on volumes via the zfs_prop_get_int() API | Tim Chase | 2015-04-24 | 1 | -0/+3

    As of this commit, volumes do not possess the version property in any existing OpenZFS implementation. The zpool upgrade code, however, uses zfs_prop_get_int() to fetch the version property of all children in a pool. The semantics of the function, however, demand that it only be used for known valid properties so it returns a garbage value for volumes.

    This patch causes the version of a volume to appear to callers using the zfs_prop_get_int() API to be that of the default ZPL version for the implementation. In the future, should volumes gain the property, its actual value will be used.

    Signed-off-by: Tim Chase <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Fixes #3313
* zpool import should honor overlay property | Ned Bass | 2015-03-27 | 1 | -0/+14

    Make the 'zpool import' command honor the overlay property to allow filesystems to be mounted on a non-empty directory. As it stands now this property is only checked by the 'zfs mount' command. Move the check into 'zfs_mount()' in libzpool so the property is honored for all callers.

    Signed-off-by: Ned Bass <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3227
* Check all vdev labels in 'zpool import' | Brian Behlendorf | 2015-03-25 | 1 | -16/+58

    When using 'zpool import' to scan for available pools prefer vdev names which reference vdevs with more valid labels. There should be two labels at the start of the device and two labels at the end of the device. If labels are missing then the device has been damaged or is in some other way incomplete. Preferring names with fully intact labels helps weed out bad paths and improves the likelihood of being able to import the pool.

    This behavior only applies when scanning /dev/ for valid pools. If a cache file exists the pools described by the cache file will be used.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Chris Dunlap <[email protected]>
    Closes #3145
    Closes #2844
    Closes #3107
* Increase Linux pipe buffer size on 'zfs receive' | Richard Yao | 2015-03-20 | 1 | -0/+43

    I noticed when reviewing documentation that it is possible for user space to use fcntl(fd, F_SETPIPE_SZ, (unsigned long) size) to change the kernel pipe buffer size on Linux to increase the pipe size up to the value specified in /proc/sys/fs/pipe-max-size. There are users using mbuffer to improve zfs recv performance when piping over the network, so it seems advantageous to integrate such functionality directly into the zfs recv tool. This avoids the addition of two buffers and two copies (one for the buffer mbuffer adds and another for the additional pipe), so it should be more efficient.

    This could have been made configurable and/or this could have changed the value back to the original after we were done with the file descriptor, but I do not see a strong case for doing either, so I went with a simple implementation.

    Signed-off-by: Richard Yao <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #1161
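
The fcntl() call mentioned in this commit can be wrapped in a few lines. This is a Linux-specific sketch, not the zfs recv code: the function name `grow_pipe` is hypothetical, the fallback definition of F_SETPIPE_SZ uses the Linux value 1031 for builds whose headers hide it, and the size argument is passed as an int as the fcntl(2) manual page documents.

```c
#include <fcntl.h>
#include <unistd.h>

/* Linux-specific fcntl command; defined here if the headers hide it. */
#ifndef F_SETPIPE_SZ
#define	F_SETPIPE_SZ	1031
#endif

/*
 * Ask the kernel to grow the pipe behind 'fd' to at least 'size' bytes.
 * Returns the capacity actually set, or -1 if 'fd' is not a pipe or the
 * request was refused (for example, above /proc/sys/fs/pipe-max-size
 * for an unprivileged process).
 */
int
grow_pipe(int fd, int size)
{
	return (fcntl(fd, F_SETPIPE_SZ, size));
}
```

A receiver reading a stream from standard input could call `grow_pipe(STDIN_FILENO, size)` once before reading and simply ignore a failure, since the stream still works with the default pipe size.
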
* Fix possible future overflow in zfs_nicenum | Christer Ekholm | 2015-02-24 | 1 | -1/+1

    The function zfs_nicenum, which converts numbers to human-readable output, uses an index into a string of letters. This patch limits the index to the length of the string.

    Signed-off-by: Christer Ekholm <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3122
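
The kind of bound this fix adds can be illustrated with a simplified unit-scaling loop. The sketch below is not zfs_nicenum() itself: the name `nicenum_sketch` is hypothetical and, to stay short, it truncates to whole units where a real formatter may print fractions.

```c
#include <stdint.h>
#include <stdio.h>

/* Write 'num' to 'buf' scaled by powers of 1024 with a unit letter. */
void
nicenum_sketch(uint64_t num, char *buf, size_t buflen)
{
	static const char units[] = " KMGTPE";
	const int max_index = (int)sizeof (units) - 2;	/* index of 'E' */
	uint64_t n = num;
	int index = 0;

	/* Stop dividing once the last unit letter has been reached. */
	while (n >= 1024 && index < max_index) {
		n /= 1024;
		index++;
	}

	if (index == 0)
		(void) snprintf(buf, buflen, "%llu", (unsigned long long)n);
	else
		(void) snprintf(buf, buflen, "%llu%c",
		    (unsigned long long)n, units[index]);
}
```

A 64-bit value can never need more than the 'E' unit, so the bound is not reachable today; it guards against reading past the end of the string if the input type is ever widened, which is why the commit title speaks of a possible future overflow.
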
* Change VERIFY to ASSERT in mutex_destroy() | Brian Behlendorf | 2015-02-11 | 1 | -1/+1

    There have been multiple reports of 'zdb' tripping the VERIFY in mutex_destroy() because pthread_mutex_destroy() returns EBUSY. Exactly how this can happen still needs to be explained, but this doesn't strictly need to be fatal for non-debug builds.

    Therefore, this patch converts the VERIFY to an ASSERT until the root cause is determined and resolved.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #2027
* Produce a full snapshot list for zfs send -p | Tim Chase | 2015-02-09 | 1 | -10/+4

    In order to accelerate zfs receive operations in the face of many property-containing snapshots, commit 0574855 changed the header nvlist ("fss") of a send stream to exclude snapshots which aren't part of the stream. This, however, would cause zfs receive -F to erroneously remove snapshots; it would remove any snapshot which wasn't listed in the header nvlist.

    This patch restores the full list of snapshots in fss[<id>[snaps]] but still suppresses the properties of non-sent snapshots and also removes a consistency check in which an error is raised if a listed snapshot does not have any properties in fss[<id>[snapprops]].

    The 0574855 commit also introduced a bug in which zfs send -p of a complete stream (zfs send -p pool/fs@snap) would exclude the snapshot properties in fss[<id>[snapprops]]. This patch detects the last snapshot in a series when no "from" snapshot has been specified and includes its properties.

    Signed-off-by: Tim Chase <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #2907
* Read spl_hostid module parameter before gethostid() | Chunwei Chen | 2015-02-04 | 2 | -2/+26

    If spl_hostid is set via module parameter, it's likely different from gethostid(). Therefore, the userspace tool should read it first before falling back to gethostid().

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3034
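
The lookup order described in this commit might look like the following sketch. It is illustrative only: the function name `hostid_sketch`, the 32-bit mask, and the rule that a value of zero means "not set" are assumptions, and the parameter file path is passed in by the caller rather than fixed in the code.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Return the host ID, preferring a non-zero value read from the file
 * at 'param_path' and falling back to gethostid() otherwise.
 */
unsigned long
hostid_sketch(const char *param_path)
{
	unsigned long hostid = 0;
	FILE *f = fopen(param_path, "r");

	if (f != NULL) {
		/* A parse failure is treated the same as "not set". */
		if (fscanf(f, "%lu", &hostid) != 1)
			hostid = 0;
		(void) fclose(f);
	}

	if (hostid == 0)
		hostid = (unsigned long)gethostid();

	return (hostid & 0xFFFFFFFFUL);
}
```

On a system with the SPL module loaded, the caller would presumably pass /sys/module/spl/parameters/spl_hostid, assuming the standard sysfs location for module parameters.
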
* Use (void) memcpy(), not (void *) memcpy() | Richard Yao | 2015-01-30 | 1 | -3/+3

    This was caught by Clang. Clearly the intent of this code was to explicitly ignore the return value.

    Signed-off-by: Richard Yao <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #3054
* Make `zpool import -d|-c` behave consistently | Brian Behlendorf | 2015-01-28 | 1 | -7/+0

    When importing pools with zpool import -aN there is inconsistent behavior between '-d /dev/disk/by-id' (or another path) and '-c /etc/zfs/zpool.cache'.

    The difference in behavior is caused by zpool_find_import_cached() returning an empty nvlist_t when there are no pools to import but zpool_find_import_impl() returns NULL for the same situation. The behavior of zpool_find_import_cached() is arguably more correct because it allows returning NULL to be used for an error case and not an empty set.

    This change resolves the issue by updating get_configs() such that it returns an empty set instead of NULL when no config is found. The updated behavior will now always return 0 for this case.

        $ zpool import -aN; echo $?
        no pools available to import
        0
        $ zpool import -aN -d /var/tmp/; echo $?
        no pools available to import
        0
        $ zpool import -aN -c /etc/zfs/zpool.cache; echo $?
        no pools available to import
        0

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #2080
* Mark IO pipeline with PF_FSTRANSBrian Behlendorf2015-01-161-0/+17
| | | | | | | | | | In order to avoid deadlocking in the IO pipeline it is critical that pageout be avoided during direct memory reclaim. This ensures that the pipeline threads can always make forward progress and never end up blocking on a DMU transaction. For this very reason Linux now provides the PF_FSTRANS flag which may be set in the process context. Signed-off-by: Brian Behlendorf <[email protected]>
* Swap DTRACE_PROBE* with Linux tracepointsPrakash Surya2014-11-173-37/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch leverages Linux tracepoints from within the ZFS on Linux code base. It also refactors the debug code to bring it back in sync with Illumos. The information exported via tracepoints can be used for a variety of reasons (e.g. debugging, tuning, general exploration/understanding, etc). It is advantageous to use Linux tracepoints as the mechanism to export this kind of information (as opposed to something else) for a number of reasons: * A number of external tools can make use of our tracepoints "automatically" (e.g. perf, systemtap) * Tracepoints are designed to be extremely cheap when disabled * It's one of the "accepted" ways to export this kind of information; many other kernel subsystems use tracepoints too. Unfortunately, though, there are a few caveats as well: * Linux tracepoints appear to only be available to GPL licensed modules due to the way certain kernel functions are exported. Thus, to actually make use of the tracepoints introduced by this patch, one might have to patch and re-compile the kernel; exporting the necessary functions to non-GPL modules. * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux tracepoints are not available for unsigned kernel modules (tracepoints will get disabled due to the module's 'F' taint). Thus, one either has to sign the zfs kernel module prior to loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or newer. 
Assuming the above two requirements are satisfied, let's look at an example of how this patch can be used and what information it exposes (all commands run as 'root'):

    # list all zfs tracepoints available
    $ ls /sys/kernel/debug/tracing/events/zfs
    enable filter zfs_arc__delete zfs_arc__evict zfs_arc__hit zfs_arc__miss
    zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone zfs_l2arc__miss
    zfs_l2arc__read zfs_l2arc__write zfs_new_state__mfu zfs_new_state__mru

    # enable all zfs tracepoints, clear the tracepoint ring buffer
    $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable
    $ echo 0 > /sys/kernel/debug/tracing/trace

    # import zpool called 'tank', inspect tracepoint data (each line was
    # truncated, they're too long for a commit message otherwise)
    $ zpool import tank
    $ cat /sys/kernel/debug/tracing/trace | head -n35
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 1219/1219   #P:8
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
          lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr...
        z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru...
          lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr...
        z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru...
          lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr...
        z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru...
          lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ...
          lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ...
          lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ...
          lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr...
        z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru...
          lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr...
        z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru...
          lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ...
          lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr...
        z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru...
          lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ...
          lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr...
        z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru...
          lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ...
          lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ...
          lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ...
          lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ...
          lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ...

To highlight the kind of detailed information that is being exported using this infrastructure, I've taken the first tracepoint line from the output above and reformatted it such that it fits in 80 columns:

    lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss:
        hdr { dva 0x1:0x40082 birth 15491 cksum0 0x163edbff3a
            flags 0x640 datacnt 1 type 1 size 2048
            spa 3133524293419867460 state_type 0 access 0
            mru_hits 0 mru_ghost_hits 0 mfu_hits 0 mfu_ghost_hits 0
            l2_hits 0 refcount 1 }
        bp { dva0 0x1:0x40082 dva1 0x1:0x3000e5 dva2 0x1:0x5a006e
            cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00
            lsize 2048 }
        zb { objset 0 object 0 level -1 blkid 0 }

For the specific tracepoint shown here, 'zfs_arc__miss', data is exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!). This kind of precise and detailed information can be extremely valuable when trying to answer certain kinds of questions.

For anybody unfamiliar but looking to build on this, I found the XFS source code along with the following three web links to be extremely helpful:

* http://lwn.net/Articles/379903/
* http://lwn.net/Articles/381064/
* http://lwn.net/Articles/383362/

I should also note the more "boring" aspects of this patch:

* The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to support a sixth parameter.
This parameter is used to populate the contents of the new conftest.h file. If no sixth parameter is provided, conftest.h will be empty.
* The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced. This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro, except it has support for a fifth option that is then passed as the sixth parameter to ZFS_LINUX_COMPILE_IFELSE. These autoconf changes were needed to test the availability of the Linux tracepoint macros. Due to the odd nature of the Linux tracepoint macro API, a separate ".h" must be created (the path and filename are used internally by the kernel's define_trace.h file).
* The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This is to determine if we can safely enable the Linux tracepoint functionality. We need to selectively disable the tracepoint code due to the kernel exporting certain functions as GPL only. Without this check, the build process will fail at link time.

In addition, the SET_ERROR macro was modified into a tracepoint as well. To do this, the 'sdt.h' file was moved into the 'include/sys' directory and now contains a userspace portion and a kernel space portion. The dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoints as well. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Update utsname supportBrian Behlendorf2014-10-173-39/+8
| | | | | | | | | | | | | | | | | | Modify the code to use the utsname() kernel function rather than a global variable. This results in cleaner, more portable code because utsname() is already provided by the kernel and can be easily emulated in user space via uname(2). This means that it will behave consistently in both contexts. This also has the benefit of allowing the removal of a few _KERNEL pre-processor conditions. It is also a prerequisite for a proper FUSE port because we need to provide a valid utsname. Finally, it allows us to remove this functionality from the SPL and all the related compatibility code. Signed-off-by: Brian Behlendorf <[email protected]> Issue #2757
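A user-space stand-in for utsname() built on uname(2) might look like the sketch below. The name user_utsname and the caching strategy are assumptions made for illustration; the commit's actual emulation is not shown in this log.

```c
#include <assert.h>
#include <sys/utsname.h>

/*
 * Hypothetical user-space counterpart of the kernel's utsname(): fill a
 * static buffer once via uname(2) and return a pointer to it. The first
 * call is not thread-safe; this is a sketch only.
 */
struct utsname *
user_utsname(void)
{
    static struct utsname cached;
    static int loaded = 0;

    if (!loaded) {
        int rc = uname(&cached);

        /* uname(2) only fails for an invalid buffer, so treat as fatal. */
        assert(rc >= 0);
        (void) rc;
        loaded = 1;
    }
    return (&cached);
}
```

Callers can then read fields such as user_utsname()->nodename the same way in either context, which is the consistency the message describes.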
* zfs send -p send properties only for snapshots that are actually sentAndriy Gapon2014-10-021-4/+25
| | | | | | | | | | | | | | | | ... as opposed to sending properties of all snapshots of the relevant filesystem. The previous behavior results in properties being set on all snapshots on the receiving side, which is quite slow. Behavior of zfs send -R is not changed. References: http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/346 Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2729 Issue #2210
* FreeBSD PR kern/172259: Fixes zfs receive errorssmh2014-10-021-2/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | FreeBSD PR kern/172259: Fixes zfs receive errors caused by snapshot replication being processed in a random order instead of creation order. Eliminates needless filesystem renames caused by removed parent snapshots which subsequently causes many more errors. PR: kern/172259 Submitted by: Steven Hartland Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks References: https://github.com/freebsd/freebsd/commit/4995789 Porting notes: Minor whitespace fixes were made to conform with style requirements: lib/libzfs/libzfs_sendrecv.c: 2269: indent by spaces instead of tabs lib/libzfs/libzfs_sendrecv.c: 2270: indent by spaces instead of tabs Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2729
* Implement -t option to zpool create for temporary pool namesRichard Yao2014-09-301-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | Creating virtual machines that have their rootfs on ZFS on hosts that have their rootfs on ZFS causes SPA namespace collisions when the standard name rpool is used. The solution is either to give each guest pool a name unique to the host, which is not always desirable, or boot a VM environment containing an ISO image to install it, which is cumbersome. 26b42f3f9d03f85cc7966dc2fe4dfe9216601b0e introduced `zpool import -t ...` to simplify situations where a host must access a guest's pool when there is a SPA namespace conflict. We build upon that to introduce `zpool create -t tname ...`. That allows us to create a pool whose in-core name is tname, but whose on-disk name is the normal name specified. This simplifies the creation of machine images that use a rootfs on ZFS. That benefits not only real world deployments, but also ZFSOnLinux development by decreasing the time needed to perform rootfs on ZFS experiments. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2417
* Make user stack limit configurableBrian Behlendorf2014-09-301-24/+24
| | | | | | | | | | | | | | | | | To aid in detecting and debugging stack overflow issues make the user space stack limit configurable via a new ZFS_STACK_SIZE environment variable. The value assigned to ZFS_STACK_SIZE will be used as the default stack size in bytes. Because this is mainly useful as a debugging aid in conjunction with ztest the stack limit is disabled by default. See the ztest(1) man page for additional details on using the ZFS_STACK_SIZE environment variable. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #2743 Issue #2293
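Reading such a setting could be sketched as below. The helper names and the validation rules (decimal digits only, 0 meaning "no explicit limit") are assumptions for illustration; only the ZFS_STACK_SIZE variable name and its unit (bytes) come from the message above.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a stack size in bytes from a string (hypothetical helper).
 * Returns 0 for NULL, empty, non-numeric, or out-of-range input; 0 is
 * used here to mean "no explicit limit, keep the platform default".
 */
size_t
parse_stack_size(const char *val)
{
    char *end;
    unsigned long long n;

    if (val == NULL || !isdigit((unsigned char)val[0]))
        return (0);

    errno = 0;
    n = strtoull(val, &end, 10);
    if (errno != 0 || *end != '\0')
        return (0);
    return ((size_t)n);
}

/* Look up ZFS_STACK_SIZE in the environment; 0 when unset or invalid. */
size_t
stack_size_from_env(void)
{
    return (parse_stack_size(getenv("ZFS_STACK_SIZE")));
}
```

A non-zero result could then be handed to pthread_attr_setstacksize() when creating threads, while a zero result leaves the default in place, matching the "disabled by default" behavior described above.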
* Illumos 5116 - zpool history -i goes into infinite loopMatthew Ahrens2014-09-231-4/+19
| | | | | | | | | | | | | | | | | | 5116 zpool history -i goes into infinite loop Reviewed by: Christopher Siden <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Boris Protopopov <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5116 https://github.com/illumos/illumos-gate/commit/3339867 Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2715
* Illumos 5135 - zpool_find_import_cached() can use fnvlist_*Matthew Ahrens2014-09-231-11/+5
| | | | | | | | | | | | | | | Reviewed by: Christopher Siden <[email protected]> Reviewed by: Max Grossman <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5135 https://github.com/illumos/illumos-gate/commit/b18d6b0 Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2693
* lib/libzpool/kernel.c: Assert no owners in rw_destroy()Richard Yao2014-09-231-1/+1
| | | | | | | | | This is intended to cause ztest to fail when rw_destroy() is called on a rwlock that has owners. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2330
* Properly NULL terminate string in zfs_strcmp_pathnameRichard Yao2014-09-231-1/+1
| | | | | | | | The utility cppcheck caught this. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2330
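The log does not show the one-line change itself, so the following is only a generic illustration of the class of bug that tools such as cppcheck report: a bounded copy that can leave the destination without a terminating NUL. The function name copy_path is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy at most dstlen - 1 bytes of src into dst and always terminate.
 * strncpy() alone leaves dst unterminated when src is at least as long
 * as the buffer, so the terminator is written explicitly.
 */
void
copy_path(char *dst, size_t dstlen, const char *src)
{
    assert(dst != NULL && src != NULL && dstlen > 0);

    (void) strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';
}
```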
* Illumos 5147 - zpool list -v should show individual disk capacityGeorge Wilson2014-09-231-2/+12
| | | | | | | | | | | | | | | | | | | | The 'zpool list -v' command displays lots of info but excludes the capacity of each disk. This should be added. 5147 zpool list -v should show individual disk capacity Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5147 https://github.com/illumos/illumos-gate/commit/7a09f97 Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2688
* Remove obsolete comment about guard pagesNed Bass2014-09-221-6/+0
| | | | | | | | | | Remove an obsolete comment that refers to code removed by commit 79c6e4c4. The code and comment related to space consumed by guard pages in user-space stacks, which we no longer take into account. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2722