aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* Illumos 6171 - dsl_prop_unregister() slows down dataset eviction.Justin T. Gibbs2016-01-123-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | 6171 dsl_prop_unregister() slows down dataset eviction. Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/6171 https://github.com/illumos/illumos-gate/commit/03bad06 Porting notes: - Conflicts - 3558fd7 Prototype/structure update for Linux - 2cf7f52 Linux compat 2.6.39: mount_nodev() - 13fe019 Illumos #3464 - 241b541 Illumos 5959 - clean up per-dataset feature count code - dsl_prop_unregister() preserved until out of tree consumers like Lustre can transition to dsl_prop_unregister_all(). - Fixing 'space or tab at end of line' in include/sys/dsl_dataset.h Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5987 - zfs prefetch code needs workMatthew Ahrens2016-01-124-35/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5987 zfs prefetch code needs work Reviewed by: Adam Leventhal <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/5987 zfs prefetch code needs work illumos/illumos-gate@cf6106c 5987 zfs prefetch code needs work Porting notes: - [module/zfs/dbuf.c] - 5f6d0b6 Handle block pointers with a corrupt logical size - [module/zfs/dmu_zfetch.c] - c65aa5b Fix gcc missing parenthesis warnings - 428870f Update core ZFS code from build 121 to build 141. - 79c76d5 Change KM_PUSHPAGE -> KM_SLEEP - b8d06fc Switch KM_SLEEP to KM_PUSHPAGE - Account for ISO C90 - mixed declarations and code - warnings - Module parameters (new/changed): - Replaced zfetch_block_cap with zfetch_max_distance (Max bytes to prefetch per stream (default 8MB; 8 * 1024 * 1024)) - Preserved zfs_prefetch_disable as 'int' for consistency with existing Linux module options. - [include/sys/trace_arc.h] - Added new tracepoints - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__sync__wait__for__async); - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__demand__hit__predictive__prefetch); - [man/man5/zfs-module-parameters.5] - Updated man page Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Revert "Illumos 3749 - zfs event processing should work on R/O root filesystems"Brian Behlendorf2016-01-112-2/+0
| | | | | | | | | | This reverts commit b47637ecdc7b647ec5bd9dfca888179eecfaa72d which introduced a regression in ztest. $ ./cmd/ztest/ztest -V 5 vdevs, 7 datasets, 23 threads, 300 seconds... *** Error in `/rpool/home/behlendo/src/git/zfs/cmd/ztest/.libs/lt-ztest': double free or corruption (fasttop): 0x0000000000d339f0 ***
* Illumos 4929 - want prevsnap propertyMatthew Ahrens2016-01-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | 4929 want prevsnap property Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Matt Amdur <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Boris Protopopov <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4929 https://github.com/illumos/illumos-gate/commit/b461c74 Porting notes: - [include/sys/fs/zfs.h] - f67d70 Create an 'overlay' property - 11b9ec Add full SELinux support - [fs/zfs/dsl_dataset.c] - This increases the stack size of dsl_dataset_stats() but nothing has been changed until this is shown to be an issue. Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 4891 - want zdb option to dump all metadataMatthew Ahrens2016-01-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 4891 want zdb option to dump all metadata Reviewed by: Sonu Pillai <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Garrett D'Amore <[email protected]> We'd like a way for zdb to dump metadata in a machine-readable format, so that we can bring that back from a customer site for in-house diagnosis. Think of it as a crash dump for zpools, which can be used for post-mortem analysis of a malfunctioning pool References: https://www.illumos.org/issues/4891 https://github.com/illumos/illumos-gate/commit/df15e41 Porting notes: - [cmd/zdb/zdb.c] - a5778ea zdb: Introduce -V for verbatim import - In main() getopt 'opt' variable removed and the code was brought back in line with illumos. - [lib/libzpool/kernel.c] - 1e33ac1 Fix Solaris thread dependency by using pthreads - f0e324f Update utsname support - 4d58b69 Fix vn_open/vn_rdwr error handling - In vn_open() allocate 'dumppath' on heap instead of stack - Properly handle 'dump_fd == -1' error path - Free 'realpath' after added vn_dumpdir_code block Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 3749 - zfs event processing should work on R/O root filesystemsWill Andrews2016-01-112-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | 3749 zfs event processing should work on R/O root filesystems Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Eric Schrock <[email protected]> Approved by: Christopher Siden <[email protected]> References: https://www.illumos.org/issues/3749 https://github.com/illumos/illumos-gate/commit/3cb69f7 Porting notes: - [include/sys/spa_impl.h] - ffe9d38 Add generic errata infrastructure - 1421c89 Add visibility in to arc_read - [include/sys/fm/fs/zfs.h] - 2668527 Add linux events - 6283f55 Support custom build directories and move includes - [module/zfs/spa_config.c] - Updated spa_config_sync() to match illumos with the exception of a Linux specific block. Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5960, 5925Paul Dagnelie2016-01-0810-24/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin= Reviewed by: Prakash Surya <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> References: https://www.illumos.org/issues/5960 https://www.illumos.org/issues/5925 https://github.com/illumos/illumos-gate/commit/a2cdcdd Porting notes: - [lib/libzfs/libzfs_sendrecv.c] - b8864a2 Fix gcc cast warnings - 325f023 Add linux kernel device support - 5c3f61e Increase Linux pipe buffer size on 'zfs receive' - [module/zfs/zfs_vnops.c] - 3558fd7 Prototype/structure update for Linux - c12e3a5 Restructure zfs_readdir() to fix regressions - [module/zfs/zvol.c] - Function @zvol_map_block() isn't needed in ZoL - 9965059 Prefetch start and end of volumes - [module/zfs/dmu.c] - Fixed ISO C90 - mixed declarations and code - Function dmu_prefetch() 'int i' is initialized before the following code block (c90 vs. c99) - [module/zfs/dbuf.c] - fc5bb51 Fix stack dbuf_hold_impl() - 9b67f60 Illumos 4757, 4913 - 34229a2 Reduce stack usage for recursive traverse_visitbp() - [module/zfs/dmu_send.c] - Fixed ISO C90 - mixed declarations and code - b58986e Use large stacks when available - 241b541 Illumos 5959 - clean up per-dataset feature count code - 77aef6f Use vmem_alloc() for nvlists - 00b4602 Add linux kernel memory support Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5746 - more checksumming in zfs sendMatthew Ahrens2015-12-303-4/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5746 more checksumming in zfs send Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Bayard Bell <[email protected]> Approved by: Albert Lee <[email protected]> References: https://www.illumos.org/issues/5746 https://github.com/illumos/illumos-gate/commit/98110f0 https://github.com/zfsonlinux/zfs/issues/905 Porting notes: - Minor conflicts due to: - https://github.com/zfsonlinux/zfs/commit/2024041 - https://github.com/zfsonlinux/zfs/commit/044baf0 - https://github.com/zfsonlinux/zfs/commit/88904bb - Fix ISO C90 warnings (-Werror=declaration-after-statement) - arc_buf_t *abuf; - dmu_buf_t *bonus; - zio_cksum_t cksum_orig; - zio_cksum_t *cksump; - Fix format '%llx' format specifier warning - Align message in zstreamdump safe_malloc() with upstream Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #3611
* Prevent SA length overflowNed Bass2015-12-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function sa_update() accepts a 32-bit length parameter and assigns it to a 16-bit field in sa_bulk_attr_t, potentially truncating the passed-in value. This could lead to corrupt system attribute (SA) records getting written to the pool. Add a VERIFY to sa_update() to detect cases where overflow would occur. The SA length is limited to 16-bit values by the on-disk format defined by sa_hdr_phys_t. The function zfs_sa_set_xattr() is vulnerable to this bug if the unpacked nvlist of xattrs is less than 64k in size but the packed size is greater than 64k. Fix this by appropriately checking the size of the packed nvlist before calling sa_update(). Add error handling to zpl_xattr_set_sa() to keep the cached list of SA-based xattrs consistent with the data on disk. Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned transaction if sa_update() returns an error, but the DMU only allows unassigned transactions to be aborted. Wrap the sa_update() call in a VERIFY0, remove the transaction abort, and call dmu_tx_commit() unconditionally. This is consistent practice with other callers of sa_update(). Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4150
* Illumos 5745 - zfs set allows only one dataset property to be set at a timeChris Williamson2015-12-291-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5745 zfs set allows only one dataset property to be set at a time Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Bayard Bell <[email protected]> Reviewed by: Richard PALO <[email protected]> Reviewed by: Steven Hartland <[email protected]> Approved by: Rich Lowe <[email protected]> References: https://www.illumos.org/issues/5745 https://github.com/illumos/illumos-gate/commit/3092556 Porting notes: - Fix the missing braces around initializer, zfs_cmd_t zc = {"\0"}; - Remove extra format argument in zfs_do_set() - Declare at the top: - zfs_prop_t prop; - nvpair_t *elem; - nvpair_t *next; - int i; - Additionally initialize: - int added_resv = 0; - zfs_prop_t prop = 0; - Assign 0 install of NULL for uint64_t types. - zc->zc_nvlist_conf = '\0'; - zc->zc_nvlist_src = '\0'; - zc->zc_nvlist_dst = '\0'; Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #3574
* Add lock types RW_NOLOCKDEP and MUTEX_NOLOCKDEPOlaf Faaland2015-12-221-0/+2
| | | | | | | | | | | | | | | Both lock types were introduced in SPL to allow some locks to be taken/released with linux lockdep turned off. See SPL commit for details. Add the new lock types to zfs_context.h to allow user space compilation. Depends on SPL commit 692ae8d SPL pull request refs/pull/480/head Signed-off-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3895
* Fix vdev_queue_aggregate() deadlockBrian Behlendorf2015-12-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This deadlock may manifest itself in slightly different ways but at the core it is caused by a memory allocation blocking on file- system reclaim in the zio pipeline. This is normally impossible because zio_execute() disables filesystem reclaim by setting PF_FSTRANS on the thread. However, kmem cache allocations may still indirectly block on file system reclaim while holding the critical vq->vq_lock as shown below. To resolve this issue zio_buf_alloc_flags() is introduced which allocation flags to be passed. This can then be used in vdev_queue_aggregate() with KM_NOSLEEP when allocating the aggregate IO buffer. Since aggregating the IO is purely a performance optimization we want this to either succeed or fail quickly. Trying too hard to allocate this memory under the vq->vq_lock can negatively impact performance and result in this deadlock. * z_wr_iss zio_vdev_io_start vdev_queue_io -> Takes vq->vq_lock vdev_queue_io_to_issue vdev_queue_aggregate zio_buf_alloc -> Waiting on spl_kmem_cache process * z_wr_int zio_vdev_io_done vdev_queue_io_done mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss * txg_sync spa_sync dsl_pool_sync zio_wait -> Waiting on zio being handled by z_wr_int * spl_kmem_cache spl_cache_grow_work kv_alloc spl_vmalloc ... evict zpl_evict_inode zfs_inactive dmu_tx_wait txg_wait_open -> Waiting on txg_sync Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #3808 Closes #3867
* Use uio for zvol_{read,write}Chunwei Chen2015-12-152-3/+2
| | | | | | | | | Since uio now supports bvec, we can convert bio into uio and reuse dmu_{read,write}_uio. This way, we can remove some duplicate code. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4078
* Use spa as key besides objsetid for snapentryChunwei Chen2015-12-081-1/+2
| | | | | | | | | | | | | objsetid is not unique across pool, so using it solely as key would cause panic when automounting two snapshot on different pools with the same objsetid. We fix this by adding spa pointer as additional key. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #3948 Issue #3786 Issue #3887
* Illumos 5959 - clean up per-dataset feature count codeMatthew Ahrens2015-12-042-10/+30
| | | | | | | | | | | | | | | | | | | | | | | 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/5959 https://github.com/illumos/illumos-gate/commit/ca0cc39 Porting notes: illumos code doesn't check for feature_get_refcount() returning ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added a check in https://github.com/zfsonlinux/zfs/commit/784652c due to #3468. The check was reintroduced here. Ported-by: Witaut Bajaryn <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3965
* Add zap_prefetch() interfaceBrian Behlendorf2015-12-041-0/+1
| | | | | | | | | | | Provide a generic interface to prefetch ZAP entries by name. This functionality is being added for external consumers such as Lustre. It is based of the existing zap_prefetch_uint64() version which is used by the deduplication code. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4061
* Ext4's typical GPT partition type not recognizedilovezfs2015-12-041-0/+126
| | | | | | | | | | | | | | | | | | | | | Adding additional entries to the efi conversion array will help prevent the overwriting of the GPTs of disks with in-use file systems in more cases. Most notably, this adds partition type 8300 "Linux filesystem" (0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and btrfs, among others. This commit itself does nothing to address the underlying problematic behavior that check_slice() isn't called on partitions of an unrecognized type, even when they contain a currently mounted file system. The additional entries were derived from these two resources: https://en.wikipedia.org/wiki/GUID_Partition_Table http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc Signed-off-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4016
* Illumos 934 - FreeBSD's GPT not recognizedYuri Pankov2015-12-041-48/+59
| | | | | | | | | | | | | | | | Reviewed by: Alexander Eremin <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Andrew Stormont <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/934 https://github.com/illumos/illumos-gate/commit/e21ea67 Ported-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4016
* Only trigger SET_ERROR tracepoint event on errorRichard Yao2015-12-021-0/+9
| | | | | | | | | | | | Currently, the SET_ERROR tracepoint triggers regardless of whether there is an error or not. On Illumos, SET_ERROR only triggers on an actual error, which is avoids irrelevant noise. Linux 2.6.38 added support for conditional tracepoints, so we modify SET_ERROR to use them when they are avaliable for functionality equivalent to the Illumos functionality. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4043
* Linux 4.4 compat: xattr operations takes xattr_handlerChunwei Chen2015-12-011-0/+28
| | | | | | | | | | The xattr_hander->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4021
* Illumos 6267 - dn_bonus evicted too earlyJustin T. Gibbs2015-10-132-2/+17
| | | | | | | | | | | | | | | | | 6267 dn_bonus evicted too early Reviewed by: Richard Yao <[email protected]> Reviewed by: Xin LI <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/6267 https://github.com/illumos/illumos-gate/commit/d205810 Signed-off-by: Brian Behlendorf <[email protected]> Ported-by: Ned Bass [email protected] Issue #3865 Issue #3443
* Linux 4.3 compat: bio_end_io_t / BIO_UPTODATELukas Wunner2015-09-251-12/+9
| | | | | | | | | | | | | | | | | | Commit torvalds/linux@4246a0b63bd8f56a1469b12eafeb875b1041a451 ("block: add a bi_error field to struct bio") dropped the error argument from bio_endio in favor of newly introduced bio->bi_error. This also replaces bio->bi_flags value BIO_UPTODATE. bio_endio was a 3 argument function until Linux 2.6.24, which made it a 2 argument function, and now the prototype has changed yet again to a 1 argument function. Support for pre 2.6.24 kernels was already dropped with 37f9dac592bf ("zvol processing should use struct bio") which assumed the 2 argument version in zvol_request(). Remaining code to support the 3 argument version is hereby removed. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Issue #3799
* Add large block support to zpios(1) benchmarkDon Brady2015-09-222-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of the large block support effort, it makes sense to add support for large blocks to **zpios(1)**. The specifying of a zfs block size for zpios is optional and will default to 128K if the block size is not specified. `zpios ... -S size | --blocksize size ...` This will use *size* ZFS blocks for each test, specified as a comma delimited list with an optional unit suffix. The supported range is powers of two from 128K through 16M. A range of block sizes can be tested as follows: `-S 128K,256K,512K,1M` Example run below (non realistic results from a VM and output abbreviated for space) ``` --regioncount=750 --regionsize=8M --chunksize=1M --offset=4K --threaddelay=0 --cleanup --human-readable --verbose --cleanup --blocksize=128K,256K,512K,1M th-cnt rg-cnt rg-sz ch-sz blksz wr-data wr-bw rd-data rd-bw --------------------------------------------------------------------- 4 750 8m 1m 128k 5g 90.06m 5g 93.37m 4 750 8m 1m 256k 5g 79.71m 5g 99.81m 4 750 8m 1m 512k 5g 42.20m 5g 93.14m 4 750 8m 1m 1m 5g 35.51m 5g 89.36m 8 750 8m 1m 128k 5g 85.49m 5g 90.81m 8 750 8m 1m 256k 5g 61.42m 5g 99.24m 8 750 8m 1m 512k 5g 49.09m 5g 108.78m 16 750 8m 1m 128k 5g 86.28m 5g 88.73m 16 750 8m 1m 256k 5g 64.34m 5g 93.47m 16 750 8m 1m 512k 5g 68.84m 5g 124.47m 16 750 8m 1m 1m 5g 53.97m 5g 97.20m --------------------------------------------------------------------- ``` Signed-off-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3795 Closes #2071
* Honor xattr=sa dataset propertyNed Bass2015-09-192-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | ZFS incorrectly uses directory-based extended attributes even when xattr=sa is specified as a dataset property or mount option. Support to honor temporary mount options including "xattr" was added in commit 0282c4137e7409e6d85289f4955adf07fac834f5. There are two issues with the mount option handling: * Libzfs has historically included "xattr" in its list of default mount options. This overrides the dataset property, so the dataset is always configured to use directory-based xattrs even when the xattr dataset property is set to off or sa. Address this by removing "xattr" from the set of default mount options in libzfs. * There was no way to enable system attribute-based extended attributes using temporary mount options. Add the mount options "saxattr" and "dirxattr" which enable the xattr behavior their names suggest. This approach has the advantages of mirroring the valid xattr dataset property values and following existing conventions for mount option names. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3787
* Illumos 6214 - zpools going southArne Jansen2015-09-112-14/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | 6214 zpools going south Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> References: https://www.illumos.org/issues/6214 http://cr.illumos.org/~webrev/sensille/6214_zpools_going_south/ Porting Notes: Reintroduce b_compress to the l2arc_buf_hdr_t. In commit b9541d6 the compression flags were moved to the generic b_flags in the arc_buf_hdr_t. This is a problem because l2arc_compress_buf() may manipulate the compression flags and this can only be done safely under the hash lock which is not held. See Illumos 6214 for a detailed analysis of the race. HDR_GET_COMPRESS() macro was removed from arc_buf_info(). Ported-by: Brian Behlendorf <[email protected]> Closes #3757
* Reintroduce IO accounting on zvols on Linux 3.19+Richard Yao2015-09-091-0/+5
| | | | | | | | | | | | | | zfsonlinux/zfs@e20cd6f7a8922709b1aa2ecefd783390102d79e0 caused us to lose IO accounting on zvols. When I originally wrote that last year, the symbols we needed to maintain IO accounting were GPL exported, but torvalds/linux@394ffa503bc40e32d7f54a9b817264e81ce131b4 provided suitable symbols for restoring this functionality 4 months later. We can call them to restore the IO accounting on Linux 3.19 and later as well as any older kernels where that patch is backported. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3741
* Add dbgmsg kstatBrian Behlendorf2015-09-042-34/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internally ZFS keeps a small log to facilitate debugging. By default the log is disabled, to enable it set zfs_dbgmsg_enable=1. The contents of the log can be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to this proc file clears the log. $ echo 1 >/sys/module/zfs/parameters/zfs_dbgmsg_enable $ echo 0 >/proc/spl/kstat/zfs/dbgmsg $ zpool import tank $ cat /proc/spl/kstat/zfs/dbgmsg 1 0 0x01 -1 0 2492357525542 2525836565501 timestamp message 1441141408 spa=tank async request task=1 1441141408 txg 70 open pool version 5000; software version 5000/5; ... 1441141409 spa=tank async request task=32 1441141409 txg 72 import pool version 5000; software version 5000/5; ... 1441141414 command: lt-zpool import tank Note the zfs_dbgmsg() and dprintf() functions are both now mapped to the same log. As mentioned above the kernel debug log can be accessed though the /proc/spl/kstat/zfs/dbgmsg kstat. For user space consumers log messages are immediately written to stdout after applying the ZFS_DEBUG environment variable. $ ZFS_DEBUG=on ./cmd/ztest/ztest -V Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #3728
* Merge branch 'zvol'Brian Behlendorf2015-09-042-181/+23
|\ | | | | | | | | | | | | | | Performance improvements for zvols. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3720
| * Remove blk_rq_bytes()/blk_rq_sectors autotools checksRichard Yao2015-09-041-23/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * Remove blk_rq_pos() autotools checkRichard Yao2015-09-041-8/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * Remove blk_fetch_request() autotools checkRichard Yao2015-09-041-14/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * Remove blk_requeue_request() autotools checkRichard Yao2015-09-041-8/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * Remove blk_end_request() autotools check.Richard Yao2015-09-041-74/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * Remove rq_is_sync() autotools checkRichard Yao2015-09-041-8/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * Remove rq_for_each_segment() autotools checkRichard Yao2015-09-041-42/+0
| | | | | | | | Signed-off-by: Richard Yao <[email protected]>
| * zvol processing should use struct bioRichard Yao2015-09-042-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internally, zvols are files exposed through the block device API. This is intended to reduce overhead when things require block devices. However, the ZoL zvol code emulates a traditional block device in that it has a top half and a bottom half. This is an unnecessary source of overhead that does not exist on any other OpenZFS platform does this. This patch removes it. Early users of this patch reported double digit performance gains in IOPS on zvols in the range of 50% to 80%. Comments in the code suggest that the current implementation was done to obtain IO merging from Linux's IO elevator. However, the DMU already does write merging while arc_read() should implicitly merge read IOs because only 1 thread is permitted to fetch the buffer into ARC. In addition, commercial ZFSOnLinux distributions report that regular files are more performant than zvols under the current implementation, and the main consumers of zvols are VMs and iSCSI targets, which have their own elevators to merge IOs. Some minor refactoring allows us to register zfs_request() as our ->make_request() handler in place of the generic_make_request() function. This eliminates the layer of code that broke IO requests on zvols into a top half and a bottom half. This has several benefits: 1. No per zvol spinlocks. 2. No redundant IO elevator processing. 3. Interrupts are disabled only when actually necessary. 4. No redispatching of IOs when all taskq threads are busy. 5. Linux's page out routines will properly block. 6. Many autotools checks become obsolete. An unfortunate consequence of eliminating the layer that generic_make_request() is that we no longer calls the instrumentation hooks for block IO accounting. Those hooks are GPL-exported, so we cannot call them ourselves and consequently, we lose the ability to do IO monitoring via iostat. Since zvols are internally files mapped as block devices, this should be okay. Anyone who is willing to accept the performance penalty for the block IO layer's accounting could use the loop device in between the zvol and its consumer. Alternatively, perf and ftrace likely could be used. Also, tools like latencytop will still work. Tools such as latencytop sometimes provide a better view of performance bottlenecks than the traditional block IO accounting tools do. Lastly, if direct reclaim occurs during spacemap loading and swap is on a zvol, this code will deadlock. That deadlock could already occur with sync=always on zvols. Given that swap on zvols is not yet production ready, this is not a blocker. Signed-off-by: Richard Yao <[email protected]>
| * VDEV_REQ_FUA should be mapped to REQ_FUARichard Yao2015-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | Pre-2.6.37 kernels support REQ_FUA in request flags, but not in BIO flags. zvols are the only consumer of VDEV_REQ_FUA and since they are passed requests, they should be obey the REQ_FUA flag like later kernels. This optimization will only matter on 2.6.36 and 2.6.37 because the zvol rework changes things to use bio, where we no longer are able to distinguish on earlier kernels Signed-off-by: Richard Yao <[email protected]>
* | Add temporary mount optionsBrian Behlendorf2015-09-034-8/+131
|/ | | | | | | | | | | | | Add the required kernel side infrastructure to parse arbitrary mount options. This enables us to support temporary mount options in largely the same way it is handled on other platforms. See the 'Temporary Mount Point Properties' section of zfs(8) for complete details. Signed-off-by: Brian Behlendorf <[email protected]> Closes #985 Closes #3351
* Dbuf hash table should be sized as is the arc hash tableTim Chase2015-09-021-0/+3
| | | | | | | | | | | Commit 49ddb315066e372f31bda29a5c546a9eccc8b418 added the zfs_arc_average_blocksize parameter to allow control over the size of the arc hash table. The dbuf hash table's size should be determined similarly. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3721
* Disable LBA weighting on files and SSDsRichard Yao2015-09-011-0/+1
| | | | | | | | | | | | | | The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3712
* Remove blk_queue_io_opt() autotools checkRichard Yao2015-09-011-9/+0
| | | | | | | | This is needed for supporting kernels earlier than 2.6.30. Support for those kernels was dropped, so we can safely remove this check. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Remove blk_queue_physical_block_size() autotools checkRichard Yao2015-09-011-10/+0
| | | | | | | | This is needed for supporting kernels earlier than 2.6.30. Support for those kernels was dropped, so we can safely remove this check. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.18 compat: Snapshot auto-mountingBrian Behlendorf2015-08-313-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-factor the .zfs/snapshot auto-mouting code to take in to account changes made to the upstream kernels. And to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements. * All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot regardless without needing access to the parent dataset. * Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures. * References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding. * The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows to kernel to unmount idle auto-mounted snapshots if needed removing the need for the zfsctl_unmount_snapshots() function. * Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds the auto-unmount expiration timer will be extended. * Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invaliding all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3589 Closes #3344 Closes #3295 Closes #3257 Closes #3243 Closes #3030 Closes #2841
* Check large block feature flag on volumesBrian Behlendorf2015-08-281-1/+1
| | | | | | | | | | | | | | | Since ZoL allows large blocks to be used by volumes, unlike upstream illumos, the feature flag must be checked prior to volume creation. This is critical because unlike filesystems, volumes will create a object which uses large blocks as part of the create. Therefore, it cannot be safely checked in zfs_check_settable() after the dataset can been created. In addition this patch updates the relevant error messages to use zfs_nicenum() to print the maximum blocksize. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3591
* Add compatibility layer for {kmap,kunmap}_atomicChunwei Chen2015-08-242-1/+42
| | | | | | | | | Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2. We use zfs_{kmap,kunmap}_atomic as wrappers and always take 2 argument, but ignore the 2nd for newer kernel. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Fix build failure with Linux 4.1 and FTRACEChris Dunlop2015-08-181-0/+3
| | | | | | | | See also #3546, commit c1718e9 Signed-off-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3673
* Fix build failure with Linux 4.1 and FTRACEFrédéric VANNIÈRE2015-07-298-0/+24
| | | | | | Signed-off-by: Frédéric VANNIÈRE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3546
* Align thread priority with Linux defaultsBrian Behlendorf2015-07-281-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under Linux filesystem threads responsible for handling I/O are normally created with the maximum priority. Non-I/O filesystem processes run with the default priority. ZFS should adopt the same priority scheme under Linux to maintain good performance and so that it will complete fairly when other Linux filesystems are active. The priorities have been updated to the following: $ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta' - TS 10743 19 -20 [spl_kmem_cache] - TS 10744 19 -20 [spl_system_task] - TS 10745 19 -20 [spl_dynamic_tas] - TS 10764 19 0 [dbu_evict] - TS 10765 19 0 [arc_prune] - TS 10766 19 0 [arc_reclaim] - TS 10767 19 0 [arc_user_evicts] - TS 10768 19 0 [l2arc_feed] - TS 10769 39 0 [z_unmount] - TS 10770 39 -20 [zvol] - TS 11011 39 -20 [z_null_iss] - TS 11012 39 -20 [z_null_int] - TS 11013 39 -20 [z_rd_iss] - TS 11014 39 -20 [z_rd_int_0] - TS 11022 38 -19 [z_wr_iss] - TS 11023 39 -20 [z_wr_iss_h] - TS 11024 39 -20 [z_wr_int_0] - TS 11032 39 -20 [z_wr_int_h] - TS 11033 39 -20 [z_fr_iss_0] - TS 11041 39 -20 [z_fr_int] - TS 11042 39 -20 [z_cl_iss] - TS 11043 39 -20 [z_cl_int] - TS 11044 39 -20 [z_ioctl_iss] - TS 11045 39 -20 [z_ioctl_int] - TS 11046 39 -20 [metaslab_group_] - TS 11050 19 0 [z_iput] - TS 11121 38 -19 [z_wr_iss] Note that under Linux the meaning of a processes priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high value on illumos indicate a _high_ priority. In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way when changes are merged from upstream illumos we won't need to remember to invert the macro. It could also lead to confusion. This patch depends on https://github.com/zfsonlinux/spl/pull/466. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #3607
* Illumos 5817 - change type of arcs_size from uint64_t to refcount_tPrakash Surya2015-07-231-1/+1
| | | | | | | | | | | | | | | | | 5817 change type of arcs_size from uint64_t to refcount_t Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/5817 https://github.com/illumos/illumos-gate/commit/2fd872a Ported-by: Brian Behlendorf <[email protected]> Issue #3533
* Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_growMatthew Ahrens2015-07-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5376 arc_kmem_reap_now() should not result in clearing arc_no_grow Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Steven Hartland <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5376 https://github.com/illumos/illumos-gate/commit/2ec99e3 Porting Notes: The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solution weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible. * The arc_available_memory() function was removed previous in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to determine a low memory. This provides the same basic functionality as the illumos version allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged so this change isn't a significant as it might first appear. * This patch includes the long standing fix for arc_shrink() which was originally proposed in #2167. Since there were related changes to this function it made sense to include that work. * The arc_init() function has been re-factored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation. Ported-by: Brian Behlendorf <[email protected]> Closes #3616 Closes #2167