summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Either _ILP32 or _LP64 must be definedBrian Behlendorf2015-12-101-15/+27
| | | | | | | | | | | For some arm, powerpc, and sparc platforms it was possible that neither _ILP32 of _LP64 would be defined. Update the isa_defs.h header to explicitly set these macros and generate a compile error in the case neither are defined. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4048
* Use spa as key besides objsetid for snapentryChunwei Chen2015-12-083-12/+26
| | | | | | | | | | | | | objsetid is not unique across pool, so using it solely as key would cause panic when automounting two snapshot on different pools with the same objsetid. We fix this by adding spa pointer as additional key. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #3948 Issue #3786 Issue #3887
* Use large stacks when availableBrian Behlendorf2015-12-074-19/+67
| | | | | | | | | | | | | | | | | While stack size will vary by architecture it has historically defaulted to 8K on x86_64 systems. However, as of Linux 3.15 the default thread stack size was increased to 16K. These kernels are now the default in most non- enterprise distributions which means we no longer need to assume 8K stacks. This patch takes advantage of that fact by appropriately reverting stack conservation changes which were made to ensure stability. Changes which may have had a negative impact on performance for certain workloads. This also has the side effect of bringing the code slightly more in line with upstream. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4059
* Update arcstat.py to remove deprecated rmis reference.cable29992015-12-041-2/+2
| | | | | | | | | Running arcstat.py -x currently throws KeyError due to rmis being absent, it was removed in commit ca0bf58. Signed-off-by: cable2999 <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3931
* Fix cstyle issue from 7a02327ilovezfs2015-12-041-2/+2
| | | | | | | | Continuations should be indented four spaces. Signed-off-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4062
* Illumos 5959 - clean up per-dataset feature count codeMatthew Ahrens2015-12-0411-178/+237
| | | | | | | | | | | | | | | | | | | | | | | 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/5959 https://github.com/illumos/illumos-gate/commit/ca0cc39 Porting notes: illumos code doesn't check for feature_get_refcount() returning ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added a check in https://github.com/zfsonlinux/zfs/commit/784652c due to #3468. The check was reintroduced here. Ported-by: Witaut Bajaryn <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3965
* Add zap_prefetch() interfaceBrian Behlendorf2015-12-042-0/+24
| | | | | | | | | | | Provide a generic interface to prefetch ZAP entries by name. This functionality is being added for external consumers such as Lustre. It is based of the existing zap_prefetch_uint64() version which is used by the deduplication code. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4061
* Ext4's typical GPT partition type not recognizedilovezfs2015-12-042-1/+187
| | | | | | | | | | | | | | | | | | | | | Adding additional entries to the efi conversion array will help prevent the overwriting of the GPTs of disks with in-use file systems in more cases. Most notably, this adds partition type 8300 "Linux filesystem" (0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and btrfs, among others. This commit itself does nothing to address the underlying problematic behavior that check_slice() isn't called on partitions of an unrecognized type, even when they contain a currently mounted file system. The additional entries were derived from these two resources: https://en.wikipedia.org/wiki/GUID_Partition_Table http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc Signed-off-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4016
* Illumos 934 - FreeBSD's GPT not recognizedYuri Pankov2015-12-042-49/+66
| | | | | | | | | | | | | | | | Reviewed by: Alexander Eremin <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Andrew Stormont <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/934 https://github.com/illumos/illumos-gate/commit/e21ea67 Ported-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4016
* Only trigger SET_ERROR tracepoint event on errorRichard Yao2015-12-021-0/+9
| | | | | | | | | | | | Currently, the SET_ERROR tracepoint triggers regardless of whether there is an error or not. On Illumos, SET_ERROR only triggers on an actual error, which is avoids irrelevant noise. Linux 2.6.38 added support for conditional tracepoints, so we modify SET_ERROR to use them when they are avaliable for functionality equivalent to the Illumos functionality. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4043
* Fix zdb_dump_block on little endian systemsChunwei Chen2015-12-021-0/+4
| | | | | | | | | | | | | | | | When dumping a block on a little endian system the data must be byte swapped to display correctly. Example incorrect output: $ echo 0123456789abcdef > aaa $ zdb -eR pp 3:1ee00:200 3:1ee00:200 0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef 000000: 3736353433323130 6665646362613938 0123456789abcdef 000010: 000000000000000a 0000000000000000 ................ Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4020
* Fix zdb calling behavior in ztestChunwei Chen2015-12-021-13/+41
| | | | | | | | | | | | | | The current zdb calling behaviour is really fragile, and is guaranteed to segfault if ztest is not installed in either /sbin or /usr/sbin. With this patch, the ztest will try to call zdb in the following order. 1. Use environmental variable ZDB_PATH if provided. 2. If ztest resides in build tree, guess the in tree zdb path. 3. Just pass zdb to popen and let it search it in PATH. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3126
* Prevent rm modules.* when make installtuxoko2015-12-021-1/+1
| | | | | | | | | | | This was originally in fe0ed8f910c1e4288dc190546cfe98ecf545b547, but somehow was changed and not working anymore. And it will cause the following error: modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin' Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4027
* Fix --enable-linux-builtinBrian Behlendorf2015-12-021-0/+2
| | | | | | | | | | Adding VPATH support, commit 47a4a6f, required that a `src` and `obj` line be added to the top of the Makefiles. They must be removed from the Makefiles when builtin. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/spl#481 Issue zfsonlinux/spl#498
* Linux 4.4 compat: xattr operations takes xattr_handlerChunwei Chen2015-12-013-2/+164
| | | | | | | | | | The xattr_hander->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4021
* Linux 4.4 compat: make_request_fn returns blk_qc_tChunwei Chen2015-12-012-2/+26
| | | | | | | | | | | As part of block polling support in Linux 4.4, make_request_fn should return a cookie value of type blk_qc_t. For now, we make zvol_request always return BLK_QC_T_NONE until we assess whether and how we want to support block polling. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4021
* Fix zfs_dirty_data_max overflow on 32-bittuxoko2015-11-191-2/+2
| | | | | | | | | | | | | On 32 bit, the calculation of zfs_dirty_data_max from phymem will overflow, causing it to be smaller than zfs_dirty_data_sync, and will cause txg being delayed while no one write to disk. The end result is horrendous write speed. On 4G ram 32-bit VM, before this patch, simple dd results in ~7MB/s. Now it can reach speed on par with 64-bit VM. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3973
* Fix null pointer in arc_kmem_reap_now on 32-bittuxoko2015-11-191-0/+5
| | | | | | | | | On 32 bit system, zio_buf_cache is limit to 1M. Larger than that is all NULL. So we need to avoid reaping them. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3973
* Fix snapshot automount behavior when concurrent or failChunwei Chen2015-11-191-17/+32
| | | | | | | | | | | | | | | | | | | | When concurrent threads accessing the snapdir, one will succeed the user helper mount while others will get EBUSY. However, the original code treats those EBUSY threads as success and goes on to do zfsctl_snapshot_add, which causes repeated avl_add and thus panic. Also, if the snapshot is already mounted somewhere else, a thread accessing the snapdir will also get EBUSY from user helper mount. And it will cause strange things as doing follow_down_one will fail and then follow_up will jump up to the mountpoint of the filesystem and confuse the hell out of VFS. The patch fix both behavior by returning 0 immediately for the EBUSY threads. Note, this will have a side effect for the second case where the VFS will retry several times before returning ELOOP. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4018
* sysmacros: Make P2ROUNDUP not trigger int overflowJason Zaman2015-11-161-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow. Axioms: A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement. B. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law. C. ~(~x) === x under the law of excluded middle. Proof: 0. (-(-(x) & -(align))) original 1. (~(-(x) & -(align)) + 1) by A 2. (((~(-(x))) | (~(-(align)))) + 1) by B 3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1) by A 4. (((((x) - 1)) | (((align) - 1))) + 1) by C Q.E.D. Signed-off-by: Jason Zaman <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3949
* zimport.sh: Add configure/make option supportBrian Behlendorf2015-11-161-7/+12
| | | | | | | | | | | | Allow the following environment variables to control the build behavior of the zimport.sh script. This can be useful when you want a debug build or require specific build options. The default values are: CONFIG_OPTIONS="" MAKE_OPTIONS="-s -j$(nproc)" Signed-off-by: Brian Behlendorf <[email protected]>
* Follow 0/-E convention for module load errorsBrian Behlendorf2015-11-162-6/+2
| | | | | | | | | | Because errors during module load are so rare it went unnoticed that it was possible that a positive errno was returned. This would result in the module being loaded, nothing being initialized, and a system panic shortly thereafter. This is what was causing the hard failures in the automated testing. Signed-off-by: Brian Behlendorf <[email protected]>
* Obey arc_meta_limit default size when changing arc_maxAndCycle2015-11-131-1/+1
| | | | | | | | | | When decreasing the maximum ARC size preserve the 3/4 default ratio for the arc_meta_limit. Otherwise, the arc_meta_limit may be set the same as arc_max. Signed-off-by: AndCycle <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4001
* Add TEST configuration file for buildbotBrian Behlendorf2015-11-101-0/+95
| | | | | | | | | | | | The TEST file is provided as a hint to the automated test infra- structure. It controls which regression tests are run and how they are run. This file along with any lines in the commit messages which start with TEST_* are sourced by the test scripts and can be used to override the default values. For complete details see: https://github.com/zfsonlinux/zfs-buildbot/ Signed-off-by: Brian Behlendorf <[email protected]>
* Fix maybe uninitializedBrian Behlendorf2015-11-091-1/+1
| | | | | | | | | | | | As of gcc 5.1.1 20150618 (Red Hat 5.1.1-4) the -Werror=maybe-uninitialized check detects that 'snapname' in recv_incremental_replication() may not be initialized. Explicitly initialize the variable to resolved the warning. libzfs_sendrecv.c: In function ‘recv_incremental_replication’: libzfs_sendrecv.c:2019:2: error: ‘snapname’ may be used uninitialized in (void) snprintf(buf, sizeof (buf), "%s@%s", fsname, snapname); Signed-off-by: Brian Behlendorf <[email protected]>
* Remove shareiscsi description and example from zfs(8).Turbo Fredriksson2015-10-131-47/+9
| | | | | Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Unmount is part of the shutdown process, not the boot process.Turbo Fredriksson2015-10-131-1/+1
| | | | | | Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes: #3762
* Fix fail path in zfs_znode_allocChunwei Chen2015-10-131-2/+1
| | | | | | | | | | | | | | | | When sa_bulk_lookup() fails, unlock_new_inode() will spit out a WARNING. It will also recursive deadlock on ZFS_OBJ_HOLD_ENTER in zfs_zinactive(). Since we never call insert_inode_locked in fail path, I_NEW is never set, the inode is never hashed. So unlock_new_inode() can be safely remove it. We set z_sa_hdl to NULL in fail path so that iput path will stop at zfs_inactive() without entering zfs_zinactive(). This way we can avoid the deadlock and prevent double sa_handle_destroy(). Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3899
* Fix use-after-free in vdev_disk_physio_completionChunwei Chen2015-10-131-7/+10
| | | | | | | | | | | | | | | Currently, vdev_disk_physio_completion will try to wake up an waiter without first checking the existence. This creates a race window in which complete is called after dr is freed. We add dr_wait in dio_request to indicate the existence of waiter. Also, remove dr_rw since no one is using it, and reorder dr_ref to make the struct more compact in 64bit. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3917 Issue #3880
* Illumos 6267 - dn_bonus evicted too earlyJustin T. Gibbs2015-10-135-43/+38
| | | | | | | | | | | | | | | | | 6267 dn_bonus evicted too early Reviewed by: Richard Yao <[email protected]> Reviewed by: Xin LI <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/6267 https://github.com/illumos/illumos-gate/commit/d205810 Signed-off-by: Brian Behlendorf <[email protected]> Ported-by: Ned Bass [email protected] Issue #3865 Issue #3443
* zfs-import: Perform verbatim import using cache fileJames Lee2015-10-132-64/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change modifies the import service to use the default cache file to perform a verbatim import of pools at boot. This fixes code that searches all devices and imported all visible pools. Using the cache file is in keeping with the way ZFS has always worked, how Solaris, Illumos, FreeBSD, and systemd performs imports, and is how it is written in the man page (zpool(1M,8)): All pools in this cache are automatically imported when the system boots. Importantly, the cache contains important information for importing multipath devices, and helps control which pools get imported in more dynamic environments like SANs, which may have thousands of visible and constantly changing pools, which the ZFS_POOL_EXCEPTIONS variable is not equipped to handle. Verbatim imports prevent rogue pools from being automatically imported and mounted where they shouldn't be. The change also stops the service from exporting pools at shutdown. Exporting pools is only meant to be performed explicitly by the administrator of the system. The old behavior of searching and importing all visible pools is preserved and can be switched on by heeding the warning and toggling the ZPOOL_IMPORT_ALL_VISIBLE variable in /etc/default/zfs. Signed-off-by: James Lee <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3777 Closes #3526
* zdb: segfault in dump_bpobj_subobjs()Tim Chase2015-10-131-2/+2
| | | | | | | | | Avoid buffer overrun on all-zero bpobj subobjects by using signed array index. Also fix the type cast on the printf() argument. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3905
* libzfs: handle EDOM errorsDHE2015-10-131-0/+5
| | | | | | | | | | | EDOM may occur if a user tries to set `recordsize` too large without use "zfs set". This can be demonstrated with: > zpool create testpool -O recordsize=32M /dev/... Signed-off-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3911
* Fix 'arc_c < arc_c_min' panicBrian Behlendorf2015-10-131-1/+2
| | | | | | | | | Strictly enforce keeping 'arc_c >= arc_c_min'. The ASSERTs are left in place to catch this in a debug build but logic has been added to gracefully handle in a production build. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3904
* Rename 'zed.service' to 'zfs-zed.service'Turbo Fredriksson2015-10-023-3/+6
| | | | | | | | | | | | | | | For consistency all systemd unit files and init scripts now share the same names. This prevents an issue where the zed is started twice on systems where both the systemd and sysv infrastructure is installed concurrently. For backward compatibility a 'zed' alias has been added. This allows the user to interact with the service using either the name 'zed' or 'zfs-zed'. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3837
* Fix zfs-dkms uninstall/updateBrian Behlendorf2015-10-021-5/+6
| | | | | | | | | | | | | Modern versions of dkms cleanup the build directory after installing. This resulted in 'dkms uninstall' never running because the check added by commit 866c162 which verifies the existance of the zfs.release build product would never be true. This patch resolves the issue by updating the conditional to check in the explicitly installed zfs_config.h file for the version. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3862
* zfs_inode_update should not call dmu_object_size_from_db under spinlockRichard Yao2015-09-301-2/+4
| | | | | | | | | | | We should never block when holding a spin lock, but zfs_inode_update can block in the critical section of a spin lock in zfs_inode_update: zfs_inode_update -> dmu_object_size_from_db -> zrl_add -> mutex_enter Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3858
* Remove obsolete zv_lockRichard Yao2015-09-301-2/+0
| | | | | | | | | All users of zv_lock were removed by 37f9dac, but we forgot to remove it. Lets remove it as clean up. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3858
* Init script fixesTurbo Fredriksson2015-09-297-63/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix regression - "OVERLAY_MOUNTS" should have been "DO_OVERLAY_MOUNTS". * Fix update-rc.d commands in postinst. Thanx to subzero79@GitHub. * Fix make sure a filesystem exists before trying to mount in mount_fs() * Fix local variable usage. * Fix to read_mtab(): * Strip control characters (space - \040) from /proc/mounts GLOBALY, not just first occurrence. * Don't replace unprintable characters ([/-. ]) for use in the variable name with underscore. No need, just remove them all together. * Add check_boolean() to check if a user configure option is set ('yes', 'Yes', 'YES' or any combination there of) OR '1'. Anything else is considered 'unset'. * Add a ZFS_POOL_IMPORT to the default config. * This is a semi colon separated list of pools to import ONLY. * This is intended for systems which have _a lot_ of pools (from a SAN for example) and it would be to many to put in the ZFS_POOL_EXCEPTIONS variable.. * Add a config option "ZPOOL_IMPORT_OPTS" for adding additional options to "zpool import". * Add documentation and the chance of overriding the ZPOOL_CACHE variable in the config file. * Remove "sort" from find_pools() and setup_snapshot_booting(). Sometimes not available, and not really necessary. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Issue #3816
* Fix uioskip crash when skip to endChunwei Chen2015-09-291-2/+4
| | | | | | | | | | | When doing uioskip to skip an iovec to the very end, the current loop condition will falsely check pass the end of iovec. We fix this checking uio_iovcnt first. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3806 Closes #3850
* Userspace can pass zero length segments via writev/readvRichard Yao2015-09-251-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can trigger an assertion by passing a zero-length segment when assertions are enabled: [27961.614792] VERIFY3(skip < iov->iov_len) failed (0 < 0) [27961.614795] PANIC at zfs_uio.c:187:uio_prefaultpages() [27961.614805] Call Trace: [27961.614811] dump_stack+0x45/0x57 [27961.614830] spl_dumpstack+0x44/0x50 [spl] [27961.614834] spl_panic+0xbb/0x100 [spl] [27961.614908] uio_prefaultpages+0x134/0x140 [zcommon] [27961.614930] zfs_write+0x1fd/0xe80 [zfs] [27961.615014] zpl_write_common_iovec+0x7f/0x110 [zfs] [27961.615035] zpl_iter_write+0xa0/0xd0 [zfs] [27961.615037] do_iter_readv_writev+0x59/0x80 [27961.615063] do_readv_writev+0x11b/0x260 [27961.615098] vfs_writev+0x39/0x50 [27961.615100] SyS_writev+0x4a/0xe0 [27961.615103] system_call_fastpath+0x16/0x6e The solution is to delete the assertion. This could potentially occur in uiomove as well, which contains analogous assertions that appear similarly unnecessary, so we remove those as well. Reported-by: Jonathan Vasquez <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #3792
* Revert "dmu_objset_userquota_get_ids uses dn_bonus unsafely"Brian Behlendorf2015-09-251-17/+3
| | | | | | | | | | | | This reverts commit 5f8e1e850522ee5cd37366427da4b4101a71c8a8. It was determined that this patch introduced the quota regression described in #3789. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3443 Issue #3789
* Fix synchronous behavior in __vdev_disk_physio()Brian Behlendorf2015-09-253-81/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side-effect of making the vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's which would result in a RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780. This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3652 Issue #3780 Issue #3785 Issue #3817 Issue #3821 Issue #3829 Issue #3832 Issue #3870
* Avoid blocking in arc_reclaim_thread()Brian Behlendorf2015-09-251-11/+4
| | | | | | | | | | | | | | | | | | | | | | As described in the comment above arc_reclaim_thread() it's critical that the reclaim thread be careful about blocking. Just like it must never wait on a hash lock, it must never wait on a task which can in turn wait on the CV in arc_get_data_buf(). This will deadlock, see issue #3822 for full backtraces showing the problem. To resolve this issue arc_kmem_reap_now() has been updated to use the asynchronous arc prune function. This means that arc_prune_async() may now be called while there are still outstanding arc_prune_tasks. However, this isn't a problem because arc_prune_async() already keeps a reference count preventing multiple outstanding tasks per registered consumer. Functionally, this behavior is the same as the counterpart illumos function dnlc_reduce_cache(). Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Issue #3808 Issue #3834 Issue #3822
* Disable zpl_nr_cached_objects() callbackBrian Behlendorf2015-09-251-14/+1
| | | | | | | | | | | The zpl_nr_cached_objects() function has been disabled because in the current code it doesn't provide any critical functionality and it may result in a deadlock under certain circumstances. However, because we expect to need these hooks in the future this code has not been entirely removed. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3719
* Allow NFS activity to defer snapshot unmountsBrian Behlendorf2015-09-251-2/+13
| | | | | | | | | | Accessing a snapshot via NFS should cause an auto-unmount of that snapshot to be deferred until such as time as the snapshot is idle. This is analogous to the zpl_revalidate logic employed by locally mounted snapshots. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3794
* Linux 4.3 compat: bio_end_io_t / BIO_UPTODATELukas Wunner2015-09-254-40/+34
| | | | | | | | | | | | | | | | | | Commit torvalds/linux@4246a0b63bd8f56a1469b12eafeb875b1041a451 ("block: add a bi_error field to struct bio") dropped the error argument from bio_endio in favor of newly introduced bio->bi_error. This also replaces bio->bi_flags value BIO_UPTODATE. bio_endio was a 3 argument function until Linux 2.6.24, which made it a 2 argument function, and now the prototype has changed yet again to a 1 argument function. Support for pre 2.6.24 kernels was already dropped with 37f9dac592bf ("zvol processing should use struct bio") which assumed the 2 argument version in zvol_request(). Remaining code to support the 3 argument version is hereby removed. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Issue #3799
* Fixed --signal typoyuina8222015-09-221-1/+1
| | | | | | Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3773
* Add extra_started_commands because reload function is not defaultyuina8222015-09-221-0/+2
| | | | | | Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3773
* Add large block support to zpios(1) benchmarkDon Brady2015-09-227-17/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of the large block support effort, it makes sense to add support for large blocks to **zpios(1)**. The specifying of a zfs block size for zpios is optional and will default to 128K if the block size is not specified. `zpios ... -S size | --blocksize size ...` This will use *size* ZFS blocks for each test, specified as a comma delimited list with an optional unit suffix. The supported range is powers of two from 128K through 16M. A range of block sizes can be tested as follows: `-S 128K,256K,512K,1M` Example run below (non realistic results from a VM and output abbreviated for space) ``` --regioncount=750 --regionsize=8M --chunksize=1M --offset=4K --threaddelay=0 --cleanup --human-readable --verbose --cleanup --blocksize=128K,256K,512K,1M th-cnt rg-cnt rg-sz ch-sz blksz wr-data wr-bw rd-data rd-bw --------------------------------------------------------------------- 4 750 8m 1m 128k 5g 90.06m 5g 93.37m 4 750 8m 1m 256k 5g 79.71m 5g 99.81m 4 750 8m 1m 512k 5g 42.20m 5g 93.14m 4 750 8m 1m 1m 5g 35.51m 5g 89.36m 8 750 8m 1m 128k 5g 85.49m 5g 90.81m 8 750 8m 1m 256k 5g 61.42m 5g 99.24m 8 750 8m 1m 512k 5g 49.09m 5g 108.78m 16 750 8m 1m 128k 5g 86.28m 5g 88.73m 16 750 8m 1m 256k 5g 64.34m 5g 93.47m 16 750 8m 1m 512k 5g 68.84m 5g 124.47m 16 750 8m 1m 1m 5g 53.97m 5g 97.20m --------------------------------------------------------------------- ``` Signed-off-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3795 Closes #2071