aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Make xattr dir truncate and remove in one txChunwei Chen2015-12-281-8/+15
| | | | | | | | | | | | | | | We need truncate and remove be in the same tx when doing zfs_rmnode on xattr dir. Otherwise, if we truncate and crash, we'll end up with inconsistent zap object on the delete queue. We do this by skipping dmu_free_long_range and let zfs_znode_delete to do the work. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4114 Issue #4052 Issue #4006 Issue #3018 Issue #2861
* Fix empty xattr dir causing lockupChunwei Chen2015-12-281-0/+10
| | | | | | | | | | | | | | | | | | During zfs_rmnode on a xattr dir, if the system crash just after dmu_free_long_range, we would get empty xattr dir in delete queue. This would cause blkid=0 be passed into zap_get_leaf_byblk when doing zfs_purgedir during mount, and would try to do rw_enter on a wrong structure and cause system lockup. We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4114 Closes #4052 Closes #4006 Closes #3018 Closes #2861
* Fix z_xattr_lock/z_teardown_lock inversionBrian Behlendorf2015-12-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | There exists a lock inversion between the z_xattr_lock and the z_teardown_lock. Resolve this by taking the z_teardown_lock in all registered xattr callbacks prior to taking the z_xattr_lock. This ensures the locks are always taken is the same order thus preventing a deadlock. Note the z_teardown_lock is taken again in zfs_lookup() and this is safe because the z_teardown lock is a re-entrant read reader/writer lock. * process-1 zpl_xattr_get -> Takes zp->z_xattr_lock __zpl_xattr_get zfs_lookup -> Takes zsb->z_teardown_lock in ZFS_ENTER macro * process-2 zfs_ioc_recv -> Takes zsb->z_teardown_lock in zfs_suspend_fs() zfs_resume_fs zfs_rezget -> Takes zp->z_xattr_lock Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #3943 Closes #3969 Closes #4121
* Revert "Fix z_xattr_lock/z_teardown_lock lock inversion"Brian Behlendorf2015-12-221-10/+1
| | | | This reverts commit 6b32ef572f754efc3f9edb20d022450f8e6b02d9.
* Fix ztest truncated cache fileBrian Behlendorf2015-12-221-2/+3
| | | | | | | | | | | Commit efc412b updated spa_config_write() for Linux 4.2 kernels to truncate and overwrite rather than rename the cache file. This is the correct fix but it should have only been applied for the kernel build. In user space rename(2) is needed because ztest depends on the cache file. Signed-off-by: Brian Behlendorf <[email protected]> Closes #4129
* Identify locks flagged by lockdepOlaf Faaland2015-12-228-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible recursive locking in some cases and possible circular locking dependency in others, within the SPL and ZFS modules. This patch uses a mutex type defined in SPL, MUTEX_NOLOCKDEP, to mark such mutexes when they are initialized. This mutex type causes attempts to take or release those locks to be wrapped in lockdep_off() and lockdep_on() calls to silence the dependency checker and allow the use of lock_stats to examine contention. For RW locks, it uses an analogous lock type, RW_NOLOCKDEP. The goal is that these locks are ultimately changed back to type MUTEX_DEFAULT or RW_DEFAULT, after the locks are annotated to reflect their relationship (e.g. z_name_lock below) or any real problem with the lock dependencies are fixed. Some of the affected locks are: tc_open_lock: ============= This is an array of locks, all with same name, which txg_quiesce must take all of in order to move txg to next state. All default to the same lockdep class, and so to lockdep appears recursive. zp->z_name_lock: ================ In zfs_rmdir, dzp = znode for the directory (input to zfs_dirent_lock) zp = znode for the entry being removed (output of zfs_dirent_lock) zfs_rmdir()->zfs_dirent_lock() takes z_name_lock in dzp zfs_rmdir() takes z_name_lock in zp Since both dzp and zp are type znode_t, the locks have the same default class, and lockdep considers it a possible recursive lock attempt. l->l_rwlock: ============ zap_expand_leaf() sometimes creates two new zap leaf structures, via these call paths: zap_deref_leaf()->zap_get_leaf_byblk()->zap_leaf_open() zap_expand_leaf()->zap_create_leaf()->zap_expand_leaf()->zap_create_leaf() Because both zap_leaf_open() and zap_create_leaf() initialize l->l_rwlock in their (separate) leaf structures, the lockdep class is the same, and the linux kernel believes these might both be the same lock, and emits a possible recursive lock warning. Signed-off-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3895
* Add lock types RW_NOLOCKDEP and MUTEX_NOLOCKDEPOlaf Faaland2015-12-221-0/+2
| | | | | | | | | | | | | | | Both lock types were introduced in SPL to allow some locks to be taken/released with linux lockdep turned off. See SPL commit for details. Add the new lock types to zfs_context.h to allow user space compilation. Depends on SPL commit 692ae8d SPL pull request refs/pull/480/head Signed-off-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3895
* Skip GPL-only symbols test when cross-compilingKamil Domański2015-12-181-8/+10
| | | | | | Signed-off-by: Kamil Domański <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4107
* Make zio_taskq_batch_pct user configurableDHE2015-12-182-1/+23
| | | | | | | | | Adds zio_taskq_batch_pct as an exported module parameter, allowing users to modify it at module load time. Signed-off-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4110
* Activate LVM volume groups before looking for zpools.Benjamin Albrecht2015-12-182-2/+63
| | | | | | | | Original-patch-by: @jgoerzen Signed-off-by: Benjamin Albrecht <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/pkg-zfs#102 Closes #4029
* Man page white space and spelling correctionsNed Bass2015-12-187-136/+136
| | | | | | | | | Correct some misspelled words and grammatical errors, and remove trailing white space in the man pages. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4115
* Fix zfs_vdev_aggregation_limit bounds checkingBrian Behlendorf2015-12-181-11/+8
| | | | | | | | | | | | | Update the bounds checking for zfs_vdev_aggregation_limit so that it has a floor of zero and a maximum value of the supported block size for the pool. Additionally add an early return when zfs_vdev_aggregation_limit equals zero to disable aggregation. For very fast solid state or memory devices it may be more expensive to perform the aggregation than to issue the IO immediately. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix vdev_queue_aggregate() deadlockBrian Behlendorf2015-12-183-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This deadlock may manifest itself in slightly different ways but at the core it is caused by a memory allocation blocking on file- system reclaim in the zio pipeline. This is normally impossible because zio_execute() disables filesystem reclaim by setting PF_FSTRANS on the thread. However, kmem cache allocations may still indirectly block on file system reclaim while holding the critical vq->vq_lock as shown below. To resolve this issue zio_buf_alloc_flags() is introduced which allocation flags to be passed. This can then be used in vdev_queue_aggregate() with KM_NOSLEEP when allocating the aggregate IO buffer. Since aggregating the IO is purely a performance optimization we want this to either succeed or fail quickly. Trying too hard to allocate this memory under the vq->vq_lock can negatively impact performance and result in this deadlock. * z_wr_iss zio_vdev_io_start vdev_queue_io -> Takes vq->vq_lock vdev_queue_io_to_issue vdev_queue_aggregate zio_buf_alloc -> Waiting on spl_kmem_cache process * z_wr_int zio_vdev_io_done vdev_queue_io_done mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss * txg_sync spa_sync dsl_pool_sync zio_wait -> Waiting on zio being handled by z_wr_int * spl_kmem_cache spl_cache_grow_work kv_alloc spl_vmalloc ... evict zpl_evict_inode zfs_inactive dmu_tx_wait txg_wait_open -> Waiting on txg_sync Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #3808 Closes #3867
* Fix z_xattr_lock/z_teardown_lock lock inversionBrian Behlendorf2015-12-181-1/+10
| | | | | | | | | | | | | | | | | | | | | There exists a lock inversion between the z_xattr_lock and the z_teardown_lock. Detect this case and return EBUSY so zfs_resume_fs() will mark the inode stale and it can be safely revalidated on next access. * process-1 zpl_xattr_get -> Takes zp->z_xattr_lock __zpl_xattr_get zfs_lookup -> Takes zsb->z_teardown_lock in ZFS_ENTER macro * process-2 zfs_ioc_recv -> Takes zsb->z_teardown_lock in zfs_suspend_fs() zfs_resume_fs zfs_rezget -> Takes zp->z_xattr_lock Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #3969
* Use uio for zvol_{read,write}Chunwei Chen2015-12-154-172/+24
| | | | | | | | | Since uio now supports bvec, we can convert bio into uio and reuse dmu_{read,write}_uio. This way, we can remove some duplicate code. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4078
* Fix uio_prefaultpages for 0 length iovecChunwei Chen2015-12-151-5/+6
| | | | | | | | | | | | Userspace can freely pass in whatever iovec it feels like, and it's perfectly legal to pass an iovec which contains a zero length segment. In the current implementation, uio_prefaultpages would touch an out of bound byte in the "last byte" logic. While this probably wouldn't cause any critical error, we would like uio_prefaultpages to be able to continue gracefully. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4078
* Handle damaged blk_birth in dsl_deadlist_insert()Brian Behlendorf2015-12-151-0/+8
| | | | | | | | | | | | | | | | | If a bit were cleared in `bp->blk_birth` such that the txg birth was now lower than any other txg_birth in the deadlist, then there will be no entry before this in the tree. This should be impossible but regardless error handling code has been added for this case. By default this is left as a fatal case and the blk_birth is logged. However, setting `zfs_recover=1` will cause the bp to be placed at the start of the deadlist even though it contains an invalid blk_birth. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4086 Closes #4089
* Handle block pointers with a corrupt logical sizeBrian Behlendorf2015-12-151-9/+3
| | | | | | | | | | | Commit 5f6d0b6 was originally added to gracefully handle block pointers with a damaged logical size. However, it incorrectly assumed that all passed arc_done_func_t could handle a NULL arc_buf_t. Signed-off-by: Brian Behlendorf <[email protected]> Closes #4069 Closes #4080
* Remove "index" column from dbufstat.pyOlaf Faaland2015-12-151-4/+3
| | | | | | | | | | | | | | Commit ca0bf58d to address arcs_mtx contention removed column "index" from the output of kstats/dbuf. dbufstat.py was not updated to reflect this, which causes it to crash when run with -bx This removes "index" from hardcoded lists of columns. Signed-off-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4096
* Revert "Switch ztest mmap(2) ASSERTs to VERIFYs"Richard Yao2015-12-141-3/+3
| | | | | | | | | | This reverts commit 202619623022722f30c2ee49931a4fa6896421c7. It is no longer necessary now that we pass -DDEBUG unconditionally. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4095
* Unconditionally build zdb and ztest with -DDEBUGRichard Yao2015-12-142-0/+3
| | | | | | | | | | | | | | | | | Illumos unconditionally builds zdb and ztest with -DDEBUG. This helps catch bugs and eliminates the need for commits like 202619623022722f30c2ee49931a4fa6896421c7, which changed ASSERTs to VERIFYs. The following files in the illumos tree show this: usr/src/cmd/zdb/Makefile.com usr/src/cmd/ztest/Makefile.com Given the usefulness of having early failure in these tools, we should do it too. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4095
* Hold the zfs_snapentry_t before dispatchBrian Behlendorf2015-12-141-1/+1
| | | | | | | | | While exceptionally unlikely to cause a problem the zfs_snapentry_t hold should be taken before the dispatch to prevent any possibility of the task being processed before the hold. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
* Fix snapshot automount race cause EREMOTEChunwei Chen2015-12-141-1/+1
| | | | | | | | | | | When a concorrent mount finishes just before calling to zfsctl_snapshot_ismounted, if we return EISDIR, the VFS will return with EREMOTE. We should instead just return 0, so VFS may retry and would likely notice the dentry is alreadly mounted. This will be inline with when usermode helper return EBUSY. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Change zfs_snapshot_lock from mutex to rw lockBrian Behlendorf2015-12-141-26/+26
| | | | | | | | | | By changing the zfs_snapshot_lock from a mutex to a rw lock the zfsctl_lookup_objset() function can be allowed to run concurrently. This should reduce the latency of fh_to_dentry lookups in ZFS snapshots which are being accessed over NFS. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
* Fix zfsctl_lookup_objset() deadlockBrian Behlendorf2015-12-141-1/+2
| | | | | | | | | | | | The zfsctl_snapshot_unmount_delay() function must not be called from zfsctl_lookup_objset() while it is currently holding the zfs_snapshot_lock. This will result in a deadlock. It is safe to call zfsctl_snapshot_unmount_delay_impl() directly because the function already has a reference on the zfs_snapentry_t. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #3997
* Set 'zfs_expire_snapshot=0' to disable auto-unmountBrian Behlendorf2015-12-141-0/+8
| | | | | | | | | There are cases where it's desirable that auto-mounted snapshots not expire after a fixed duration. They should be unmounted only when the filesystem they are a snapshot of is unmounted. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
* Either _ILP32 or _LP64 must be definedBrian Behlendorf2015-12-101-15/+27
| | | | | | | | | | | For some arm, powerpc, and sparc platforms it was possible that neither _ILP32 of _LP64 would be defined. Update the isa_defs.h header to explicitly set these macros and generate a compile error in the case neither are defined. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4048
* Use spa as key besides objsetid for snapentryChunwei Chen2015-12-083-12/+26
| | | | | | | | | | | | | objsetid is not unique across pool, so using it solely as key would cause panic when automounting two snapshot on different pools with the same objsetid. We fix this by adding spa pointer as additional key. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #3948 Issue #3786 Issue #3887
* Use large stacks when availableBrian Behlendorf2015-12-074-19/+67
| | | | | | | | | | | | | | | | | While stack size will vary by architecture it has historically defaulted to 8K on x86_64 systems. However, as of Linux 3.15 the default thread stack size was increased to 16K. These kernels are now the default in most non- enterprise distributions which means we no longer need to assume 8K stacks. This patch takes advantage of that fact by appropriately reverting stack conservation changes which were made to ensure stability. Changes which may have had a negative impact on performance for certain workloads. This also has the side effect of bringing the code slightly more in line with upstream. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4059
* Update arcstat.py to remove deprecated rmis reference.cable29992015-12-041-2/+2
| | | | | | | | | Running arcstat.py -x currently throws KeyError due to rmis being absent, it was removed in commit ca0bf58. Signed-off-by: cable2999 <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3931
* Fix cstyle issue from 7a02327ilovezfs2015-12-041-2/+2
| | | | | | | | Continuations should be indented four spaces. Signed-off-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4062
* Illumos 5959 - clean up per-dataset feature count codeMatthew Ahrens2015-12-0411-178/+237
| | | | | | | | | | | | | | | | | | | | | | | 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/5959 https://github.com/illumos/illumos-gate/commit/ca0cc39 Porting notes: illumos code doesn't check for feature_get_refcount() returning ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added a check in https://github.com/zfsonlinux/zfs/commit/784652c due to #3468. The check was reintroduced here. Ported-by: Witaut Bajaryn <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3965
* Add zap_prefetch() interfaceBrian Behlendorf2015-12-042-0/+24
| | | | | | | | | | | Provide a generic interface to prefetch ZAP entries by name. This functionality is being added for external consumers such as Lustre. It is based of the existing zap_prefetch_uint64() version which is used by the deduplication code. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4061
* Ext4's typical GPT partition type not recognizedilovezfs2015-12-042-1/+187
| | | | | | | | | | | | | | | | | | | | | Adding additional entries to the efi conversion array will help prevent the overwriting of the GPTs of disks with in-use file systems in more cases. Most notably, this adds partition type 8300 "Linux filesystem" (0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and btrfs, among others. This commit itself does nothing to address the underlying problematic behavior that check_slice() isn't called on partitions of an unrecognized type, even when they contain a currently mounted file system. The additional entries were derived from these two resources: https://en.wikipedia.org/wiki/GUID_Partition_Table http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc Signed-off-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4016
* Illumos 934 - FreeBSD's GPT not recognizedYuri Pankov2015-12-042-49/+66
| | | | | | | | | | | | | | | | Reviewed by: Alexander Eremin <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Andrew Stormont <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/934 https://github.com/illumos/illumos-gate/commit/e21ea67 Ported-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4016
* Only trigger SET_ERROR tracepoint event on errorRichard Yao2015-12-021-0/+9
| | | | | | | | | | | | Currently, the SET_ERROR tracepoint triggers regardless of whether there is an error or not. On Illumos, SET_ERROR only triggers on an actual error, which is avoids irrelevant noise. Linux 2.6.38 added support for conditional tracepoints, so we modify SET_ERROR to use them when they are avaliable for functionality equivalent to the Illumos functionality. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4043
* Fix zdb_dump_block on little endian systemsChunwei Chen2015-12-021-0/+4
| | | | | | | | | | | | | | | | When dumping a block on a little endian system the data must be byte swapped to display correctly. Example incorrect output: $ echo 0123456789abcdef > aaa $ zdb -eR pp 3:1ee00:200 3:1ee00:200 0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef 000000: 3736353433323130 6665646362613938 0123456789abcdef 000010: 000000000000000a 0000000000000000 ................ Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4020
* Fix zdb calling behavior in ztestChunwei Chen2015-12-021-13/+41
| | | | | | | | | | | | | | The current zdb calling behaviour is really fragile, and is guaranteed to segfault if ztest is not installed in either /sbin or /usr/sbin. With this patch, the ztest will try to call zdb in the following order. 1. Use environmental variable ZDB_PATH if provided. 2. If ztest resides in build tree, guess the in tree zdb path. 3. Just pass zdb to popen and let it search it in PATH. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3126
* Prevent rm modules.* when make installtuxoko2015-12-021-1/+1
| | | | | | | | | | | This was originally in fe0ed8f910c1e4288dc190546cfe98ecf545b547, but somehow was changed and not working anymore. And it will cause the following error: modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin' Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4027
* Fix --enable-linux-builtinBrian Behlendorf2015-12-021-0/+2
| | | | | | | | | | Adding VPATH support, commit 47a4a6f, required that a `src` and `obj` line be added to the top of the Makefiles. They must be removed from the Makefiles when builtin. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/spl#481 Issue zfsonlinux/spl#498
* Linux 4.4 compat: xattr operations takes xattr_handlerChunwei Chen2015-12-013-2/+164
| | | | | | | | | | The xattr_hander->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4021
* Linux 4.4 compat: make_request_fn returns blk_qc_tChunwei Chen2015-12-012-2/+26
| | | | | | | | | | | As part of block polling support in Linux 4.4, make_request_fn should return a cookie value of type blk_qc_t. For now, we make zvol_request always return BLK_QC_T_NONE until we assess whether and how we want to support block polling. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4021
* Fix zfs_dirty_data_max overflow on 32-bittuxoko2015-11-191-2/+2
| | | | | | | | | | | | | On 32 bit, the calculation of zfs_dirty_data_max from phymem will overflow, causing it to be smaller than zfs_dirty_data_sync, and will cause txg being delayed while no one write to disk. The end result is horrendous write speed. On 4G ram 32-bit VM, before this patch, simple dd results in ~7MB/s. Now it can reach speed on par with 64-bit VM. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3973
* Fix null pointer in arc_kmem_reap_now on 32-bittuxoko2015-11-191-0/+5
| | | | | | | | | On 32 bit system, zio_buf_cache is limit to 1M. Larger than that is all NULL. So we need to avoid reaping them. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3973
* Fix snapshot automount behavior when concurrent or failChunwei Chen2015-11-191-17/+32
| | | | | | | | | | | | | | | | | | | | When concurrent threads accessing the snapdir, one will succeed the user helper mount while others will get EBUSY. However, the original code treats those EBUSY threads as success and goes on to do zfsctl_snapshot_add, which causes repeated avl_add and thus panic. Also, if the snapshot is already mounted somewhere else, a thread accessing the snapdir will also get EBUSY from user helper mount. And it will cause strange things as doing follow_down_one will fail and then follow_up will jump up to the mountpoint of the filesystem and confuse the hell out of VFS. The patch fix both behavior by returning 0 immediately for the EBUSY threads. Note, this will have a side effect for the second case where the VFS will retry several times before returning ELOOP. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4018
* sysmacros: Make P2ROUNDUP not trigger int overflowJason Zaman2015-11-161-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow. Axioms: A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement. B. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law. C. ~(~x) === x under the law of excluded middle. Proof: 0. (-(-(x) & -(align))) original 1. (~(-(x) & -(align)) + 1) by A 2. (((~(-(x))) | (~(-(align)))) + 1) by B 3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1) by A 4. (((((x) - 1)) | (((align) - 1))) + 1) by C Q.E.D. Signed-off-by: Jason Zaman <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3949
* zimport.sh: Add configure/make option supportBrian Behlendorf2015-11-161-7/+12
| | | | | | | | | | | | Allow the following environment variables to control the build behavior of the zimport.sh script. This can be useful when you want a debug build or require specific build options. The default values are: CONFIG_OPTIONS="" MAKE_OPTIONS="-s -j$(nproc)" Signed-off-by: Brian Behlendorf <[email protected]>
* Follow 0/-E convention for module load errorsBrian Behlendorf2015-11-162-6/+2
| | | | | | | | | | Because errors during module load are so rare it went unnoticed that it was possible that a positive errno was returned. This would result in the module being loaded, nothing being initialized, and a system panic shortly thereafter. This is what was causing the hard failures in the automated testing. Signed-off-by: Brian Behlendorf <[email protected]>
* Obey arc_meta_limit default size when changing arc_maxAndCycle2015-11-131-1/+1
| | | | | | | | | | When decreasing the maximum ARC size preserve the 3/4 default ratio for the arc_meta_limit. Otherwise, the arc_meta_limit may be set the same as arc_max. Signed-off-by: AndCycle <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4001
* Add TEST configuration file for buildbotBrian Behlendorf2015-11-101-0/+95
| | | | | | | | | | | | The TEST file is provided as a hint to the automated test infra- structure. It controls which regression tests are run and how they are run. This file along with any lines in the commit messages which start with TEST_* are sourced by the test scripts and can be used to override the default values. For complete details see: https://github.com/zfsonlinux/zfs-buildbot/ Signed-off-by: Brian Behlendorf <[email protected]>