Commit message | Author | Age | Files | Lines
* Don't hold mutex until release cv in cv_wait | Chunwei Chen | 2016-01-12 | 2 | -15/+70

If a thread is holding the mutex when doing cv_destroy, it might end up waiting for a thread in cv_wait. The waiter would wake up trying to acquire the same mutex and cause a deadlock.

We solve this by moving the mutex_enter to the bottom of cv_wait, so that the waiter will release the cv first, allowing cv_destroy to succeed and have a chance to free the mutex.

This would create a race condition on the cv_mutex. We use xchg to set and check it to ensure we won't be harmed by the race. As a result, the cv_mutex debugging becomes best-effort.

Also, the change reveals a race, which was unlikely before, where we call mutex_destroy while test threads are still holding the mutex. We use kthread_stop to make sure the threads have exited before mutex_destroy.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Signed-off-by: Tim Chase
Issue zfsonlinux/zfs#4166
Issue zfsonlinux/zfs#4106
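The "set and check with xchg" step from the commit above can be illustrated in userspace with C11 atomics. This is only a sketch under stated assumptions: `cv_sketch_t`, `cv_sketch_init` and `cv_track_mutex` are hypothetical names, and `atomic_exchange` stands in for the kernel's `xchg`; it is not the SPL implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical condition variable that remembers the mutex paired with it. */
typedef struct cv_sketch {
    _Atomic(void *) cv_mutex;   /* mutex last recorded by a waiter, or NULL */
} cv_sketch_t;

static void
cv_sketch_init(cv_sketch_t *cv)
{
    assert(cv != NULL);
    atomic_init(&cv->cv_mutex, NULL);
}

/*
 * Record `mp` as the mutex used with `cv` in a single atomic exchange and
 * report whether the previous value was consistent: either unset (NULL) or
 * the same mutex. Because the store and the read of the old value happen in
 * one atomic step, concurrent waiters cannot corrupt the field, but the check
 * is only best-effort debugging, as the commit message notes.
 */
static bool
cv_track_mutex(cv_sketch_t *cv, void *mp)
{
    void *prev = atomic_exchange(&cv->cv_mutex, mp);

    return (prev == NULL || prev == mp);
}
```

A caller would treat a `false` return as a debugging signal that two different mutexes were used with the same condition variable.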
* Add spl_kmem_cache_kmem_threads man page entry | Brian Behlendorf | 2016-01-12 | 1 | -0/+14

The spl_kmem_cache_kmem_threads module option was accidentally omitted from the documentation. Add it.

Signed-off-by: Brian Behlendorf
Closes #512
* _ILP32 is always defined on SPARC | Alex McWhirter | 2016-01-08 | 1 | -4/+0

Signed-off-by: Alex McWhirter
Signed-off-by: Brian Behlendorf
Closes #520
* Fix do_div() types in condvar:timeout | Brian Behlendorf | 2015-12-22 | 1 | -5/+6

The do_div() macro expects unsigned types, and this is detected in the powerpc implementation of do_div().

Signed-off-by: Brian Behlendorf
Closes #516
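To show why the operand types matter, here is a minimal userspace sketch of the do_div() calling convention: the dividend is an unsigned 64-bit value that is divided in place, and the remainder is returned. `do_div_sketch` and `ns_split` are hypothetical stand-ins written for this illustration, not the kernel macro or the code touched by the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's do_div(): divides *n in place by a
 * 32-bit divisor and returns the remainder. Taking a uint64_t pointer makes
 * the "unsigned 64-bit dividend" requirement explicit in the signature.
 */
static uint32_t
do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem;

    assert(base != 0);
    rem = (uint32_t)(*n % base);
    *n /= base;
    return (rem);
}

/* Split a nanosecond count into whole seconds and leftover nanoseconds. */
static void
ns_split(uint64_t ns, uint64_t *sec, uint32_t *nsec)
{
    uint64_t q = ns;    /* unsigned dividend, as do_div() expects */

    *nsec = do_div_sketch(&q, 1000000000U);
    *sec = q;
}
```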
* Use spl_fstrans_mark instead of memalloc_noio_save | Chunwei Chen | 2015-12-18 | 5 | -38/+5

For earlier versions of the kernel with memalloc_noio_save, it only turns off __GFP_IO but leaves __GFP_FS untouched during direct reclaim. This would cause threads to direct reclaim into ZFS and cause deadlock.

Instead, we should stick to using spl_fstrans_mark. Since we would explicitly turn off both __GFP_IO and __GFP_FS before allocation, it will work on every version of the kernel.

This impacts kernel versions 3.9-3.17, see upstream kernel commit torvalds/linux@934f307 for reference.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Signed-off-by: Tim Chase
Closes #515
Issue zfsonlinux/zfs#4111
* Provide kstat for taskqs | Tim Chase | 2015-12-16 | 3 | -7/+283

This patch provides 2 new kstats to display task queues:

  /proc/spl/taskqs-all - Display all task queues
  /proc/spl/taskqs - Display only "active" task queues

A task queue is considered to be "active" if it currently has active (running) threads or if any of its pending, priority, delay or waitq lists are not empty.

If the task queue has running threads, displays each thread function's address (symbolically, if possible) and its argument.

If the task queue has a non-empty list of pending, priority or delayed task queue entries (taskq_ent_t), displays each entry's thread function address and argument.

If the task queue has any waiters, displays each waiting task's pid.

Note: This patch also updates some comments in taskq.h which referred to "taskq_t" when they should have referred to "taskq_ent_t".

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #491
* Skip GPL-only symbols test when cross-compiling | Kamil Domanski | 2015-12-14 | 1 | -8/+11

Signed-off-by: Kamil Domański
Signed-off-by: Brian Behlendorf
Closes zfsonlinux/spl#507
Closes zfsonlinux/zfs#4075
* Revert "Skip GPL-only symbols test when cross-compiling" | Brian Behlendorf | 2015-12-11 | 1 | -1/+0

This reverts commit 61bbbd9a775a5517af513e5014edbdd73a32f7e4 because older versions of autoconf (2.63) do not support the cross-compile argument to AC_RUN_IFELSE.

Signed-off-by: Brian Behlendorf
Issue #507
* Fix cstyle issues in spl-taskq.c and taskq.h | Brian Behlendorf | 2015-12-11 | 2 | -79/+87

This patch only addresses the issues identified by the style checker. It contains no functional changes.

Signed-off-by: Brian Behlendorf
* Don't use tq->tq_lock_flags | Chunwei Chen | 2015-12-11 | 2 | -62/+62

The flags argument in spin_lock_irqsave is modified outside of the spin_lock context. We cannot use a shared variable like tq->tq_lock_flags for it. This patch removes it and uses a local variable for the flags.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes #506
* Subclass tq_lock to eliminate a lockdep warning | Olaf Faaland | 2015-12-11 | 2 | -21/+58

When taskq_dispatch() calls taskq_thread_spawn() to create a new thread for a taskq, linux lockdep warns of possible recursive locking. This is a false positive.

One such call chain is as follows, when a taskq needs more threads:

  taskq_dispatch->taskq_thread_spawn->taskq_dispatch

The initial taskq_dispatch() holds tq_lock on the taskq that needed more worker threads. The later call into taskq_dispatch() takes dynamic_taskq->tq_lock. Without subclassing, lockdep believes these could potentially be the same lock and complains. A similar case occurs when taskq_dispatch() then calls task_alloc().

This patch uses spin_lock_irqsave_nested() when taking tq_lock, with one of two new lock subclasses:

  subclass          taskq
  TQ_LOCK_DYNAMIC   dynamic_taskq
  TQ_LOCK_GENERAL   any other

Signed-off-by: Olaf Faaland
Signed-off-by: Brian Behlendorf
Issue #480
* Fix lockdep warning in spl_inode_{lock,unlock} | Olaf Faaland | 2015-12-11 | 1 | -1/+1

spl_inode_{lock,unlock} are triggering possible recursive locking warnings from lockdep. The warning is a false positive.

The lock is used to protect a parent directory during delete/add operations, used in zfs when writing/removing the cache file. The inode lock is taken on both the parent inode and the file inode.

VFS provides an enum to subclass the lock. This patch changes the spin_lock call to the _nested version and uses the provided enum.

Signed-off-by: Olaf Faaland
Signed-off-by: Brian Behlendorf
Issue #480
* Add new lock types MUTEX_NOLOCKDEP, and RW_NOLOCKDEP | Olaf Faaland | 2015-12-11 | 2 | -3/+81

When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible recursive locking in some cases and possible circular locking dependency in others, within the SPL and ZFS modules.

When lockdep detects these conditions, it disables further lock analysis for all locks. This causes /proc/lock_stats not to reflect full information about lock contention, even in locks without dependency issues.

This commit creates a new type of mutex, MUTEX_NOLOCKDEP. This mutex type causes subsequent attempts to take or release those locks to be wrapped in lockdep_off() and lockdep_on().

This commit also creates an RW_NOLOCKDEP type analogous to MUTEX_NOLOCKDEP.

MUTEX_NOLOCKDEP and RW_NOLOCKDEP are also defined in zfs, in a commit to that repo, for userspace builds.

Signed-off-by: Olaf Faaland
Signed-off-by: Brian Behlendorf
Issue #480
* Skip GPL-only symbols test when cross-compiling | Kamil Domański | 2015-12-11 | 1 | -0/+1

This test depends on being able to execute the resulting binary, which will be impossible when cross-compiling. Instead make a worst case assumption which allows the build to continue, as recommended by the autoconf manual.

https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Runtime.html

Signed-off-by: Kamil Domański
Signed-off-by: Brian Behlendorf
Signed-off-by: tuxoko
Closes zfsonlinux/spl#507
Closes zfsonlinux/zfs#4075
* Fix build issue on some configured kernels | zgock | 2015-12-11 | 1 | -1/+1

The SPL fails to build with some "Configured" kernels (e.g. the openSUSE Xen kernel). This change should produce the same binaries with C compiler optimization.

Signed-off-by: zgock
Signed-off-by: Brian Behlendorf
Closes #510
* Either _ILP32 or _LP64 must be defined | Brian Behlendorf | 2015-12-10 | 1 | -11/+31

For some arm, powerpc, and sparc platforms it was possible that neither _ILP32 or _LP64 would be defined. Update the isa_defs.h header to explicitly set these macros and generate a compile error in the case neither are defined.

Signed-off-by: Brian Behlendorf
Signed-off-by: tuxoko
Issue zfsonlinux/zfs#4048
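The "define one, or fail the build" pattern described above can be sketched as follows. This is an assumption-laden illustration, not the contents of isa_defs.h: the real header sets the macros per architecture, whereas this sketch falls back to the width of `long` (which assumes `long` and pointers have the same size), and `data_model_long_bits` is a hypothetical helper added only so the result can be checked.

```c
#include <assert.h>
#include <limits.h>

/* Fallback: pick a data model from the width of long if none is predefined. */
#if !defined(_ILP32) && !defined(_LP64)
#if ULONG_MAX == 0xffffffffUL
#define _ILP32
#else
#define _LP64
#endif
#endif

/* Fail at compile time rather than silently building with no data model. */
#if !defined(_ILP32) && !defined(_LP64)
#error "Neither _ILP32 nor _LP64 is defined"
#endif
#if defined(_ILP32) && defined(_LP64)
#error "Both _ILP32 and _LP64 are defined"
#endif

/* Hypothetical helper: width of long implied by the selected data model. */
static int
data_model_long_bits(void)
{
    int bits;

#if defined(_LP64)
    bits = 64;
#else
    bits = 32;
#endif
    assert(bits == 32 || bits == 64);
    return (bits);
}
```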
* Revert "Make taskq_member() use ->journal_info" | Brian Behlendorf | 2015-12-08 | 2 | -4/+35

This reverts commit a430c11f0b1ef16ca5edf3059e4082709277376c. Using journal_info like this can cause a BUG at kernel fs/jbd2/transaction.c:425!

Signed-off-by: Brian Behlendorf
Issue #500
* Make taskq_member() use ->journal_info | Richard Yao | 2015-12-08 | 2 | -35/+4

The ->journal_info pointer in the task_struct is reserved for use by filesystems and because the kernel can have multiple file systems on the same stack due to direct reclaim, each filesystem that touches ->journal_info in a callback function will save the value at the start of its frame and restore it at the end of its frame. This allows us to safely use ->journal_info to store a pointer to the taskq's struct in taskq threads so that ZFS code paths can detect the presence of a taskq. This could break if the ZFS code were to use taskq_member from the context of direct reclaim. However, there are no such uses of it in that manner, so this is safe.

This replaces an O(N) list traversal under a spinlock with an O(1) unlocked pointer comparison.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Signed-off-by: tuxoko
Signed-off-by: Tim Chase
Closes #500
* Fix race between getf() and areleasef() | Richard Yao | 2015-12-03 | 2 | -2/+15

If a vnode is released asynchronously through areleasef(), it is possible for the user process to reuse the file descriptor before areleasef is called. When this happens, getf() will return a stale reference, any operations in the kernel on that file descriptor will fail (as it is closed) and the operations meant for that fd will never occur from userspace's perspective.

We correct this by detecting this condition in getf(), doing a putf on the old file handle, updating the file descriptor and proceeding as if everything was fine. When the areleasef() is done, it will harmlessly decrement the reference counter on the Illumos file handle.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes #492
* Prevent rm modules.* when make install | tuxoko | 2015-12-02 | 1 | -1/+1

This was originally in e80cd06b8e0428f3ca2c62e4cb0e4ec54fda1d5c, but was somehow changed and no longer works. It causes the following error:

  modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin'

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes #501
* Add a script to display SPL slab cache statistics | Boris Protopopov | 2015-12-02 | 8 | -11/+219

Useful when looking for the info on ZFS/SPL related memory consumption.

Signed-off-by: Boris Protopopov
Signed-off-by: Brian Behlendorf
Closes #460
* Additional dkio support for TRIM/Discard | Tim Chase | 2015-12-02 | 3 | -8/+65

Replace DKIOCTRIM with DKIOCFREE and add additional support required for Nexenta's TRIM support.

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #469
* spl-kmem-cache: include linux/prefetch.h for prefetchw() | Dimitri John Ledkov | 2015-12-02 | 1 | -0/+1

This is needed for architectures that do not have a builtin prefetchw().

Signed-off-by: Dimitri John Ledkov
Signed-off-by: Brian Behlendorf
Closes #502
* Fix --enable-linux-builtin | Brian Behlendorf | 2015-12-02 | 1 | -2/+5

Adding VPATH support, commit 37d7cd9, required that a `src` and `obj` line be added to the top of the Makefiles. They must be removed from the Makefiles when builtin.

The code which adds the `spl/` directory to the top level Makefile was failing due to the addition of the `certs/` path. The search pattern has been adjusted to be more tolerant.

Signed-off-by: Brian Behlendorf
Issue #481
Issue #498
* Limit maximum object size in kmem tests | Brian Behlendorf | 2015-11-16 | 1 | -1/+4

Limit the maximum object size to 1/128 of total system memory for the kmem cache tests. Large values can result in out of memory errors for systems with less than 512M of memory. Additionally, use the known number of objects per-slab for calculating the number of objects to use for a test.

Signed-off-by: Brian Behlendorf
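The 1/128 cap can be worked through with a small sketch. The function name `max_test_object_size` and its parameters are hypothetical, chosen for this illustration; the actual test code is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a requested test object size to 1/128 of total system memory.
 * Both arguments are in bytes. Hypothetical helper, not the SPLAT code.
 */
static uint64_t
max_test_object_size(uint64_t total_mem_bytes, uint64_t requested_bytes)
{
    uint64_t limit;

    assert(total_mem_bytes >= 128);
    limit = total_mem_bytes / 128;
    return (requested_bytes < limit ? requested_bytes : limit);
}
```

On a machine with 512 MiB of memory the cap works out to 4 MiB, so a 32 MiB request would be reduced to 4 MiB while a 1 MiB request would pass through unchanged.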
* Remove superfluous `newline` character | loli10K | 2015-11-13 | 1 | -1/+1

Remove superfluous `newline` character from spl_kmem_cache_magazine_size module parameter description.

Signed-off-by: loli10K
Signed-off-by: Brian Behlendorf
Closes #499
* sysmacros: Make P2ROUNDUP not trigger int overflow | Jason Zaman | 2015-11-13 | 1 | -2/+2

The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow.

Axioms:

  A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement.
  B. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law.
  C. ~(~x) === x under the law of excluded middle.

Proof:

  0. (-(-(x) & -(align)))                             original
  1. (~(-(x) & -(align)) + 1)                         by A
  2. (((~(-(x))) | (~(-(align)))) + 1)                by B
  3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1)    by A
  4. (((((x) - 1)) | (((align) - 1))) + 1)            by C

Q.E.D.

Signed-off-by: Jason Zaman
Reviewed-by: Chris Dunlop
Reviewed-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes zfsonlinux/zfs#2505
Closes #488
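The proof above can be spot-checked numerically. The sketch below defines both forms from steps 0 and 4 under the hypothetical names `P2ROUNDUP_OLD` and `P2ROUNDUP_NEW` and compares them over a range of inputs; `p2roundup_forms_agree` is an illustrative helper, not part of sysmacros.h.

```c
#include <assert.h>
#include <stdint.h>

/* Step 0 of the proof: the original form, which negates its operands. */
#define P2ROUNDUP_OLD(x, align) (-(-(x) & -(align)))
/* Step 4 of the proof: the replacement form, with no unary minus. */
#define P2ROUNDUP_NEW(x, align) ((((x) - 1) | ((align) - 1)) + 1)

/*
 * Return 1 if both forms agree for every x in [1, max_x].
 * `align` must be a non-zero power of two.
 */
static int
p2roundup_forms_agree(uint64_t max_x, uint64_t align)
{
    uint64_t x;

    assert(align != 0 && (align & (align - 1)) == 0);
    for (x = 1; x <= max_x; x++) {
        if (P2ROUNDUP_OLD(x, align) != P2ROUNDUP_NEW(x, align))
            return (0);
    }
    return (1);
}
```

For example, with x = 13 and align = 8 the new form evaluates to (12 | 7) + 1 = 16, the next multiple of 8.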
* Fix taskq dynamic spawning | tuxoko | 2015-11-13 | 1 | -14/+11

Currently taskq_dispatch() will spawn a new task with the condition that the caller is also a member of the taskq. However, under this condition, it will still cause a deadlock where a task on tq1 is waiting for another thread, which is trying to dispatch a task on tq1. So this patch removes the check.

For example when you do:

  zfs send pp/fs0@001 | zfs recv pp/fs0_copy

This will easily deadlock before this patch.

Also, move the seq_task check from taskq_thread_spawn() to taskq_thread() because it's not used by the caller from taskq_dispatch().

Signed-off-by: Chunwei Chen
Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #496
* Don't call kmem_cache_shrink from shrinker | Chunwei Chen | 2015-11-11 | 1 | -6/+1

Linux slab will automatically free empty slabs when the number of partial slabs is over min_partial, so we don't need to explicitly shrink it. In fact, calling kmem_cache_shrink from the shrinker will cause heavy contention on kmem_cache_node->list_lock, to the point that it might cause __slab_free to livelock (see zfsonlinux/zfs#3936).

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes zfsonlinux/zfs#3936
Closes #487
* Fix CPU hotplug | Brian Behlendorf | 2015-10-13 | 2 | -9/+8

Allocate a kmem cache magazine for every possible CPU which might be added to the system. This ensures that when one of these CPUs is enabled it can be safely used immediately.

For many systems the number of online CPUs is identical to the number of present CPUs, so this does not imply an increased memory footprint. In fact, dynamically allocating the array of magazine pointers instead of using the worst case NR_CPUS can end up decreasing our memory footprint.

Signed-off-by: Brian Behlendorf
Signed-off-by: Ned Bass
Closes #482
* Use tab indent in rwlock.h | Chunwei Chen | 2015-10-02 | 1 | -73/+73

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Issue #473
* rwsem use kernel provided owner when possible | Chunwei Chen | 2015-10-02 | 1 | -1/+22

If CONFIG_RWSEM_SPIN_ON_OWNER is defined, rw_semaphore will have an owner field, so we don't need our own.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Issue #473
* Don't take spin lock on rwlock owner | Chunwei Chen | 2015-10-02 | 1 | -19/+4

The spin lock around rw_owner is completely unnecessary. The reason is that it is only modified in the down_write context. If you race against another thread modifying it, that means that you aren't holding the rwlock, so taking the spin lock doesn't eliminate the race.

Also, we only check rw_owner in RW_WRITE_HELD because spl_rwsem_is_locked is unnecessary and might need to take the spin lock.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Issue #473
* Fix spl-dkms uninstall/update | Brian Behlendorf | 2015-10-02 | 1 | -5/+6

Modern versions of dkms clean up the build directory after installing. This resulted in 'dkms uninstall' never running because the check added by commit 4cdcdbf, which verifies the existence of the spl.release build product, would never be true.

This patch resolves the issue by updating the conditional to check in the explicitly installed spl_config.h file for the version.

Signed-off-by: Brian Behlendorf
Closes #478
* Fix PAX Patch/Grsec SLAB_USERCOPY panic | Brian Behlendorf | 2015-09-28 | 1 | -1/+11

Support grsecurity/PaX kernel configurations where CONFIG_PAX_USERCOPY_SLABS is enabled. When this kernel option is enabled, slabs which are used to copy between user and kernel space must be created with SLAB_USERCOPY.

Stock Linux kernels do not have a SLAB_USERCOPY definition so this causes no change in behavior for non-PAX-enabled kernels.

Verified-by: Wuffleton
Signed-off-by: Brian Behlendorf
Issue #2977
Issue #3796
* Tag spl-0.6.5 | Brian Behlendorf | 2015-09-10 | 3 | -1/+7

META file and release log updated.

Signed-off-by: Brian Behlendorf
* Disable direct reclaim in taskq worker threads on Linux 3.9+ | Richard Yao | 2015-09-09 | 1 | -0/+4

Illumos does not have direct reclaim and code run inside taskq worker threads is not designed to deal with it. Allowing direct reclaim inside a worker thread can therefore deadlock. We set PF_MEMALLOC_NOIO through memalloc_noio_save() to indicate to the kernel's reclaim code that we are inside a context where memory allocations cannot be allowed to block on filesystem activity.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue zfsonlinux/zfs#1274
Issue zfsonlinux/zfs#2390
Closes #474
* Linux 4.2 compat: misc_deregister() | Brian Behlendorf | 2015-09-01 | 1 | -5/+1

The misc_deregister() function was changed to a void return type. Rather than add compatibility code to detect this change, simply ignore the return code on all kernels. It was only used to log an informational error message of no real value.

Signed-off-by: Brian Behlendorf
* Create a new thread during recursive taskq dispatch if necessary | Tim Chase | 2015-09-01 | 1 | -3/+28

When dynamic taskq is enabled and all threads for a taskq are occupied, a recursive dispatch can cause a deadlock if the calling thread depends on the recursively-dispatched thread for its return condition.

This patch attempts to create a new thread for recursive dispatch when none are available.

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #472
* Revert "Create a new thread during recursive taskq dispatch if necessary" | Brian Behlendorf | 2015-08-31 | 1 | -22/+0

This reverts commit 076821e due to a locking issue uncovered in subsequent testing. An ASSERT is hit due to tq->tq_nspawn being updated outside the lock. The patch will need to be reworked.

  VERIFY3(0 == tq->tq_nspawn) failed (0 == -1)

Signed-off-by: Brian Behlendorf
Issue #472
* Create a new thread during recursive taskq dispatch if necessary | Tim Chase | 2015-08-31 | 1 | -0/+22

When dynamic taskq is enabled and all threads for a taskq are occupied, a recursive dispatch can cause a deadlock if the calling thread depends on the recursively-dispatched thread for its return condition.

This patch attempts to create a new thread for recursive dispatch when none are available.

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #472
* Restructure uio to accommodate bio_vec | Chunwei Chen | 2015-08-24 | 1 | -1/+8

Starting from Linux 4.1, bio_vec will be allowed to pass into the filesystem via iter_read/iter_write, so we add a bio_vec field in uio_t to hold it, and use UIO_BVEC in segflg to determine which "vec" is in use.

Also, to be consistent with newer kernels, we make iovec and bio_vec immutable, and make uio act as an iterator with the new uio_skip field indicating the number of bytes to skip in the first segment.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Issue zfsonlinux/zfs#3511
Issue zfsonlinux/zfs#3640
Closes #468
* Linux 4.2 compat: vfs_rename() | Brian Behlendorf | 2015-08-19 | 2 | -0/+7

Attempting to perform a vfs_rename() on Linux 4.2 and newer kernels results in an EACCES error. Rather than attempting to add and maintain more ugly compatibility code it's best to just retire this interface. As a first step the SPLAT test is disabled for Linux 4.2 and newer kernels.

  vn_rename: Failed vn_rename /tmp/vn.tmp.1 -> /tmp/vn.tmp.2 (13)

Signed-off-by: Brian Behlendorf
Issue zfsonlinux/zfs#3653
* Include other sources of freeable memory in the freemem calculation | Tim Chase | 2015-08-19 | 1 | -1/+4

Prevents ARC collapse when non-ZFS filesystems, the block layer or other memory consumers use a lot of reclaimable memory in the page cache.

Signed-off-by: Brian Behlendorf
Signed-off-by: Tim Chase
Closes zfsonlinux/zfs#3680
Closes #471
* Remove needfree, desfree, lotsfree #defines | Brian Behlendorf | 2015-07-30 | 1 | -3/+0

This patch reverts 77ab5dd. This is now possible because upstream has refactored the ARC in such a way that these values are only used in a few key places. Those places have subsequently been updated to use the equivalent Linux functionality.

Signed-off-by: Brian Behlendorf
Issue zfsonlinux/zfs#3637
* Invert minclsyspri and maxclsyspri | Brian Behlendorf | 2015-07-28 | 4 | -7/+6

On Linux the meaning of a process's priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high values on illumos indicate a _high_ priority.

In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way when changes are merged from upstream illumos we won't need to remember to invert the macro. It could also lead to confusion.

Note this change also reverts some of the priority changes in prior commit 62aa81a. The rationale is as follows:

  spl_kmem_cache    - High priority may result in blocked memory allocs
  spl_system_taskq  - May perform I/O for file backed VDEVs
  spl_dynamic_taskq - New taskq threads should be spawned promptly

Signed-off-by: Brian Behlendorf
Signed-off-by: Ned Bass
Issue zfsonlinux/zfs#3607
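The inversion can be made concrete with a small sketch. The commit text does not give numeric values, so the constants below are assumptions: they are assumed to mirror Linux's MAX_RT_PRIO, MAX_PRIO and DEFAULT_PRIO, and every `sketch_`/`SKETCH_` name is hypothetical rather than taken from the SPL headers.

```c
#include <assert.h>

/* Assumed to mirror Linux scheduler constants; not quoted from the commit. */
#define SKETCH_MAX_RT_PRIO  100 /* first non-realtime priority value */
#define SKETCH_MAX_PRIO     140 /* one past the last priority value */
#define SKETCH_DEFAULT_PRIO 120 /* corresponds to nice 0 */

/*
 * After the inversion, the logically highest priority gets the numerically
 * lowest Linux value and vice versa (hypothetical definitions).
 */
#define sketch_maxclsyspri  SKETCH_MAX_RT_PRIO
#define sketch_minclsyspri  (SKETCH_MAX_PRIO - 1)

/* Convert a Linux priority value in the normal range to a nice value. */
static int
sketch_prio_to_nice(int prio)
{
    assert(prio >= SKETCH_MAX_RT_PRIO && prio < SKETCH_MAX_PRIO);
    return (prio - SKETCH_DEFAULT_PRIO);
}

/* On Linux a smaller priority value means the task is favored. */
static int
sketch_is_higher_priority(int a, int b)
{
    return (a < b);
}
```

Under these assumptions the "max" macro maps to nice -20 and the "min" macro to nice 19, so code ported from illumos that asks for maxclsyspri still gets the most favored scheduling.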
* Remove skc_ref from alloc/free paths | Brian Behlendorf | 2015-07-24 | 1 | -9/+2

As described in spl_kmem_cache_destroy() the ->skc_ref count was added to address the case of a cache reap or grow racing with a destroy. They are not strictly needed in the alloc/free paths because consumers of the cache are responsible for not using it while it's being destroyed.

Removing this code is desirable because there is some evidence that contention on this atomic negatively impacts performance on large-scale NUMA systems.

Signed-off-by: Brian Behlendorf
Signed-off-by: Tim Chase
Issue #463
* Add defclsyspri macro | Brian Behlendorf | 2015-07-23 | 11 | -25/+48

Add a new defclsyspri macro which can be used to request the default Linux scheduler priority. Neither minclsyspri nor maxclsyspri maps to the default Linux kernel thread priority. This makes it awkward to create taskqs which run with the same priority as the rest of the kernel threads on the system, which can lead to performance issues.

All SPL callers which previously used minclsyspri or maxclsyspri have been changed to use defclsyspri. The vast majority of callers were part of the test suite, which won't have an external impact. In the few places where it could impact performance the change was from maxclsyspri to defclsyspri. This makes it more likely the process will be scheduled, which may help performance.

To facilitate further performance analysis the spl_taskq_thread_priority module option has been added. When disabled (0) all newly created kernel threads will use the default kernel thread priority. When enabled (1) the specified taskq priority will be used. By default this value is enabled (1).

Signed-off-by: Brian Behlendorf
* Default to --disable-debug-kmem | Brian Behlendorf | 2015-07-21 | 2 | -22/+6

The default kmem debugging (--enable-debug-kmem) can severely impact performance on large-scale NUMA systems due to the atomic operations used in the memory accounting. A 32-thread fio test running on a 40-core 80-thread system and performing 100% cached reads with kmem debugging is:

Enabled:
  READ: io=177071MB, aggrb=2951.2MB/s, minb=2951.2MB/s, maxb=2951.2MB/s,

Disabled:
  READ: io=271454MB, aggrb=4524.4MB/s, minb=4524.4MB/s, maxb=4524.4MB/s,

Signed-off-by: Brian Behlendorf
Signed-off-by: Tim Chase
Issues #463
* Support parallel build trees (VPATH builds) | Turbo Fredriksson | 2015-07-17 | 6 | -44/+50

Build products from an out of tree build should be written relative to the build directory. Sources should be referred to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables for the module Makefile.am, using relative paths to reference source files, and by setting VPATH when source files are not co-located with the Makefile. This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure
  $ make -s

This change also has the advantage of resolving the following warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson
Signed-off-by: Brian Behlendorf
Issue zfsonlinux/zfs#1082