openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add spl_kmem_cache_expire module option	Brian Behlendorf	2013-01-28	3	-6/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cache aging was implemented because it was part of the default Solaris kmem_cache behavior. The idea is that per-cpu objects which haven't been accessed in several seconds should be returned to the cache. On the other hand Linux slabs never move objects back to the slabs unless there is memory pressure on the system. This behavior is now configurable through the 'spl_kmem_cache_expire' module option. The value is a bit mask with the following meaning. 0x1 - Solaris style cache aging eviction is enabled. 0x2 - Linux style low memory eviction is enabled. Both methods may be safely enabled simultaneously, but by default both are disabled. It has never been clear if the kmem cache aging (which has been around from day one) actually does any good. It has however been the source of numerous bugs so I wouldn't mind retiring it entirely. Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/zfs#1227 Closes #210
*	Remove spl_invalidate_inodes()	Brian Behlendorf	2013-01-17	3	-116/+0
\| \| \| \| \| \| \| \| \| \| \|	This functionality is no longer required by ZFS, see commit zfsonlinux/zfs@7b3e34ba5a7ee8d0fda44d214f6f11eb16cdb26f. Since there are no other consumers, and because it adds additional autoconf complexity which must be maintained the spl_invalidate_inodes() function has been removed. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#795
*	kmem-cache: Fix slab ageing soft lockup	Brian Behlendorf	2013-01-14	1	-43/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a10287e00d13c4c4dbbff14f42b00b03da363fcb slightly reworked the slab ageing code such that it is no longer dependent on the Linux delayed work queue interfaces. This was good for portability and performance, but it requires us to use the on_each_cpu() function to execute the spl_magazine_age() function. That means that the function is now executing in interrupt context whereas before it was scheduled in normal process context. And that means we need to be slightly more careful about the locking in the interrupt handler. With the reworked code it's possible that we'll be holding the skc->skc_lock and be interrupted to handle the spl_magazine_age() IRQ. This will result in a deadlock and soft lockup errors unless we're careful to detect the contention and avoid taking the lock in the interupt handler. So that's what this patch does. Alternately, (and slightly more conventionally) we could have used spin_lock_irqsave() to prevent this race entirely but I'd perfer to avoid disabling interrupts as much as possible due to performance concerns. There is absolutely no penalty for us not aging objects out of the magazine due to contention. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes zfsonlinux/zfs#1193
*	call_usermodehelper() should wait for process	Ned Bass	2013-01-09	3	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the process). A number of call sites used the number 1 instead of the constant name, so the behavior was not as expected on kernels with this change. Signed-off-by: Brian Behlendorf <[email protected]>
*	Check for ZLIB_INFLATE and ZLIB_DEFLATE	Brian Behlendorf	2013-01-09	1	-0/+44
\| \| \| \| \| \| \| \| \| \|	Check at ./configure time that the kernel was built with zlib support enabled. This support may either be configured as a module or builtin to the kernel. But if it's missing the build will fail so it's best to catch this early. Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/zfs#582
*	Linux compat 3.7.1, on_each_cpu()	Brian Behlendorf	2013-01-09	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some kernels require that we include the 'linux/irqflags.h' header for the SPL_AC_3ARGS_ON_EACH_CPU check. Otherwise, the functions local_irq_enable()/local_irq_disable() will not be defined and the prototype will be misdetected as the four argument version. This change actually include 'linux/interrupt.h' which in turn includes 'linux/irqflags.h' to be as generic as possible. Additionally, passing NULL as the function can result in a gcc error because the on_each_cpu() macro executes it unconditionally. To make the test more robust we pass the dummy function on_each_cpu_func(). Signed-off-by: Brian Behlendorf <[email protected]> Closes #204
*	RHEL 6.4 compat, fallocate()	Brian Behlendorf	2013-01-08	3	-9/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the upstream kernel the FALLOC_FL_PUNCH_HOLE #define was introduced after the fallocate() function was moved from the inode_operations to the file_operations structure. Therefore, the SPL code assumed that if FALLOC_FL_PUNCH_HOLE was defined it was safe to use f_ops->fallocate(). Unfortunately, the RHEL6.4 kernel has only backported the FALLOC_FL_PUNCH_HOLE #define and not the fallocate() change. To address this compatibility issue the spl_filp_fallocate() helper function was added to properly detect which interface is available. Signed-off-by: Brian Behlendorf <[email protected]>
*	Add cv_wait_io() to account I/O time	Matt Johnston	2013-01-07	2	-4/+16
\| \| \| \| \| \| \| \| \| \|	Under Linux when a task is waiting on I/O it should call the io_schedule() function for proper accounting. The Solaris cv_wait() function provides no way to specify what the cv is waiting on therefore cv_wait_io() is introduced. Signed-off-by: Brian Behlendorf <[email protected]> Closes #206
*	SPL 0.6.0-rc13	Brian Behlendorf	2012-12-20	1	-1/+1
\|
*	Refresh AUTHORS	Brian Behlendorf	2012-12-19	1	-8/+23
\| \| \| \| \| \| \|	The AUTHORS file was getting stale. Refresh its contents using the authors listed in the git commit logs. Signed-off-by: Brian Behlendorf <[email protected]>
*	Remove the ChangeLog	Brian Behlendorf	2012-12-19	2	-556/+1
\| \| \| \| \| \| \|	The ChangeLog was retired long ago, the git commit logs are authoritative. To avoid any confusion remove the ChangeLog. Signed-off-by: Brian Behlendorf <[email protected]>
*	Fix spl_kmem_init_kallsyms_lookup() panic	Brian Behlendorf	2012-12-19	3	-0/+22
\| \| \| \| \| \| \| \| \| \| \|	Due to I/O buffering the helper may return successfully before the proc handler has a chance to execute. To catch this case wait up to 1 second to verify spl_kallsyms_lookup_name_fn was updated to a non SYMBOL_POISON value. Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/zfs#699 Closes zfsonlinux/zfs#859
*	Do not use KERNEL_DIR env var in Makefile.am	Richard Yao	2012-12-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	A Gentoo user reported an issue where the build system would attempt to recurse into the kernel source tree if KERNEL_DIR is set in the environment. KERNEL_DIR is an environment variable that is used when the kernel sources are in a non-standard location, so it is necessary to stop relying on it to prevent this issue. https://bugs.gentoo.org/show_bug.cgi?id=433946 Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
*	Merge branch 'taskq'	Brian Behlendorf	2012-12-12	8	-495/+986
\|\ \| \| \| \| \| \| \| \|	Signed-off-by: Brian Behlendorf <[email protected]> Closes #199
\| *	Removed SPL_AC_3ARGS_INIT_WORK check	Brian Behlendorf	2012-12-12	3	-71/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All consumers of the kernel delayed work queues have been shifted over to rely on the taskq implementation. This compatibility code can now be removed. Any new callers which need this functionality should use the taskq interfaces for delayed work items. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	kmem-cache: Use a taskq for async allocations	Brian Behlendorf	2012-12-12	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shift the asynchronous allocations over to use the taskq interfaces. This allows us to abandon the kernels delayed work queue interface and all the compatibility code it requires. This code never actually used the delay functionality it was just done this way to leverage the existing compatibility code. All that is required is a thread context to perform the allocation in. The only thing clever in this change is that we take advantage of the preallocated task queue entries to avoid a memory allocation. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	kmem-cache: Use taskqs for ageing	Brian Behlendorf	2012-12-12	2	-43/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shift the cache and magazine ageing functionality over to the new delayed taskq interfaces. This allows us to abandon the kernels delayed work queue interface and all the compatibility code it requires. However, the delayed taskq interface does not allow us to schedule a task for a specfic cpu so the ageing code was slightly reworked. The magazine ageing delay has been directly linked to the cache ageing function. The spl_cache_age() function invokes on_each_cpu() in order to run spl_magazine_age() on each cpu. It then blocks waiting for them to complete and promptly reclaims any free slabs. When restructing the code wasn't the primary goal I think the new code is far more understable and maintainable. It also should help minimize magazine thrashing because free slabs are immediately released after the magazine is aged. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	kmem-cache: spl_kmem_cache_create() may always sleep	Brian Behlendorf	2012-12-12	1	-11/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When this code was originally written I went overboard and allowed for the possibility of creating a cache in an atomic context. In practice there are no callers which ever do this. This makes sense since a cache is by design a long lived data structure. To prevent abuse of this function going forward I'm removing the code which is supported to handle an atomic context. All allocators have been updated to use KM_SLEEP and the might_sleep() debug macro has been added to immediately detect atomic callers. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat taskq:front: Reduce stack frame	Brian Behlendorf	2012-12-12	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The slightly increased size of the taskq_ent_t when debugging is enabled has pushed the taskq:front splat test over frame size limit. To resolve this dynamically allocate the taskq_ent_t structures so they are part of the heap instead of the stack. In function 'splat_taskq_test6_impl' error: the frame size of 1648 bytes is larger than 1024 bytes Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat taskq:order: Reduce stack frame	Brian Behlendorf	2012-12-12	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The slightly increased size of the taskq_ent_t when debugging is enabled has pushed the taskq:order splat test over frame size limit. To resolve this dynamically allocate the taskq_ent_t structures so they are part of the heap instead of the stack. In function 'splat_taskq_test5_impl' error: the frame size of 1680 bytes is larger than 1024 bytes Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat taskq:cancel: Add test case	Brian Behlendorf	2012-12-12	1	-0/+183
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a test case for taskq_cancel_id() to verify it is working properly. Just like taskq:delay we start by dispatching 100 tasks. However this time 1/3 of the tasks use taskq_dispatch() and will be run immediately, and 2/3 use taskq_dispatch_delay(). The idea is to create a busy taskq with both active, pending, and delayed tasks. After all the items have been successfully dispatched the test begins randomly canceling known task ids. It will do this for 5 seconds randomly canceling a task id and then sleeping for a few milliseconds. The task being canceled may have already run, still be on the pending list, or may be currently being executed by a worker thread. The idea is to ensure we catch any subtle race conditions. Once all the non-canceled tasks have completed we cross check the number of tasks which ran with the number of tasks which were successfully canceled. Additionally, we verify that the taskq_cancel_id() function never blocks longer than needed. This time is bounded by the longest run time of the task which was dispatched. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat taskq:delay: Add test case	Brian Behlendorf	2012-12-12	1	-11/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a test case for taskq_dispatch_delay() to verify it is working properly. The test dispatchs 100 tasks to a taskq with random expiration times spread over 5 seconds. As each task expires and gets executed by a worker thread it verifies that it was run at the correct time. Once all the delayed tasks have been executed we double check that all the dispatched tasks were successful. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	taskq delay/cancel functionality	Brian Behlendorf	2012-12-12	3	-119/+389
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the ability to dispatch a delayed task to a taskq. The desired behavior is for the task to be queued but not executed by a worker thread until the expiration time is reached. To achieve this two new functions were added. * taskq_dispatch_delay() - This function behaves exactly like taskq_dispatch() however it takes a third 'expire_time' argument. The caller should pass the desired time the task should be executed as an absolute value in jiffies. The task is guarenteed not to run before this time, it may run slightly latter if all the worker threads are busy. * taskq_cancel_id() - Given a task id attempt to cancel the task before it gets executed. This is primarily useful for canceling delay tasks but can be used for canceling any previously dispatched task. There are three possible return values. 0 - The task was found and canceled before it was executed. ENOENT - The task was not found, either it was already run or an invalid task id was supplied by the caller. EBUSY - The task is currently executing any may not be canceled. This function will block until the task has been completed. * taskq_wait_all() - The taskq_wait_id() function was renamed taskq_wait_all() to more clearly reflect its actual behavior. It is only curreny used by the splat taskq regression tests. * taskq_wait_id() - Historically, the only difference between this function and taskq_wait() was that you passed the task id. In both functions you would block until ALL lower task ids which executed. This was semantically correct but could be very slow particularly if there were delay tasks submitted. To better accomidate the delay tasks this function was reimplemnted. It will now only block until the passed task id has been completed. This is actually a fairly low risk change for a few reasons. * Only new ZFS callers will make use of the new interfaces and very little common code was changed to support the new functions. * The existing taskq_wait() implementation was not changed just slightly refactored. * The newly optimized taskq_wait_id() implementation was never used by ZFS we can't accidentally introduce a new bug there. NOTE: This functionality does not exist in the Illumos taskqs. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	taskq style, remove #define wrappers	Brian Behlendorf	2012-12-12	2	-46/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the taskq implementation was originally written I wrapped all the API functions in #define's. This was done as a preventative measure to ensure that a taskq symbol never conflicted with an existing kernel symbol. However, in practice the taskq symbols never conflicted. The only major conflicts occured with the kmem cache API. Since this added layer of obfuscation never bought us anything for the taskq's I'm removing it. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	taskq style, convert spaces to soft tabs	Brian Behlendorf	2012-12-12	2	-217/+218
\|/ \| \| \| \| \| \| \|	Update the taskq implementation to conform with the style used throughout the rest of the code. There are no functional changes in this commit. Signed-off-by: Brian Behlendorf <[email protected]>
*	splat linux:shrinker: Fix fail-safe	Steven Johnson	2012-12-12	1	-4/+21
\| \| \| \| \| \|	Ensure the fail-safe is reset between successive tests. Signed-off-by: Brian Behlendorf <[email protected]>
*	splat linux:shrinker: Fix race condition	Steven Johnson	2012-12-12	1	-1/+27
\| \| \| \| \| \| \| \| \| \| \| \|	Ensure the test thread blocks until the shrinker has completed its work. This is done by putting the test thread to sleep and waking it each time the shrinker callback runs. Once the shrinker size drops to zero or we time out the test is allowed to proceed. Signed-off-by: Brian Behlendorf <[email protected]> Closes #96 Closes #125 Closes #182
*	splat command verbose behavior	Brian Behlendorf	2012-12-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The splat command takes a verbose option which when set prints the internal debug log for every test. This is helpful when tracking down a common failure, but for a rare failure the volume of log data is distracting. Therefore, the verbose option has been adjusted to allow only printing the debug log on failure. The legacy behavior is still available by specifying the verbose option twice. For example: $ splat -t all:all # Never print the debug log $ splat -v -t all:all # Only print debug log on failure $ splat -vv -t all:all # Always print the debug log Signed-off-by: Brian Behlendorf <[email protected]>
*	splat taskq:front: Fix race	Steven Johnson	2012-12-05	1	-30/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The taskq:front test has a race condition where task 4 and 8 race to complete, due to an incorrectly calculated set of delay "factors" (T). If task 4 wins and actually finishes first, the verification of the order of completion will fail. The delays calculated to order task completion do not take into account the terminal line in the table, and so are all off by a factor of 1. This causes all the tasks in all queues to finish sooner than expected and the accumulated error is the root cause of tasks 4 and 8 racing to complete first. Before the change the "actual" table looks like I commented in #130. I changed: * the table in the comment to correctly reflect the test and the factor timings needed. * the individual task delay factors of T so that ONLY 1 task will every 2T. (on average) * 1T was reduced from 100ms to 50ms. This halves the duration of the test and makes any remaining raciness more likely to cause failures, but it did not cause the test to fail. * simplified the delay factor logic by using a table look-up instead of a switch. * Added a "task started" message so that with -v it is possible to see the order tasks are started. * Moved the "task completed" message inside the spinlock so that with -v the message truly reflects the absolute order of completion as guaranteed by the spinlock. Signed-off-by: Brian Behlendorf <[email protected]> Closes #130
*	Handle errors from spl_kern_path_locked()	Brian Behlendorf	2012-12-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	When the Linux 3.6 KERN_PATH_LOCKED compatibility code was added by commit bcb1589 an entirely new vn_remove() implementation was added. That function did not properly handle an error from spl_kern_path_locked() which would result in an panic. This patch addresses the issue by returning the error to the caller. Signed-off-by: Brian Behlendorf <[email protected]> Closes #187
*	Linux compat 3.7, kernel_thread()	Brian Behlendorf	2012-12-03	2	-66/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The preferred kernel interface for creating threads has been kthread_create() for a long time now. However, several of the SPLAT tests still use the legacy kernel_thread() function which has finally been dropped (mostly). Update the condvar and rwlock SPLAT tests to use the modern interface. Frankly this is something we should have done a long time ago. Signed-off-by: Brian Behlendorf <[email protected]> Closes #194
*	Verify --with-linux source directory exists	Brian Behlendorf	2012-11-29	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Previously this check was only performed when ./configure was attempting to autodetect your kernel source directory. But we should also handle the case where --with-linux was provided and is obviously wrong. This way we catch the error before invoking make and compiling the source with an incorrect autoconf results. Signed-off-by: Brian Behlendorf <[email protected]> Closes #162
*	Disable FS reclaim when allocating new slabs	Brian Behlendorf	2012-11-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allowing the spl_cache_grow_work() function to reclaim inodes allows for two unlikely deadlocks. Therefore, we clear __GFP_FS for these allocations. The two deadlocks are: * While holding the ZFS_OBJ_HOLD_ENTER(zsb, obj1) lock a function calls kmem_cache_alloc() which happens to need to allocate a new slab. To allocate the new slab we enter FS level reclaim and attempt to evict several inodes. To evict these inodes we need to take the ZFS_OBJ_HOLD_ENTER(zsb, obj2) lock and it just happens that obj1 and obj2 use the same hashed lock. * Similar to the first case however instead of getting blocked on the hash lock we block in txg_wait_open() which is waiting for the next txg which isn't coming because the txg_sync thread is blocked in kmem_cache_alloc(). Note this isn't a 100% fix because vmalloc() won't strictly honor __GFP_FS. However, it practice this is sufficient because several very unlikely things must all occur concurrently. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#1101
*	SPL 0.6.0-rc12	Brian Behlendorf	2012-11-13	1	-1/+1
\|
*	Merge branch 'kmem-cache-optimization'	Brian Behlendorf	2012-11-08	3	-50/+131
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This branch contains kmem cache optimizations designed to resolve the lockups reported in zfsonlinux/zfs#922. The lockups were largely the result of spin lock contention in the slab under low memory conditions. Fundamentally, these changes are all designed to minimize that contention though a variety of methods. * Improved vmem cached deadlock detection * Track emergency objects in rbtree * Optimize spl_kmem_cache_free() * Never spin in kmem_cache_alloc() Signed-off-by: Brian Behlendorf <[email protected]> zfsonlinux/zfs#922
\| *	Never spin in kmem_cache_alloc()	Brian Behlendorf	2012-11-06	1	-5/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we are reaping from the cache and a concurrent allocation occurs then the caller must block until the reaping is complete. This is signaled by the clearing of the KMC_BIT_REAPING bit. Otherwise the caller will be in a tight loop which takes and releases the skc->skc_cache lock. When there are multiple concurrent callers the system will thrash on the lock and appear to lock up. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	Optimize spl_kmem_cache_free()	Brian Behlendorf	2012-11-06	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because only virtual slabs may have emergency objects and these objects are guaranteed to have physical addresses. It can be easily determined if the passed object is a virtual slab object or an emergency object. This allows us to completely optimize the emergency object free case out of the common free path. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	Track emergency object in rbtree	Brian Behlendorf	2012-11-06	2	-28/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the initial implementation emergency objects were tracked on a per-cache list. The assumption was that under normal operation we would never allocate more than a handful of these objects. So the cost of walking the list during free was expected to be negligible. However real world usage has shown that emergency objects tend to be allocated in batches. A deadlock will be detected and several thousand emergency objects will be allocated before the original blocked slab allocation can complete. Therefore the original list has been replaced by a red black tree which is sorted by the memory address of each allocated object. This bounds the worst case insertion and removal time to O(log n) which minimize contention on the assoicated spin lock. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	Improved vmem cached deadlock detection	Brian Behlendorf	2012-11-06	3	-13/+34
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The entire goal of performing the slab allocations asynchronously is to be able to detect when a vmalloc() deadlocks. In this case, and only this case, do we want to start allocating emergency objects. The trick here is to minimize false positives because the overhead of tracking emergency objects is far higher than normal slab objects. With that goal in mind the code was reworked to be less sensitive to slow allocations by increasing the wait time. Once a cache is is marked deadlocked all subsequent allocations which can not be satisfied with existing cache objects will immediately allocate new emergency objects. This behavior persists until the asynchronous allocation completes and clears the deadlocked flag. The result of these tweaks is that far fewer emergency objects get created which is important because this minimizes the cost of releasing them latter in kmem_cache_free(). Signed-off-by: Brian Behlendorf <[email protected]>
*	Merge branch 'splat'	Brian Behlendorf	2012-11-06	22	-81/+93
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Additional debugging, some cleanup, and an assortment of fixes to the SPLAT tests and infrastructure. Full details in the individual patches. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat kmem:slab_overcommit: Disabled	Brian Behlendorf	2012-11-06	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Disable this test because it may result in an OOM event on the system which can result in the test infrastructure being killed. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat atomic:64-bit: Create thread outside spin lock	Brian Behlendorf	2012-11-06	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Fedora 3.6 debug kernel identified the following issue where we create a thread under a spin lock. This isn't safe because sleeping could result in a deadlock. Therefore the lock is changed to a mutex so it's safe to sleep. BUG: sleeping function called from invalid context at mm/slub.c:930 in_atomic(): 1, irqs_disabled(): 0, pid: 10583, name: splat 1 lock held by splat/10583: Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat: Fix log buffer locking	Brian Behlendorf	2012-11-06	2	-14/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Fedora 3.6 debug kernel identified the following issue where we call copy_to_user() under a spin lock(). This used to be safe in older kernels but no longer appears to be true so the spin lock was changed to a mutex. None of this code is performance critical so allowing the process to sleep is harmless. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	splat: Cleanup headers	Brian Behlendorf	2012-11-06	20	-43/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Restructure the the SPLAT headers such that each test only includes the minimal set of headers it requires. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	Condition variable reference counts	Brian Behlendorf	2012-11-06	2	-9/+26
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reference count every entry and exit from the condition variable functions: cv_wait(), cv_wait_timeout(), cv_signal(), cv_broadcast(). This allows us to safely block in cv_destroy() until all consumers have been scheduled and are no longer accessing the condition variable memory. In addition poison the magic value at the start of cv_destroy() to ensure there are never any new callers after cv_destroy() is called. The consumer is responsible for ensuring this never occurs. Signed-off-by: Brian Behlendorf <[email protected]>
*	Merge remote branch 'eris/stats'	Brian Behlendorf	2012-11-06	2	-7/+70
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	Bring in support for the new KSTAT_TYPE_TXG type. This allows for additional visibility in to the txg handling. Signed-off-by: Brian Behlendorf <[email protected]>
\| *	Add KSTAT_TYPE_TXG type	Brian Behlendorf	2012-11-02	2	-1/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new kstat type for tracking useful statistics about a TXG. The new KSTAT_TYPE_TXG type can be used to tracks the following statistics per-txg. txg - Unique txg number state - State (O)pen/(Q)uiescing/(S)yncing/(C)ommitted birth; - Creation time nread - Bytes read nwritten; - Bytes written reads - IOPs read writes - IOPs write open_time; - Length in nanoseconds the txg was open quiesce_time - Length in nanoseconds the txg was quiescing sync_time; - Length in nanoseconds the txg was syncing Signed-off-by: Brian Behlendorf <[email protected]>
\| *	Make kstat.ks_update() callback atomic	Brian Behlendorf	2012-10-23	2	-6/+9
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the kstat ks_update() callback under the ks_lock. This enables dynamically sized kstats without modification to the kstat API. * Create a kstat with the KSTAT_FLAG_VIRTUAL flag. * Register a ->ks_update() callback which does: o Frees any existing ks_data buffer. o Set ks_data_size to the kstat array size. o Set ks_data to an allocated buffer of size ks_data_size o Populate the array of buffers with the required data. The buffer allocated in the ks_update() callback is guaranteed to remain allocated and valid while the proc sequence handler iterates over the buffer. The lock will not be dropped until kstat_seq_stop() function is run making it safe for concurrent access. To allow the ks_update() callback to perform memory allocations the lock was changed to a mutex. Signed-off-by: Brian Behlendorf <[email protected]>
*	Linux 3.7 compat, __clear_close_on_exec() removed	Brian Behlendorf	2012-10-18	3	-197/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit torvalds/linux@b8318b0 moved the __clear_close_on_exec() function out of include/linux/fdtable.h and in to fs/file.c making it unavailable to the SPL. Now as it turns out we only used this function to tear down some test infrastructure for the vn_getf()/vn_releasef() SPLAT regression tests. Rather than implement even more autoconf compatibilty code to handle this we just remove the test case. This also allows us to drop three existing autoconf tests. This does mean the SPLAT tests will no longer verify these functions but historically they have never been a problem. And if we feel we absolutely need this test coverage I'm sure a more portable version of the test case could be added. Signed-off-by: Brian Behlendorf <[email protected]> Closes #183
*	Linux 3.6 compat, kern_path_locked() added	Yuxuan Shui	2012-10-14	3	-0/+158
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kern_path_parent() function was removed from Linux 3.6 because it was observed that all the callers just want the parent dentry. The simpler kern_path_locked() function replaces kern_path_parent() and does the lookup while holding the ->i_mutex lock. This is good news for the vn implementation because it removes the need for us to handle the locking. However, it makes it harder to implement a single readable vn_remove()/vn_rename() function which is usually what we prefer. Therefore, we implement a new version of vn_remove()/vn_rename() for Linux 3.6 and newer kernels. This allows us to leave the existing working implementation untouched, and to add a simpler version for newer kernels. Long term I would very much like to see all of the vn code removed since what this code enabled is generally frowned upon in the kernel. But that can't happen util we either abondon the zpool.cache file or implement alternate infrastructure to update is correctly in user space. Signed-off-by: Yuxuan Shui <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #154