path: root/module
Commit message | Author | Age | Files | Lines
* Fix taskq_wait() not waiting bug | Brian Behlendorf | 2009-03-15 | 2 | -12/+95
  I'm very surprised this has not surfaced until now, but the taskq_wait() implementation only waited successfully the first time it was called. Subsequent calls to taskq_wait() on the same taskq would not wait. The issue was caused by tq->tq_lowest_id being set to MAX_INT after the first wait completed. This caused subsequent waits, which check that the waiting id is less than the lowest taskq id, to always succeed. The fix is to ensure that tq->tq_lowest_id is never set larger than tq->tq_next_id.
  Additional fixes included in this patch:
  1) Fix a race by placing the taskq_wait_check() in the tq->tq_lock spinlock.
  2) taskq_wait() should wait for the largest outstanding id.
  3) Multiple spelling corrections.
  4) Added taskq wait regression test to validate correct behavior.
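  The id bookkeeping in the taskq_wait() commit message can be sketched in userspace as follows. This is only an illustration under stated assumptions: `toy_taskq`, `toy_dispatch`, `toy_complete` and `toy_wait_check` are hypothetical names, `INT_MAX` stands in for the MAX_INT mentioned in the message, and the tq->tq_lock locking is omitted entirely.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the taskq id bookkeeping. */
struct toy_taskq {
    int next_id;     /* id the next dispatched task will receive */
    int lowest_id;   /* lowest id still outstanding */
    int pending[8];  /* outstanding ids (fixed size, no overflow check) */
    int npending;
};

/* Dispatch a task: hand out the next id and record it as outstanding. */
static int toy_dispatch(struct toy_taskq *tq)
{
    int id = tq->next_id++;
    tq->pending[tq->npending++] = id;
    return id;
}

/*
 * Complete a task and recompute the lowest outstanding id.
 * clamp == false models the bug: an empty queue leaves lowest_id at INT_MAX.
 * clamp == true models the fix: lowest_id never exceeds next_id.
 */
static void toy_complete(struct toy_taskq *tq, int id, bool clamp)
{
    int lowest = INT_MAX;

    /* Remove the completed id by swapping in the last entry. */
    for (int i = 0; i < tq->npending; i++) {
        if (tq->pending[i] == id) {
            tq->pending[i] = tq->pending[--tq->npending];
            break;
        }
    }
    for (int i = 0; i < tq->npending; i++)
        if (tq->pending[i] < lowest)
            lowest = tq->pending[i];
    if (clamp && lowest > tq->next_id)
        lowest = tq->next_id;
    tq->lowest_id = lowest;
}

/* A waiter for `id` may stop once id is below the lowest outstanding id. */
static bool toy_wait_check(const struct toy_taskq *tq, int id)
{
    return id < tq->lowest_id;
}
```

  Without the clamp, a task dispatched after the queue has drained is compared against INT_MAX and the wait returns at once; with the clamp, the same wait blocks until the task completes.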
* Mutex tests updated to use task queues instead of work queues. | Brian Behlendorf | 2009-03-15 | 1 | -59/+77
  Mainly for portability reasons I have rebased the mutex tests on Solaris taskqs instead of Linux work queues. The Linux workqueue API changed after the 2.6.18 kernel, and using task queues avoids having to conditionally detect which workqueue API to use. Additionally, this is basically free additional testing for the task queues. Much to my surprise, after updating these test cases they exposed a long-standing bug in the taskq_wait() implementation. This patch does not address that issue, but the follow-up patch does.
* Fix off-by-1 truncation of hw_serial when converting from integer to string, when writing to /proc/sys/kernel/spl/spl_hostid. | Ricardo M. Correia | 2009-03-12 | 1 | -9/+10
  Fixes hostid mismatch which leads to assertion failure when the hostid/hw_serial is a 10-character decimal number:
    $ zpool status
      pool: lustre
     state: ONLINE
    lt-zpool: zpool_main.c:3176: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed.
    zsh: 5262 abort zpool status
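  The arithmetic behind this kind of off-by-one can be sketched as follows: a 10-character decimal string needs 11 bytes once the terminating NUL is counted. The helper name `hostid_to_string` is hypothetical and is not the SPL's code; `HW_HOSTID_LEN` borrows the name of the define introduced elsewhere in this log for the value 11.

```c
#include <stdio.h>
#include <string.h>

#define HW_HOSTID_LEN 11  /* 10 decimal digits + terminating NUL */

/*
 * Format a hostid as a decimal string.
 * Returns 0 on success, -1 if the buffer was too small and the
 * output was truncated (snprintf still NUL-terminates in that case).
 */
static int hostid_to_string(unsigned long hostid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%lu", hostid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

  With a 10-byte buffer the value 4294967295 loses its last digit, which is the sort of silent mismatch the commit describes; with HW_HOSTID_LEN bytes it round-trips intact.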
* Minor bug fix in XDR code introduced in last minute change before landing. | Ricardo M. Correia | 2009-03-11 | 1 | -2/+2
  1) Removed xdr_bytesrec typedef which has no consumers. If we re-add it, it should probably also be xdr_bytesrec_t.
* Add XDR implementation | Ricardo M. Correia | 2009-03-11 | 2 | -0/+518
  Added proper XDR implementation (Lustre bug 17662), needed for on-disk compatibility between platforms of different endianness.
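  As a rough illustration of why XDR gives endianness-independent data: XDR encodes integers as 4-byte big-endian units, so writing bytes by shifting (rather than copying the in-memory representation) yields the same layout on any host. The function names below are hypothetical and are not taken from the SPL's XDR code.

```c
#include <stdint.h>

/* Encode a 32-bit value as 4 big-endian bytes, independent of host order. */
static void toy_xdr_put_u32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Decode 4 big-endian bytes back into a host-order 32-bit value. */
static uint32_t toy_xdr_get_u32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```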
* Build system cleanup | Brian Behlendorf | 2009-03-11 | 1 | -1/+6
  1) Undefine non-unique entries in spl_config.h
  2) Minor Makefile cleanup
  3) Don't use includedir for proper kernel header install
* Build system and packaging (RPM support) | Brian Behlendorf | 2009-03-09 | 4 | -59/+13
  An update to the build system to properly support all commonly used Makefile targets. These include:
    make all        # Build everything
    make install    # Install everything
    make clean      # Clean up build products
    make distclean  # Clean up everything
    make dist       # Create package tarball
    make srpm       # Create package source RPM
    make rpm        # Create package binary RPMs
    make tags       # Create ctags and etags for everything
  Extra care was taken to ensure that the source RPMs are fully rebuildable against Fedora/RHEL/Chaos kernels. To build binary RPMs from the source RPM for your system simply run:
    rpmbuild --rebuild spl-x.y.z-1.src.rpm
  This will produce two binary RPMs with correct 'requires' dependencies for your kernel. One will contain all spl modules and support utilities, the other is a devel package for compiling additional kernel modules which are dependent on the spl.
    spl-x.y.z-1_<kernel version>.x86_64.rpm
    spl-devel-x.y.z-1_<kernel version>.x86_64.rpm
* XXX: Temporarily disable vmem_size(). | Ricardo M. Correia | 2009-03-05 | 1 | -0/+2
* Linux VM Integration Cleanup | Brian Behlendorf | 2009-03-04 | 3 | -72/+175
  Remove all instances of functions being reimplemented in the SPL. When the prototypes are available in the Linux headers but the function address itself is not exported, use kallsyms_lookup_name() to find the address. The function name itself can then become a define which calls a function pointer. This is preferable to reimplementing the function in the SPL because it ensures we get the correct version of the function for the running kernel. This is actually pretty safe because the prototype is defined in the headers, so we know we are calling the function properly.
  This patch also includes a rhel5 kernel patch which exports the needed symbols so we don't need to use kallsyms_lookup_name(). There are autoconf checks to detect if the symbol is exported and, if so, to use it directly. We should add patches for stock upstream kernels as needed, if for no other reason than so we can easily track which additional symbols we needed exported. Those patches can also be used by anyone willing to rebuild their kernel, but this should not be a requirement. The rhel5 version of the export-symbols patch has been applied to the chaos kernel.
  Additional fixes:
  1) Implement vmem_size() function using get_vmalloc_info()
  2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead of $LINUX because Module.symvers is a build product. When $LINUX_OBJ != $LINUX we will not properly detect exported symbols.
  3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and $LINUX/include search paths to allow proper compilation when the kernel target build directory is not the source directory.
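  The "define which calls a function pointer" pattern from this commit message can be sketched in userspace like this. Everything here is a mock under stated assumptions: `toy_lookup_name()` merely stands in for kallsyms_lookup_name() (which in the kernel returns an address, not a function pointer), and `toy_hidden_impl`, `toy_size_fn` and `toy_size()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef void (*toy_generic_fn_t)(void);      /* generic symbol address */
typedef unsigned long (*toy_size_fn_t)(void); /* prototype known from headers */

/* Stand-in for a kernel function that exists but is not exported. */
static unsigned long toy_hidden_impl(void)
{
    return 42;
}

/* Mock symbol table lookup, standing in for kallsyms_lookup_name(). */
static toy_generic_fn_t toy_lookup_name(const char *name)
{
    if (strcmp(name, "toy_hidden_impl") == 0)
        return (toy_generic_fn_t)toy_hidden_impl;
    return NULL;
}

/* Resolved once at init; callers go through the macro below. */
static toy_size_fn_t toy_size_fn;

/* The function name becomes a define which calls the function pointer. */
#define toy_size() toy_size_fn()

/* Resolve the symbol; returns 0 on success, -1 if it cannot be found. */
static int toy_init(void)
{
    toy_size_fn = (toy_size_fn_t)toy_lookup_name("toy_hidden_impl");
    return toy_size_fn != NULL ? 0 : -1;
}
```

  Callers keep writing `toy_size()` as if it were an ordinary function, and the init step fails cleanly when the symbol cannot be resolved.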
* Add zone_get_hostid() function | Brian Behlendorf | 2009-02-19 | 2 | -2/+19
  Minimal support added for the zone_get_hostid() function. Only global zones are supported, therefore this function must be called with a NULL argument. Additionally, I've added the HW_HOSTID_LEN define and updated all instances where a hard coded magic value of 11 was used; "A good riddance of bad rubbish!"
* Coverity 9657: Resource Leak | Brian Behlendorf | 2009-02-18 | 1 | -2/+2
  Accidentally leaked list item li in error path. The fix is to adjust this error path to ensure the allocated list item which has not yet been added to the list gets freed. To do this we simply add a new goto label slightly earlier to use the existing cleanup logic and minimize the number of unique return points.
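  A minimal sketch of the goto-label cleanup pattern this fix describes, assuming a hypothetical test routine: `toy_item`, `toy_list_test` and the `toy_freed` counter are illustrative names only, not the actual splat code.

```c
#include <stdlib.h>

struct toy_item {
    int value;
};

static int toy_freed;  /* counts frees so the sketch can be checked */

/*
 * Allocate an item and either hand it to *slot (standing in for the
 * list) or fail before insertion. The out_free label sits ahead of
 * the common exit so the not-yet-inserted item is always released.
 */
static int toy_list_test(int fail_early, struct toy_item **slot)
{
    struct toy_item *li;
    int rc = -1;

    li = malloc(sizeof(*li));
    if (li == NULL)
        goto out;

    if (fail_early)
        goto out_free;  /* item not on the list yet: must free it */

    *slot = li;         /* ownership passes to the list */
    rc = 0;
    goto out;

out_free:
    free(li);
    toy_freed++;
out:
    return rc;
}
```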
* Coverity 9656: Forward NULL | Brian Behlendorf | 2009-02-18 | 1 | -0/+1
  This was a false positive; the call path being walked is impossible because the splat_kmem_cache_test_kcp_alloc() function will ensure kcp->kcp_kcd[0] is initialized to NULL. However, there is no harm in making this explicit for the test case, so I'm adding a line to clearly set it to correct the analysis.
* Coverity 9649, 9650, 9651: Uninit | Brian Behlendorf | 2009-02-18 | 1 | -1/+0
  This check was originally added to detect double initializations of mutex types (which it did find). Unfortunately, Coverity is right that there is a very small chance we could trigger the assertion by accident because an uninitialized stack variable happens to contain the mutex magic. This is particularly unlikely since we do poison the mutexes when destroyed, but still possible. Therefore I'm simply removing the assertion.
* kmem slab magazine ageing deadlock | Brian Behlendorf | 2009-02-17 | 1 | -4/+24
  - The previous magazine ageing scheme relied on the on_each_cpu() function to call spl_magazine_age() on each cpu. It turns out this could deadlock with do_flush_tlb_all() which also relies on the IPI based on_each_cpu(). To avoid this problem a per-magazine delayed work item is created and independently scheduled to the correct cpu, removing the need for on_each_cpu().
  - Additionally two unused fields were removed from the type spl_kmem_cache_t; they were holdovers from previous cleanup.
    - struct work_struct work
    - struct timer_list timer
* kmem slab fixes | Brian Behlendorf | 2009-02-13 | 1 | -36/+36
  - spl_slab_reclaim() 'continue' changed back to 'break' from commit 37db7d8cf9936e6d2851a4329c11efcd9f61305c. The original was correct, I have added a comment to ensure this does not happen again.
  - spl_slab_reclaim() further optimized by moving the destructor call in spl_slab_free() outside the skc->skc_lock. This minimizes the length of time the spin lock is held, allows the destructors to be invoked concurrently for different objects, and as a bonus makes it safe (although unwise) to sleep in the destructors.
* kmem slab fixes | Brian Behlendorf | 2009-02-12 | 1 | -34/+49
  - Default SPL_KMEM_CACHE_DELAY changed to 15 to match Solaris.
  - Aged out slab checking occurs every SPL_KMEM_CACHE_DELAY / 3.
  - skc->skc_reap tunable added which allows callers of spl_slab_reclaim() to cap the number of slabs reclaimed. On Solaris all eligible slabs are always reclaimed, and this is still the default behavior. However, I suspect that is not always wise for reasons such as in the next comment.
  - spl_slab_reclaim() added cond_resched() while walking the slab/object free lists. Soft lockups were observed when freeing large numbers of vmalloc'd slabs/objects.
  - spl_slab_reclaim() 'sks->sks_ref > 0' check changes from incorrect 'break' to 'continue' to ensure all slabs are checked.
  - spl_cache_age() reworked to avoid a deadlock with do_flush_tlb_all() which occurred because we slept waiting for completion in spl_cache_age(). Waiting for magazine reclamation to finish is not required, so we no longer wait.
  - spl_magazine_create() and spl_magazine_destroy() shifted back to using for_each_online_cpu() instead of the spl_on_each_cpu() approach which was of course a bad idea due to memory allocations, which Ricardo pointed out.
* Additional Linux VM integration | Brian Behlendorf | 2009-02-05 | 2 | -22/+150
  Added support for the Solaris swapfs_minfree and swapfs_reserve tunables. In addition, availrmem is now available and returns a value which is reasonably analogous to the Solaris meaning. On Linux we return the sum of free and inactive pages since these are all easily reclaimable.
  All tunables are available in /proc/sys/kernel/spl/vm/* and they may need a little adjusting once we observe the real behavior. Some of the defaults are mapped to similar Linux counterparts, others are straight from the OpenSolaris defaults.
* Linux VM integration / device special files | Brian Behlendorf | 2009-02-04 | 4 | -35/+262
  Support added to provide reasonable values for the global Solaris VM variables: minfree, desfree, lotsfree, needfree. These values are set to the sum of their per-zone Linux counterparts, which should be close enough for Solaris consumers.
  When a non-GPL app links against the SPL we cannot use the udev interfaces, which means none of the device special files are created. Because of this I had added a poor man's udev which causes the SPL to invoke an upcall and create the basic devices when a minor is registered. When a minor is unregistered we use the vnode interface to unlink the special file.
* 2.6.27+ portability changes | Brian Behlendorf | 2009-02-02 | 2 | -18/+11
  - Added SPL_AC_3ARGS_ON_EACH_CPU configure check to determine if the older 4 argument version of on_each_cpu() should be used or the new 3 argument version. The retry argument, which was never used anyway, was dropped in the new API.
  - Updated work queue compatibility wrappers. The old way this worked was to pass a data pointer when initializing the workqueue. The new API assumes the work item is embedded in a structure and we use container_of() to find that data pointer.
  - Updated skc->skc_flags to be an unsigned long which is now type checked in the bit operations. This silences the warnings.
  - Updated autogen products and splat tests accordingly
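  The embedded-work-item idea in the work queue bullet can be sketched in userspace as follows. This is an assumption-laden illustration: `toy_container_of` is a simplified stand-in for the kernel's container_of() macro, and `toy_work`, `toy_magazine` and `toy_magazine_age` are hypothetical names rather than the SPL's structures.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() macro. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Work item: the callback receives only a pointer to the item itself. */
struct toy_work {
    void (*func)(struct toy_work *work);
};

/* The data the callback really wants, with the work item embedded. */
struct toy_magazine {
    int cpu;
    int aged;
    struct toy_work work;
};

/* Recover the enclosing structure from the embedded work item. */
static void toy_magazine_age(struct toy_work *work)
{
    struct toy_magazine *m =
        toy_container_of(work, struct toy_magazine, work);

    m->aged++;
}
```

  No separate data pointer is stored anywhere: the callback subtracts the member offset from the work item's address to reach its owner.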
* Make the number of system taskq threads based on the number of cores in the node, as is done for most linux system tasks | Brian Behlendorf | 2009-02-02 | 1 | -2/+4
* Update thread tests to have max_time | Brian Behlendorf | 2009-01-30 | 1 | -4/+4
* kmem_cache hardening and performance improvements | Brian Behlendorf | 2009-01-30 | 3 | -456/+977
  - Added slab work queue task which gradually ages and frees slabs from the cache which have not been used recently.
  - Optimized slab packing algorithm to ensure each slab contains the maximum number of objects without creating too large a slab.
  - Fix deadlock, we can never call kv_free() under the skc_lock. We now unlink the objects and slabs from the cache itself and attach them to a private work list. The contents of the list are then subsequently freed outside the spin lock.
  - Move magazine create/destroy operation on to local cpu.
  - Further performance optimizations by minimizing the usage of the large per-cache skc_lock. This includes the addition of KMC_BIT_REAPING bit mask which is used to prevent concurrent reaping, and to defer new slab creation when reaping is occurring.
  - Add KMC_BIT_DESTROYING bit mask which is set when the cache is being destroyed, this is used to catch any task accessing the cache while it is being destroyed.
  - Add comments to all the functions and additional comments to try and make everything as clear as possible.
  - Major cleanup and additions to the SPLAT kmem tests to more rigorously stress the cache implementation and look for any problems. This includes correctness and performance tests.
  - Updated portable work queue interfaces
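  The deadlock fix in this commit (unlink under the lock, free outside it) can be sketched in userspace as below. It is a toy model under stated assumptions: the `lock_held` flag only imitates skc_lock so the sketch can detect a free made while "locked", and all `toy_*` names are hypothetical.

```c
#include <stdlib.h>

struct toy_slab {
    struct toy_slab *next;
};

struct toy_cache {
    int lock_held;           /* stand-in for the per-cache spin lock */
    struct toy_slab *slabs;  /* slabs currently owned by the cache */
};

static int toy_freed_under_lock;  /* frees attempted while "locked" */

static void toy_lock(struct toy_cache *c)   { c->lock_held = 1; }
static void toy_unlock(struct toy_cache *c) { c->lock_held = 0; }

/* Stand-in for kv_free(): must never run with the lock held. */
static void toy_kv_free(struct toy_cache *c, struct toy_slab *s)
{
    if (c->lock_held)
        toy_freed_under_lock++;
    free(s);
}

/* Add one slab to the cache; returns 0 on success, -1 on failure. */
static int toy_cache_add(struct toy_cache *c)
{
    struct toy_slab *s = malloc(sizeof(*s));

    if (s == NULL)
        return -1;
    toy_lock(c);
    s->next = c->slabs;
    c->slabs = s;
    toy_unlock(c);
    return 0;
}

/* Unlink every slab onto a private list under the lock, free after. */
static int toy_reclaim(struct toy_cache *c)
{
    struct toy_slab *private_list, *s;
    int n = 0;

    toy_lock(c);
    private_list = c->slabs;  /* detach the whole list in one step */
    c->slabs = NULL;
    toy_unlock(c);

    while ((s = private_list) != NULL) {
        private_list = s->next;
        toy_kv_free(c, s);    /* lock is no longer held here */
        n++;
    }
    return n;
}
```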
* Remove debug check which was accidentally left in place and prevented the slab cache from working on systems with >4 cores | Brian Behlendorf | 2009-01-26 | 1 | -3/+0
* Implement kmem cache alignment argument | Brian Behlendorf | 2009-01-26 | 2 | -81/+134
* Sleep uninterruptibly, waking up early may result in a crash | Brian Behlendorf | 2009-01-22 | 2 | -13/+14
* Minor fix for compiler warning when KMEM_TRACKING is enabled | Brian Behlendorf | 2009-01-20 | 1 | -1/+1
* KMEM_TRACKING turned up a missing free in list test 6, fix the leak | Brian Behlendorf | 2009-01-20 | 1 | -0/+1
* Ensure -NDEBUG does not get added to spl_config.h and is only set in the build options. This allows other kernel modules to use spl_config to leverage the rest of the config checks without getting confused with the debug options | Brian Behlendorf | 2009-01-20 | 1 | -2/+0
* Rename modules to module and update references | Brian Behlendorf | 2009-01-15 | 34 | -0/+13179