openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add skc_flags and full header to /proc/spl/kmem/slab.	Brian Behlendorf	2009-12-11	1	-19/+26
\|
*	Atomic64 compatibility for 32-bit systems without kernel support.	Brian Behlendorf	2009-12-04	2	-50/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is another step towards updating the code to handle the 32-bit kernels which I have not been regularly testing. This changes do not really impact the common case I'm expected which is the latest kernel running on an x86_64 arch. Until the linux-2.6.31 kernel the x86 arch did not have support for 64-bit atomic operations. Additionally, the new atomic_compat.h support for this case was wrong because it embedded a spinlock in the atomic variable which must always and only be 64-bits total. To handle these 32-bit issues we now simply fall back to the --enable-atomic-spinlock implementation if the kernel does not provide the 64-bit atomic funcs. The second issue this patch addresses is the DEBUG_KMEM assumption that there will always be atomic64 funcs available. On 32-bit archs this may not be true, and actually that's just fine. In that case the kernel will will never be able to allocate more the 32-bits worth anyway. So just check if atomic64 funcs are available, if they are not it means this is a 32-bit machine and we can safely use atomic_t's instead.
*	Correctly handle division on 32-bit RHEL5 systems by returning dividend.	Brian Behlendorf	2009-12-01	1	-1/+3
\|
*	Ensure spl_config.h is include in spl-generic.c	Brian Behlendorf	2009-11-15	1	-0/+1
\|
*	Linux 2.6.31 kmem cache alignment fixes and cleanup.	Brian Behlendorf	2009-11-13	1	-52/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The big fix here is the removal of kmalloc() in kv_alloc(). It used to be true in previous kernels that kmallocs over PAGE_SIZE would always be pages aligned. This is no longer true atleast in 2.6.31 there are no longer any alignment expectations. Since kv_alloc() requires the resulting address to be page align we no only either directly allocate pages in the KMC_KMEM case, or directly call __vmalloc() both of which will always return a page aligned address. Additionally, to avoid wasting memory size is always a power of two. As for cleanup several helper functions were introduced to calculate the aligned sizes of various data structures. This helps ensure no case is accidentally missed where the alignment needs to be taken in to account. The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP which is safer since the type will be explict and we no longer count on the compiler to auto promote types hopefully as we expected. Always wnforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE) alignment restrictions at cache creation time. Use SPL_KMEM_CACHE_ALIGN in splat alignment test.
*	Remove __GFP_NOFAIL in kmem and retry internally.	Brian Behlendorf	2009-11-12	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As of 2.6.31 it's clear __GFP_NOFAIL should no longer be used and it may disappear from the kernel at any time. To handle this I have simply added _nofail wrappers in the kmem implementation which perform the retry for non-atomic allocations. From linux-2.6.31 mm/page_alloc.c:1166 / * __GFP_NOFAIL is not to be used in new code. * * All __GFP_NOFAIL callers should be fixed so that they * properly detect and handle allocation failures. * * We most definitely don't want callers attempting to * allocate greater than order-1 page units with * __GFP_NOFAIL. */ WARN_ON_ONCE(order > 1);
*	Linux 2.6.31 Compatibility Updates	Brian Behlendorf	2009-11-10	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	SPL_AC_2ARGS_SET_FS_PWD macro updated to explicitly include linux/fs_struct.h which was dropped from linux/sched.h. min_wmark_pages, low_wmark_pages, high_wmark_pages macros introduced in newer kernels. For older kernels mm_compat.h was introduced to define them as needed as direct mappings to per zone min_pages, low_pages, max_pages.
*	Autoconf --enable-debug-* cleanup	Brian Behlendorf	2009-10-30	3	-37/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cleanup the --enable-debug-* configure options, this has been pending for quite some time and I am glad I finally got to it. To summerize: 1) All SPL_AC_DEBUG_* macros were updated to be a more autoconf friendly. This mainly involved shift to the GNU approved usage of AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using an if [ test ] construct. 2) --enable-debug-kmem=yes by default. This simply enabled keeping a running tally of total memory allocated and freed and reporting a memory leak if there was one at module unload. Additionally, it ensure /proc/spl/kmem/slab will exist by default which is handy. The overhead is low for this and it should not impact performance. 3) --enable-debug-kmem-tracking=no by default. This option was added to provide a configure option to enable to detailed memory allocation tracking. This support was always there but you had to know where to turn it on. By default this support is disabled because it is known to badly hurt performence, however it is invaluable when chasing a memory leak. 4) --enable-debug-kstat removed. After further reflection I can't see why you would ever really want to turn this support off. It is now always on which had the nice side effect of simplifying the proc handling code in spl-proc.c. We can now always assume the top level directory will be there. 5) --enable-debug-callb removed. This never really did anything, it was put in provisionally because it might have been needed. It turns out it was not so I am just removing it to prevent confusion.
*	Use Linux atomic primitives by default.	Brian Behlendorf	2009-10-30	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously Solaris style atomic primitives were implemented simply by wrapping the desired operation in a global spinlock. This was easy to implement at the time when I wasn't 100% sure I could safely layer the Solaris atomic primatives on the Linux counterparts. It however was likely not good for performance. After more investigation however it does appear the Solaris primitives can be layered on Linux's fairly safely. The Linux atomic_t type really just wraps a long so we can simply cast the Solaris unsigned value to either a atomic_t or atomic64_t. The only lingering problem for both implementations is that Solaris provides no atomic read function. This means reading a 64-bit value on a 32-bit arch can (and will) result in word breaking. I was very concerned about this initially, but upon further reflection it is a limitation of the Solaris API. So really we are just being bug-for-bug compatible here. With this change the default implementation is layered on top of Linux atomic types. However, because we're assuming a lot about the internal implementation of those types I've made it easy to fall-back to the generic approach. Simply build with --enable-atomic_spinlocks if issues are encountered with the new implementation.
*	I should not have removed these, they are important.	Brian Behlendorf	2009-10-27	1	-0/+2
\|
*	Rebase cmn_err on vcmn_err and don't warn about missing \n	Brian Behlendorf	2009-10-27	2	-22/+19
\| \| \| \| \| \| \| \| \| \| \|	The cmn_err/vcmn_err functions are layered on top of the debug system which usually expects a newline at the end. However, there really doesn't need to be a newline there and there in fact should not be for the CE_CONT case so let's just drop the warning. Also we make a half-hearted attempt to handle a leading ! which means only send it to the syslog not the console. In this case we just send to the the debug logs and not the console.
*	Set cwd to '/' for the process executing insmod.	Brian Behlendorf	2009-10-01	2	-3/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ricardo has pointed out that under Solaris the cwd is set to '/' during module load, while under Linux it is set to the callers cwd. To handle this cleanly I've reworked the module _init()/_exit() macros so they call a _setup()/_cleanup() function when any SPL dependent module is loaded or unloaded. This gives us a chance to perform any needed modification of the process, in this case changing the cwd. It also handily provides a way to avoid creating wrapper init()/exit() functions because the Solaris and Linux prototypes differ slightly. All dependent modules should now call the spl helper macros spl_module_{init,exit}() instead of the native linux versions. Unfortunately, it appears that under Linux there has been no consistent API in the kernel to set the cwd in a module. Because of this I have had to add more autoconf magic than I'd like. However, what I have done is correct and has been tested on RHEL5, SLES11, FC11, and CHAOS kernels. In addition, I have change the rootdir type from a 'void ' to the correct 'vnode_t ' type. And I've set rootdir to a non-NULL value.
*	Reimplement mutexs for Linux lock profiling/analysis	Brian Behlendorf	2009-09-25	2	-429/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For a generic explanation of why mutexs needed to be reimplemented to work with the kernel lock profiling see commits: e811949a57044d60d12953c5c3b808a79a7d36ef and d28db80fd0fd4fd63aec09037c44408e51a222d6 The specific changes made to the mutex implemetation are as follows. The Linux mutex structure is now directly embedded in the kmutex_t. This allows a kmutex_t to be directly case to a mutex struct and passed directly to the Linux primative. Just like with the rwlocks it is critical that these functions be implemented as '#defines to ensure the location information is preserved. The preprocessor can then do a direct replacement of the Solaris primative with the linux primative. Just as with the rwlocks we need to track the lock owner. Here things get a little more interesting because depending on your kernel version, and how you've built your kernel Linux may already do this for you. If your running a 2.6.29 or newer kernel on a SMP system the lock owner will be tracked. This was added to Linux to support adaptive mutexs, more on that shortly. Alternately, your kernel might track the lock owner if you've set CONFIG_DEBUG_MUTEXES in the kernel build. If neither of the above things is true for your kernel the kmutex_t type will include and track the lock owner to ensure correct behavior. This is all handled by a new autoconf check called SPL_AC_MUTEX_OWNER. Concerning adaptive mutexs these are a very recent development and they did not make it in to either the latest FC11 of SLES11 kernels. Ideally, I'd love to see this kernel change appear in one of these distros because it does help performance. From Linux kernel commit: 0d66bf6d3514b35eb6897629059443132992dbd7 "Testing with Ingo's test-mutex application... gave a 345% boost for VFS scalability on my testbox" However, if you don't want to backport this change yourself you can still simply export the task_curr() symbol. The kmutex_t implementation will use this symbol when it's available to provide it's own adaptive mutexs. Finally, DEBUG_MUTEX support was removed including the proc handlers. This was done because now that we are cleanly integrated with the kernel profiling all this information and much much more is available in debug kernel builds. This code was now redundant. Update mutexs validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)
*	Update rwlocks to track owner to ensure correct semantics	Brian Behlendorf	2009-09-25	2	-14/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The behavior of RW__HELD was updated because it was not quite right. It is not sufficient to return non-zero when the lock is help, we must only do this when the current task in the holder. This means we need to track the lock owner which is not something tracked in a Linux semaphore. After some experimentation the solution I settled on was to embed the Linux semaphore at the start of a larger krwlock_t structure which includes the owner field. This maintains good performance and allows us to cleanly intergrate with the kernel lock analysis tools. My reasons: 1) By placing the Linux semaphore at the start of krwlock_t we can then simply cast krwlock_t to a rw_semaphore and pass that on to the linux kernel. This allows us to use '#defines so the preprocessor can do direct replacement of the Solaris primative with the linux equivilant. This is important because it then maintains the location information for each rw_ call point. 2) Additionally, by adding the owner to krwlock_t we can keep this needed extra information adjacent to the lock itself. This removes the need for a fancy lookup to get the owner which is optimal for performance. We can also leverage the existing spin lock in the semaphore to ensure owner is updated correctly. 3) All helper functions which do not need to strictly be implemented as a define to preserve location information can be done as a static inline function. 4) Adding the owner to krwlock_t allows us to remove all memory allocations done during lock initialization. This is good for all the obvious reasons, we do give up the ability to specific the lock name. The Linux profiling tools will stringify the lock name used in the code via the preprocessor and use that. Update rwlocks validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)
*	Reimplement rwlocks for Linux lock profiling/analysis.	Brian Behlendorf	2009-09-18	2	-297/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that the previous rwlock implementation worked well but did not integrate properly with the upstream kernel lock profiling/ analysis tools. This is a major problem since it would be awfully nice to be able to use the automatic lock checker and profiler. The problem is that the upstream lock tools use the pre-processor to create a lock class for each uniquely named locked. Since the rwsem was embedded in a wrapper structure the name was always the same. The effect was that we only ended up with one lock class for the entire SPL which caused the lock dependency checker to flag nearly everything as a possible deadlock. The solution was to directly map a krwlock to a Linux rwsem using a typedef there by eliminating the wrapper structure. This was not done initially because the rwsem implementation is specific to the arch. To fully implement the Solaris krwlock API using only the provided rwsem API is not possible. It can only be done by directly accessing some of the internal data member of the rwsem structure. For example, the Linux API provides a different function for dropping a reader vs writer lock. Whereas the Solaris API uses the same function and the caller does not pass in what type of lock it is. This means to properly drop the lock we need to determine if the lock is currently a reader or writer lock. Then we need to call the proper Linux API function. Unfortunately, there is no provided API for this so we must extracted this information directly from arch specific lock implementation. This is all do able, and what I did, but it does complicate things considerably. The good news is that in addition to the profiling benefits of this change. We may see performance improvements due to slightly reduced overhead when creating rwlocks and manipulating them. The only function I was forced to sacrafice was rw_owner() because this information is simply not stored anywhere in the rwsem. Luckily this appears not to be a commonly used function on Solaris, and it is my understanding it is mainly used for debugging anyway. In addition to the core rwlock changes, extensive updates were made to the rwlock regression tests. Each class of test was extended to provide more API coverage and to be more rigerous in checking for misbehavior. This is a pretty significant change and with that in mind I have been careful to validate it on several platforms before committing. The full SPLAT regression test suite was run numberous times on all of the following platforms. This includes various kernels ranging from 2.6.16 to 2.6.29. - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)
*	Update global_page_state() support for 2.6.29 kernels.	Brian Behlendorf	2009-07-28	1	-24/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Basically everything we need to monitor the global memory state of the system is now cleanly available via global_page_state(). The problem is that this interface is still fairly recent, and there has been one change in the page state enum which we need to handle. These changes basically boil down to the following: - If global_page_state() is available we should use it. Several autoconf checks have been added to detect the correct enum names. - If global_page_state() is not available check to see if get_zone_counts() symbol is available and use that. - If the get_zone_counts() symbol is not exported we have no choice be to dynamically aquire it at load time. This is an absolute last resort for old kernel which we don't want to patch to cleanly export the symbol.
*	Remove get/put_task_struct as they are not available for SLES11	Brian Behlendorf	2009-07-28	1	-10/+2
\| \| \| \| \| \| \|	This interface is going away, and it's not as if most callers actually use crhold/crfree when working with credentials. So it'll be okay they we're not taking a reference on the task structure the odds of it going away while working with a credential and pretty small.
*	Add basic credential support and splat tests.	Brian Behlendorf	2009-07-27	3	-0/+311
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous credential implementation simply provided the needed types and a couple of dummy functions needed. This update correctly ties the basic Solaris credential API in to one of two Linux kernel APIs. Prior to 2.6.29 the linux kernel embeded all credentials in the task structure. For these kernels, we pass around the entire task struct as if it were the credential, then we use the helper functions to extract the credential related bits. As of 2.6.29 a new credential type was added which we can and do fairly cleanly layer on top of. Once again the helper functions nicely hide the implementation details from all callers. Three tests were added to the splat test framework to verify basic correctness. They should be extended as needed when need credential functions are added.
*	Positive Solaris ioctl return codes need to be negated for use by libc	Brian Behlendorf	2009-07-23	1	-2/+9
\|
*	The HAVE_PATH_IN_NAMEIDATA compat macros should have been used here.	Brian Behlendorf	2009-07-22	1	-3/+3
\|
*	Register a basic compat ioctl handler (32 vs 64 bit compat)	Brian Behlendorf	2009-07-21	1	-3/+17
\| \| \| \| \| \| \| \| \| \| \|	Simply pass the ioctl on to the normal handler. If the ioctl helper macros are used correctly this should be safe as they will handle the packing/unpacking of the data encoded in the ioctl command. And actually, if the caller does not use the IO* macros at all, and just passes small values, it will probably be OK as well. We only get in to trouble if they try and use the upper 32-bits. Endianness is not really a concern here, we we are pretty much assumed they user and kernel will match.
*	Prevent integer overflow after ~164 days of uptime.	Ricardo M. Correia	2009-07-14	1	-1/+1
\| \| \| \|	Signed-off-by: Brian Behlendorf <[email protected]>
*	Add ddi_copyin/ddi_copyout support for fake kernel originated ioctls.	Brian Behlendorf	2009-07-10	1	-0/+27
\|
*	Add basic support for TASKQ_THREADS_CPU_PCT taskq flag which is	Brian Behlendorf	2009-07-09	1	-0/+9
\| \| \| \| \| \|	used to scale the number of threads based on the number of online CPUs. As CPUs are added/removed we should rescale the thread count appropriately, but currently this is only done at create.
*	Use do_div on older kernel where do_div64 doesn't exist.	Brian Behlendorf	2009-06-26	1	-1/+1
\|
*	SLES10 Fixes (part 7)	Brian Behlendorf	2009-05-20	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	- Initial SLES testing uncovered a long standing bug in the debug tracing. The tcd_for_each() macro expected a NULL to terminate the trace_data[i] array but this was only ever true due to luck. All trace_data[] iterators are now properly capped by TCD_TYPE_MAX. - SPLAT_MAJOR 229 conflicted with a 'hvc' device on my SLES system. Since this was always an arbitrary choice I picked something else. - The HAVE_PGDAT_LIST case should set pgdat_list_addr to the value stored at the address of the memory location returned by kallsyms_lookup_name().
*	SLES10 Fixes (part 6)	Brian Behlendorf	2009-05-20	1	-12/+35
\| \| \| \| \| \| \| \| \|	- Prior to 2.6.17 there were no *_pgdat helper functions in mm/mmzone.c. Instead for_each_zone() operated directly on pgdat_list which may or may not have been exported depending on how your kernel was compiled. Now new configure checks determine if you have the helpers or not, and if the needed symbols are exported. If they are not exported then they are dynamically aquired at runtime by kallsyms_lookup_name().
*	SLES10 Fixes (part 4):	Brian Behlendorf	2009-05-20	1	-5/+14
\| \| \| \| \| \| \|	- Configure check for SLES specific API change to vfs_unlink() and vfs_rename() which added a 'struct vfsmount *' argument. This was for something called the linux-security-module, but it appears that it was never adopted upstream.
*	SLES10 Fixes (part 2):	Brian Behlendorf	2009-05-20	4	-31/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Configure check, the div64_64() function was renamed to div64_u64() as of 2.6.26. - Configure check, the global_page_state() fuction was introduced in 2.6.18 kernels. The earlier 2.6.16 based SLES10 must not try and use it, thankfully get_zone_counts() is still available. - To simplify debugging poison all symbols aquired dynamically using spl_kallsyms_lookup_name() with SYMBOL_POISON. - Add console messages when the user mode helpers fail. - spl_kmem_init_globals() use bit shifts instead of division. - When the monotonic clock is unavailable __gethrtime() must perform the HZ division as an 'unsigned long long' because the SPL only implements __udivdi3(), and not __divdi3() for 'long long' division on 32-bit arches.
*	Allow spl_config.h to be included by dependant packages	Brian Behlendorf	2009-03-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need dependent packages to be able to include spl_config.h so they can leverage the configure checks the SPL has done. This is important because several of the spl headers need the results of these checks to work properly. Unfortunately, the autoheader build product is always private to a particular build and defined certain common things. (PACKAGE, VERSION, etc). This prevents other packages which also use autoheader from being include because the definitions conflict. To avoid this problem the SPL build system leverage AH_BOTTOM to include a spl_unconfig.h at the botton of the autoheader build product. This custom include undefs all known shared symbols to prevent the confict. This does however mean that those definition are also not availble to the SPL package either. The SPL package therefore uses the equivilant SPL_META_* definitions.
*	FC10/i686 Compatibility Update (2.6.27.19-170.2.35.fc10.i686)	Brian Behlendorf	2009-03-17	1	-13/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the interests of portability I have added a FC10/i686 box to my list of development platforms. The hope is this will allow me to keep current with upstream kernel API changes, and at the same time ensure I don't accidentally break x86 support. This patch resolves all remaining issues observed under that environment. 1) SPL_AC_ZONE_STAT_ITEM_FIA autoconf check added. As of 2.6.21 the kernel added a clean API for modules to get the global count for free, inactive, and active pages. The SPL attempts to detect if this API is available and directly map spl_global_page_state() to global_page_state(). If the full API is not available then spl_global_page_state() is implemented as a thin layer to get these values via get_zone_counts() if that symbol is available. 2) New kmem:vmem_size regression test added to validate correct vmem_size() functionality. The test case acquires the current global vmem state, allocates from the vmem region, then verifies the allocation is correctly reflected in the vmem_size() stats. 3) Change splat_kmem_cache_thread_test() to always use KMC_KMEM based memory. On x86 systems with limited virtual address space failures resulted due to exhaustig the address space. The tests really need to problem exhausting all memory on the system thus we need to use the physical address space. 4) Change kmem:slab_lock to cap it's memory usage at availrmem instead of using the native linux nr_free_pages(). This provides additional test coverage of the SPL Linux VM integration. 5) Change kmem:slab_overcommit to perform allocation of 256K instead of 1M. On x86 based systems it is not possible to create a kmem backed slab with entires of that size. To compensate for this the number of allocations performed in increased by 4x. 6) Additional autoconf documentation for proposed upstream API changes to make additional symbols available to modules. 7) Console error messages added when spl_kallsyms_lookup_name() fails to locate an expected symbol. This causes the module to fail to load and we need to know exactly which symbol was not available.
*	Fix taskq_wait() not waiting bug	Brian Behlendorf	2009-03-15	1	-12/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm very surprised this has not surfaced until now. But the taskq_wait() implementation work only wait successfully the first time it was called. Subsequent usage of taskq_wait() on the taskq would not wait. The issue was caused by tq->tq_lowest_id being set to MAX_INT after the first wait completed. This caused subsequent waits which check that the waiting id is less than the lowest taskq id to always succeed. The fix is to ensure that tq->tq_lowest_id is never set larger than tq->tq_next.id. Additional fixes which were added to this patch include: 1) Fix a race by placing the taskq_wait_check() in the tq->tq_lock spinlock. 2) taskq_wait() should wait for the largest outstanding id. 3) Multiple spelling corrections. 4) Added taskq wait regression test to validate correct behavior.
*	Fix off-by-1 truncation of hw_serial when converting from integer to string, ↵	Ricardo M. Correia	2009-03-12	1	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \|	when writing to /proc/sys/kernel/spl/spl_hostid. Fixes hostid mismatch which leads to assertion failure when the hostid/hw_serial is a 10-character decimal number: $ zpool status pool: lustre state: ONLINE lt-zpool: zpool_main.c:3176: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed. zsh: 5262 abort zpool status
*	Minor bug fix in XDR code introduced in last minute change before landing.	Ricardo M. Correia	2009-03-11	1	-2/+2
\| \| \| \| \|	1) Removed xdr_bytesrec typedef which has no consumers. If we re-add it should also probably be xdr_bytesrec_t.
*	Add XDR implementation	Ricardo M. Correia	2009-03-11	2	-0/+518
\| \| \| \| \|	Added proper XDR implementation (Lustre bug 17662), needed for on-disk compatibility between platforms of different endianness.
*	Build system and packaging (RPM support)	Brian Behlendorf	2009-03-09	2	-30/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An update to the build system to properly support all commonly used Makefile targets these include: make all # Build everything make install # Install everything make clean # Clean up build products make distclean # Clean up everything make dist # Create package tarball make srpm # Create package source RPM make rpm # Create package binary RPMs make tags # Create ctags and etags for everything Extra care was taken to ensure that the source RPMs are fully rebuildable against Fedora/RHEL/Chaos kernels. To build binary RPMs from the source RPM for your system simply run: rpmbuild --rebuild spl-x.y.z-1.src.rpm This will produce two binary RPMs with correct 'requires' dependencies for your kernel. One will contain all spl modules and support utilities, the other is a devel package for compiling additional kernel modules which are dependant on the spl. spl-x.y.z-1_<kernel version>.x86_64.rpm spl-devel-x.y.2-1_<kernel version>.x86_64.rpm
*	XXX: Temporarily disable vmem_size().	Ricardo M. Correia	2009-03-05	1	-0/+2
\|
*	Linux VM Integration Cleanup	Brian Behlendorf	2009-03-04	3	-72/+175
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove all instances of functions being reimplemented in the SPL. When the prototypes are available in the linux headers but the function address itself is not exported use kallsyms_lookup_name() to find the address. The function name itself can them become a define which calls a function pointer. This is preferable to reimplementing the function in the SPL because it ensures we get the correct version of the function for the running kernel. This is actually pretty safe because the prototype is defined in the headers so we know we are calling the function properly. This patch also includes a rhel5 kernel patch we exports the needed symbols so we don't need to use kallsyms_lookup_name(). There are autoconf checks to detect if the symbol is exported and if so to use it directly. We should add patches for stock upstream kernels as needed if for no other reason than so we can easily track which additional symbols we needed exported. Those patches can also be used by anyone willing to rebuild their kernel, but this should not be a requirement. The rhel5 version of the export-symbols patch has been applied to the chaos kernel. Additional fixes: 1) Implement vmem_size() function using get_vmalloc_info() 2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead of $LINUX because Module.symvers is a build product. When $LINUX_OBJ != $LINUX we will not properly detect exported symbols. 3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and $LINUX/include search paths to allow proper compilation when the kernel target build directory is not the source directory.
*	Add zone_get_hostid() function	Brian Behlendorf	2009-02-19	2	-2/+19
\| \| \| \| \| \| \| \|	Minimal support added for the zone_get_hostid() function. Only global zones are supported therefore this function must be called with a NULL argumment. Additionally, I've added the HW_HOSTID_LEN define and updated all instances where a hard coded magic value of 11 was used; "A good riddance of bad rubbish!"
*	Coverity 9649, 9650, 9651: Uninit	Brian Behlendorf	2009-02-18	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	This check was originally added to detect double initializations of mutex types (which it did find). Unfortunately, Coverity is right that there is a very small chance we could trigger the assertion by accident because an uninitialized stack variable happens to contain the mutex magic. This is particularly unlikely since we do poison the mutexs when destroyed but still possible. Therefore I'm simply removing the assertion.
*	kmem slab magazine ageing deadlock	Brian Behlendorf	2009-02-17	1	-4/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	- The previous magazine ageing sceme relied on the on_each_cpu() function to call spl_magazine_age() on each cpu. It turns out this could deadlock with do_flush_tlb_all() which also relies on the IPI based on_each_cpu(). To avoid this problem a per- magazine delayed work item is created and indepentantly scheduled to the correct cpu removing the need for on_each_cpu(). - Additionally two unused fields were removed from the type spl_kmem_cache_t, they were hold overs from previous cleanup. - struct work_struct work - struct timer_list timer
*	kmem slab fixes	Brian Behlendorf	2009-02-13	1	-36/+36
\| \| \| \| \| \| \| \| \| \| \|	- spl_slab_reclaim() 'continue' changed back to 'break' from commit 37db7d8cf9936e6d2851a4329c11efcd9f61305c. The original was correct, I have added a comment to ensure this does not happen again. - spl_slab_reclaim() further optimized by moving the destructor call in spl_slab_free() outside the skc->skc_lock. This minimizes the length of time the spin lock is held, allows the destructors to be invoked concurrently for different objects, and as a bonus makes it safe (although unwise) to sleep in the destructors.
*	kmem slab fixes	Brian Behlendorf	2009-02-12	1	-34/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Default SPL_KMEM_CACHE_DELAY changed to 15 to match Solaris. - Aged out slab checking occurs every SPL_KMEM_CACHE_DELAY / 3. - skc->skc_reap tunable added whichs allows callers of spl_slab_reclaim() to cap the number of slabs reclaimed. On Solaris all eligible slabs are always reclaimed, and this is still the default behavior. However, I suspect that is not always wise for reasons such as in the next comment. - spl_slab_reclaim() added cond_resched() while walking the slab/object free lists. Soft lockups were observed when freeing large numbers of vmalloc'd slabs/objets. - spl_slab_reclaim() 'sks->sks_ref > 0' check changes from incorrect 'break' to 'continue' to ensure all slabs are checked. - spl_cache_age() reworked to avoid a deadlock with do_flush_tlb_all() which occured because we slept waiting for completion in spl_cache_age(). To waiting for magazine reclamation to finish is not required so we no longer wait. - spl_magazine_create() and spl_magazine_destroy() shifted back to using for_each_online_cpu() instead of the spl_on_each_cpu() approach which was of course a bad idea due to memory allocations which Ricardo pointed out.
*	Additional Linux VM integration	Brian Behlendorf	2009-02-05	2	-22/+150
\| \| \| \| \| \| \| \| \| \| \| \| \|	Added support for Solaris swapfs_minfree, and swapfs_reserve tunables. In additional availrmem is now available and return a reasonable value which is reasonably analogous to the Solaris meaning. On linux we return the sun of free and inactive pages since these are all easily reclaimable. All tunables are available in /proc/sys/kernel/spl/vm/* and they may need a little adjusting once we observe the real behavior. Some of the defaults are mapped to similar linux counterparts, others are straight from the OpenSolaris defaults.
*	Linux VM integration / device special files	Brian Behlendorf	2009-02-04	4	-35/+262
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support added to provide reasonable values for the global Solaris VM variables: minfree, desfree, lotsfree, needfree. These values are set to the sum of their per-zone linux counterparts which should be close enough for Solaris consumers. When a non-GPL app links against the SPL we cannot use the udev interfaces, which means non of the device special files are created. Because of this I had added a poor mans udev which cause the SPL to invoke an upcall and create the basic devices when a minor is registered. When a minor is unregistered we use the vnode interface to unlink the special file.
*	2.6.27+ portability changes	Brian Behlendorf	2009-02-02	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Added SPL_AC_3ARGS_ON_EACH_CPU configure check to determine if the older 4 argument version of on_each_cpu() should be used or the new 3 argument version. The retry argument was dropped in the new API which was never used anyway. - Updated work queue compatibility wrappers. The old way this worked was to pass a data point when initialized the workqueue. The new API assumed the work item is embedding in a structure and we us container_of() to find that data pointer. - Updated skc->skc_flags to be an unsigned long which is now type checked in the bit operations. This silences the warnings. - Updated autogen products and splat tests accordingly
*	Make the number of system taskq threads based on the node of cores in the ↵	Brian Behlendorf	2009-02-02	1	-2/+4
\| \| \| \|	node, as is done for most linux system tasks
*	kmem_cache hardening and performance improvements	Brian Behlendorf	2009-01-30	1	-140/+325
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Added slab work queue task which gradually ages and free's slabs from the cache which have not been used recently. - Optimized slab packing algorithm to ensure each slab contains the maximum number of objects without create to large a slab. - Fix deadlock, we can never call kv_free() under the skc_lock. We now unlink the objects and slabs from the cache itself and attach them to a private work list. The contents of the list are then subsequently freed outside the spin lock. - Move magazine create/destroy operation on to local cpu. - Further performace optimizations by minimize the usage of the large per-cache skc_lock. This includes the addition of KMC_BIT_REAPING bit mask which is used to prevent concurrent reaping, and to defer new slab creation when reaping is occuring. - Add KMC_BIT_DESTROYING bit mask which is set when the cache is being destroyed, this is used to catch any task accessing the cache while it is being destroyed. - Add comments to all the functions and additional comments to try and make everything as clear as possible. - Major cleanup and additions to the SPLAT kmem tests to more rigerously stress the cache implementation and look for any problems. This includes correctness and performance tests. - Updated portable work queue interfaces
*	Remove debug check was was accidentally left in place an prevent the slab ↵	Brian Behlendorf	2009-01-26	1	-3/+0
\| \| \| \|	cache from working on systems with >4 cores
*	Implement kmem cache alignment argument	Brian Behlendorf	2009-01-26	1	-70/+97
\|