aboutsummaryrefslogtreecommitdiffstats
path: root/module/spl/spl-kmem.c
Commit message (Collapse)AuthorAgeFilesLines
* Detect kernels that honor gfp flags passed to vmalloc()Richard Yao2012-07-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | zfsonlinux/spl@2092cf68d89a51eb0d6193aeadabb579dfc4b4a0 used PF_MEMALLOC to workaround a bug in the Linux kernel where allocations did not honor the gfp flags passed to vmalloc(). Unfortunately, PF_MEMALLOC has the side effect of permitting allocations to allocate pages outside of ZONE_NORMAL. This has been observed to result in the depletion of ZONE_DMA32. A kernel patch is available in the Gentoo bug tracker for this issue. https://bugs.gentoo.org/show_bug.cgi?id=416685 This negates any benefit PF_MEMALLOC provides, so we introduce an autotools check to disable the use of PF_MEMALLOC on systems with patched kernels. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #126
* Constify memory management functionsRichard Yao2012-07-031-5/+5
| | | | | | | | | | This prevents warnings in ZFS that were caused by changes necessary to support PaX patched kernels. When debugging is enabled, these warnings become build failures. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #131
* Ensure a minimum of one slab is reclaimedBrian Behlendorf2012-05-071-2/+32
| | | | | | | | | | | | | | | | | To minimize the chance of triggering an OOM during direct reclaim. The kmem caches have been improved to make a best effort to reclaim at least one slab when a reclaim function is registered. This helps avoid the case where objects are released but they are spread over multiple slabs so no memory gets reclaimed. Care has been taken to avoid deadlocking if the reclaim function is unable to make forward progress. Additionally, the reclaim function may be skipped entirely if there are already free slabs which can be safely reaped. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #107
* Ensure direct reclaim forward progressBrian Behlendorf2012-05-071-0/+10
| | | | | | | | | | | | | | | | | | The Linux direct reclaim path uses this out of band value to determine if forward progress is being made. Normally this is incremented by kmem_freepages() which is part of the various Linux slab implementations. However, since we are using none of that infrastructure we're responsible for incrementing this count. If no forward progress is detected and a subsequent allocation fails the OOM killer will be invoked. If there was forward progress additional reclaim will be attempted via the page cache and registerd shrinker until the allocation succeeds. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #107
* Ignore slab cache age and delay in direct reclaimPrakash Surya2012-05-071-1/+2
| | | | | | | | | | | | | | | When memory pressure triggers direct memory reclaim, a slabs age and delay should not prevent it from being freed. This patch ensures these values are ignored, allowing an empty slab to be freed in this code path no matter the value of its age and delay. This prevents needless scanning of the partial slabs and has been observed to significantly reduce the total cpu usage. In addition, it should allow for snappier reclaim under memory pressure. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #102
* Throttle number of freed slabs based on nr_to_scanPrakash Surya2012-05-071-6/+8
| | | | | | | | | | | | | | | | Previously, the SPL tried to maintain Solaris semantics by freeing all available (empty) slabs from its slab caches when the shrinker was called. This is not desirable when running on Linux. To make the SPL shrinker more Linux friendly, the actual number of freed slabs from each of the slab caches is now derived from nr_to_scan and skc_slab_objs. Additionally, an accounting bug was fixed in spl_slab_reclaim() which could cause us to reclaim one more slab than requested. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #101
* Add --enable-debug-log configure optionBrian Behlendorf2012-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Until now the notion of an internal debug logging infrastructure was conflated with enabling ASSERT()s. This patch clarifies things by cleanly breaking the two subsystem apart. The result of this is the following behavior. --enable-debug - Enable/disable code wrapped in ASSERT()s. --disable-debug ASSERT()s are used to check invariants and are never required for correct operation. They are disabled by default because they may impact performance. --enable-debug-log - Enable/disable the debug log infrastructure. --disable-debug-log This infrastructure allows the spl code and its consumer to log messages to an in-kernel log. The granularity of the logging can be controlled by a debug mask. By default the mask disables most debug messages resulting in a negligible performance impact. Because of this the debug log is enabled by default. Signed-off-by: Brian Behlendorf <[email protected]>
* Proxmox VE kernel compat, invalidate_inodes()Brian Behlendorf2011-12-211-4/+4
| | | | | | | | | | | | The Proxmox VE kernel contains a patch which renames the function invalidate_inodes() to invalidate_inodes_check(). In the process it adds a 'check' argument and a '#define invalidate_inodes(x)' compatibility wrapper for legacy callers. Therefore, if either of these functions are exported invalidate_inodes() can be safely used. Signed-off-by: Brian Behlendorf <[email protected]> Closes #58
* Linux 3.1 compat, shrink_*cache_memoryBrian Behlendorf2011-11-091-10/+4
| | | | | | | | | | | | | | | | | | | | | | As of Linux 3.1 the shrink_dcache_memory and shrink_icache_memory functions have been removed. This same task is now accomplished more cleanly with per super block shrinkers. This unfortunately leaves us no easy way to support the dnlc_reduce_cache() function. This support has always been entirely optional. So when no reasonable interface is available allow the dnlc_reduce_cache() function to effectively become a no-op. The downside of this change is that it will prevent the zfs arc meta data limts from being enforced. However, the current zfs implementation in this regard is already flawed and needs to be reworked. If the arc needs to enfore a meta data limit it will need to be extended to coordinate directly with the zpl. This will allow us to drop all this compatibility code and get more fine grained control over the cache management. Signed-off-by: Brian Behlendorf <[email protected]> Issue #52
* Fix NULL deref in balance_pgdat()Brian Behlendorf2011-11-031-5/+8
| | | | | | | | | | | | | | | Be careful not to unconditionally clear the PF_MEMALLOC bit in the task structure. It may have already been set when entering kv_alloc() in which case it must remain set on exit. In particular the kswapd thread will have PF_MEMALLOC set in order to prevent it from entering direct reclaim. By clearing it we allow the following NULL deref to potentially occur. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8109c7ab>] balance_pgdat+0x25b/0x4ff Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS issue #287
* Fix various typos in commentsBrian Behlendorf2011-10-111-27/+27
| | | | | | | Just clean up some of the typos and spelling mistakes in the comments of spl-kmem.c. Signed-off-by: Brian Behlendorf <[email protected]>
* Fixed typo in spl_slab_alloc()Gunnar Beutner2011-10-111-1/+1
| | | | | | | | The typo did not have any effect (apart from a negligible performance impact) because skc->skc_flags * KMC_OFFSLAB is always non-null when at least one bit in skc->skc_flags is set. Signed-off-by: Brian Behlendorf <[email protected]>
* Properly destroy work items in spl_kmem_cache_destroy()Gunnar Beutner2011-10-111-3/+3
| | | | | | | | | | | | | In a non-debug build the ASSERT() would be optimized away which could cause pending work items to not be cancelled. We must also use cancel_delayed_work_sync() rather than just cancel_delayed_work() to actually wait until work items have completed. Otherwise they might accidentally access free'd memory. Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS bugs #279, #62, #363, #418
* Linux 3.0: Shrinker compatibilityBrian Behlendorf2011-06-211-6/+13
| | | | | | | | Update the the wrapper macros for the memory shrinker to handle this 4th API change. The callback function now takes a shrink_control structure. This is certainly a step in the right direction but it's annoying to have to accomidate yet another version of the API.
* Linux 2.6.39 compat, invalidate_inodes()Brian Behlendorf2011-04-191-1/+1
| | | | | | | | To resolve a potiential filesystem corruption issue a second argument was added to invalidate_inodes(). This argument controls whether dirty inodes are dropped or treated as busy when invalidating a super block. When only the legacy API is available the second argument will be dropped for compatibility.
* Add dnlc_reduce_cache() supportBrian Behlendorf2011-04-061-0/+28
| | | | | | | | | | | | | Provide the dnlc_reduce_cache() function which attempts to prune cached entries from the dcache and icache. After the entries are pruned any slabs which they may have been using are reaped. Note the API takes a reclaim percentage but we don't have easy access to the total number of cache entries to calculate the reclaim count. However, in practice this doesn't need to be exactly correct. We simply need to reclaim some useful fraction (but not all) of the cache. The caller can determine if more needs to be done.
* Linux shrinker compatBrian Behlendorf2011-04-061-41/+9
| | | | | | | | | | | | | | The Linux shrinker has gone through three API changes since 2.6.22. Rather than force every caller to understand all three APIs this change consolidates the compatibility code in to the mm-compat.h header. The caller then can then use a single spl provided shrinker API which does the right thing for your kernel. SPL_SHRINKER_CALLBACK_PROTO(shrinker_callback, cb, nr_to_scan, gfp_mask); SPL_SHRINKER_DECLARE(shrinker_struct, shrinker_callback, seeks); spl_register_shrinker(&shrinker_struct); spl_unregister_shrinker(&&shrinker_struct); spl_exec_shrinker(&shrinker_struct, nr_to_scan, gfp_mask);
* Disable vmalloc() direct reclaimBrian Behlendorf2011-03-201-2/+22
| | | | | | | | | | | | | | | | | | | As part of vmalloc() a __pte_alloc_kernel() allocation may occur. This internal allocation does not honor the gfp flags passed to vmalloc(). This means even when vmalloc(GFP_NOFS) is called it is possible that a synchronous reclaim will occur. This reclaim can trigger file IO which can result in a deadlock. This issue can be avoided by explicitly setting PF_MEMALLOC on the process to subvert synchronous reclaim when vmalloc() is called with !__GFP_FS. An example stack of the deadlock can be found here (1), along with the upstream kernel bug (2), and the original bug discussion on the linux-mm mailing list (3). This code can be properly autoconf'ed when the upstream bug is fixed. 1) http://github.com/behlendorf/zfs/issues/labels/Vmalloc#issue/133 2) http://bugzilla.kernel.org/show_bug.cgi?id=30702 3) http://marc.info/?l=linux-mm&m=128942194520631&w=4
* Linux compat 2.6.37, invalidate_inodes()Brian Behlendorf2011-02-231-0/+14
| | | | | | | | | | | | | | | | | | In the 2.6.37 kernel the function invalidate_inodes() is no longer exported for use by modules. This memory management functionality is needed to invalidate the inodes attached to a super block without unmounting the filesystem. Because this function still exists in the kernel and the prototype is available is a common header all we strictly need is the symbol address. The address is obtained using spl_kallsyms_lookup_name() and assigned to the variable invalidate_inodes_fn. Then a #define is used to replace all instances of invalidate_inodes() with a call to the acquired address. All the complexity is hidden behind HAVE_INVALIDATE_INODES and invalidate_inodes() can be used as usual. Long term we should try to get this, or another, interface made available to modules again.
* Fix 2.6.35 shrinker callback API changeBrian Behlendorf2010-10-221-3/+18
| | | | | | | | | | | | | As of linux-2.6.35 the shrinker callback API now takes an additional argument. The shrinker struct is passed to the callback so that users can embed the shrinker structure in private data and use container_of() to access it. This removes the need to always use global state for the shrinker. To handle this we add the SPL_AC_3ARGS_SHRINKER_CALLBACK autoconf check to properly detect the API. Then we simply setup a callback function with the correct number of arguments. For now we do not make use of the new 3rd argument.
* Stub out kmem cache defrag APIBrian Behlendorf2010-08-271-0/+12
| | | | | | | | | At some point we are going to need to implement the kmem cache move callbacks to allow for kmem cache defragmentation. This commit simply lays a small part of the API ground work, it does not actually implement any of this feature. This is safe for now because the move callbacks are just an optimization. Even if they are registered we don't ever really have to call them.
* Strfree() should call kfree() not kmem_free()Brian Behlendorf2010-07-301-1/+1
| | | | | | | | | | | | Using kmem_free() results in deducting X bytes from the memory accounting when --enable-debug is set. Unfortunately, currently the counterpart kmem_asprintf() and friends do not properly account for memory allocated, so we must do the same on free. If we don't then we end up with a negative number of lost bytes reported when the module is unloaded. A better long term fix would be to add the accounting in to the allocation side but that's a project for another day.
* Ensure kmem_alloc() and vmem_alloc() never failBrian Behlendorf2010-07-261-105/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Solaris semantics for kmem_alloc() and vmem_alloc() are that they must never fail when called with KM_SLEEP. They may only fail if called with KM_NOSLEEP otherwise they must block until memory is available. This is quite different from how the Linux memory allocators work, under Linux a memory allocation failure is always possible and must be dealt with. At one point in the past the kmem code did properly implement this behavior, however as the code evolved this behavior was overlooked in places. This patch goes through all three implementations of the kmem/vmem allocation functions and ensures that they will all block in the KM_SLEEP case when memory is not available. They may still fail in the KM_NOSLEEP case in which case the caller is responsible for handling the failure. Special care is taken in vmalloc_nofail() to avoid thrashing the system on the virtual address space spin lock. The down side of course is if you do see a failure here, which is unlikely for 64-bit systems, your allocation will delay for an entire second. Still this is preferable to locking up your system and it is the best we can do given the constraints. Additionally, the code was cleaned up to be much more readable and comments were added to describe the various kmem-debug-* configure options. The default configure options remain: "--enable-debug-kmem --disable-debug-kmem-tracking"
* Fix buggy kmem_{v}asprintf() functionsRicardo M. Correia2010-07-201-4/+4
| | | | | | | | | | When the kvasprintf() call fails they should reset the arguments by calling va_start()/va_copy() and va_end() inside the loop, otherwise they'll try to read more arguments rather than starting over and reading them from the beginning. Signed-off-by: Ricardo M. Correia <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Prefix all SPL debug macros with 'S'Brian Behlendorf2010-07-201-100/+100
| | | | | | | | To avoid conflicts with symbols defined by dependent packages all debugging symbols have been prefixed with a 'S' for SPL. Any dependent package needing to integrate with the SPL debug should include the spl-debug.h header and use the 'S' prefixed macros. They must also build with DEBUG defined.
* Split <sys/debug.h> headerBrian Behlendorf2010-07-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid symbol conflicts with dependent packages the debug header must be split in to several parts. The <sys/debug.h> header now only contains the Solaris macro's such as ASSERT and VERIFY. The spl-debug.h header contain the spl specific debugging infrastructure and should be included by any package which needs to use the spl logging. Finally the spl-trace.h header contains internal data structures only used for the log facility and should not be included by anythign by spl-debug.c. This way dependent packages can include the standard Solaris headers without picking up any SPL debug macros. However, if the dependant package want to integrate with the SPL debugging subsystem they can then explicitly include spl-debug.h. Along with this change I have dropped the CHECK_STACK macros because the upstream Linux kernel now has much better stack depth checking built in and we don't need this complexity. Additionally SBUG has been replaced with PANIC and provided as part of the Solaris macro set. While the Solaris version is really panic() that conflicts with the Linux kernel so we'll just have to make due to PANIC. It should rarely be called directly, the prefered usage would be an ASSERT or VERIFY. There's lots of change here but this cleanup was overdue.
* Add kmem_vasprintf functionBrian Behlendorf2010-06-241-4/+20
| | | | | | We might as well have both asprintf() variants. This allows us to safely pass a va_list through several levels of the stack using va_copy() instead of va_start().
* Update warnings in kmem debug codeBrian Behlendorf2010-06-161-36/+49
| | | | | | | | | This fix was long overdue. Most of the ground work was laid long ago to include the exact function and line number in the error message which there was an issue with a memory allocation call. However, probably due to lack of time at the moment that informatin never made it in to the error message. This patch fixes that and trys to standardize the kmem debug messages as well.
* Add kmem_asprintf(), strfree(), strdup(), and minor cleanup.Brian Behlendorf2010-06-111-0/+46
| | | | | | | | | | | This patch adds three missing Solaris functions: kmem_asprintf(), strfree(), and strdup(). They are all implemented as a thin layer which just calls their Linux counterparts. As part of this an autoconf check for kvasprintf was added because it does not appear in older kernels. If the kernel does not provide it then spl-generic implements it. Additionally the dead DEBUG_KMEM_UNIMPLEMENTED code was removed to clean things up and make the kmem.h a little more readable.
* Use KM_NODEBUG macro in preference to __GFP_NOWARN.Brian Behlendorf2010-05-201-5/+5
|
* Remove kmem_set_warning() interface replace with __GFP_NOWARN flag.Brian Behlendorf2010-05-191-11/+13
| | | | | | | | | | | | | | | | | Remove the kmem_set_warning() hack used by the kmem-splat regression tests with a per-allocation flag called __GFP_NOWARN. This matches the lower level linux flag of similar by slightly different function. The idea is you can then explicitly set this flag on requests where you know your breaking the max 8k rule but you need/want to do it anyway. This is currently used by the regression tests where we intentionally push things to the limit but don't want the log noise. Additionally, we are forced to use it in spl_kmem_cache_create() because by default NR_CPUS is very large and theres no easy way to handle that. Finally, I've added a stack_dump() call to the warning when it is trigger to make to clear exactly where the allocation is taking place.
* Public Release PrepBrian Behlendorf2010-05-171-17/+17
| | | | | | Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added standardized headers to all source file to clearly indicate the copyright, license, and to give credit where credit is due.
* Reduce max kmem based slab sizeBrian Behlendorf2010-03-181-1/+1
| | | | | | | | Allowing MAX_ORDER-1 sized allocations for kmem based slabs have been observed to result in deadlocks. To help prvent this limit max kmem based slab size to MAX_ORDER-3. Just for the record callers should not be creating slabs like this, but if they do we should still handle it as safely as we can.
* Strip __GFP_ZERO from kmalloc it is not available for older kernels.Brian Behlendorf2009-12-231-1/+2
| | | | | This is needed to avoid a BUG_ON() on RHEL5.4 kernel 2.6.18-164.6.1, since __GFP_ZERO is not a valid flag for kmalloc().
* Atomic64 compatibility for 32-bit systems without kernel support.Brian Behlendorf2009-12-041-46/+46
| | | | | | | | | | | | | | | | | | | | | This patch is another step towards updating the code to handle the 32-bit kernels which I have not been regularly testing. This changes do not really impact the common case I'm expected which is the latest kernel running on an x86_64 arch. Until the linux-2.6.31 kernel the x86 arch did not have support for 64-bit atomic operations. Additionally, the new atomic_compat.h support for this case was wrong because it embedded a spinlock in the atomic variable which must always and only be 64-bits total. To handle these 32-bit issues we now simply fall back to the --enable-atomic-spinlock implementation if the kernel does not provide the 64-bit atomic funcs. The second issue this patch addresses is the DEBUG_KMEM assumption that there will always be atomic64 funcs available. On 32-bit archs this may not be true, and actually that's just fine. In that case the kernel will will never be able to allocate more the 32-bits worth anyway. So just check if atomic64 funcs are available, if they are not it means this is a 32-bit machine and we can safely use atomic_t's instead.
* Linux 2.6.31 kmem cache alignment fixes and cleanup.Brian Behlendorf2009-11-131-52/+92
| | | | | | | | | | | | | | | | | | | | | | | The big fix here is the removal of kmalloc() in kv_alloc(). It used to be true in previous kernels that kmallocs over PAGE_SIZE would always be pages aligned. This is no longer true atleast in 2.6.31 there are no longer any alignment expectations. Since kv_alloc() requires the resulting address to be page align we no only either directly allocate pages in the KMC_KMEM case, or directly call __vmalloc() both of which will always return a page aligned address. Additionally, to avoid wasting memory size is always a power of two. As for cleanup several helper functions were introduced to calculate the aligned sizes of various data structures. This helps ensure no case is accidentally missed where the alignment needs to be taken in to account. The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP which is safer since the type will be explict and we no longer count on the compiler to auto promote types hopefully as we expected. Always wnforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE) alignment restrictions at cache creation time. Use SPL_KMEM_CACHE_ALIGN in splat alignment test.
* Remove __GFP_NOFAIL in kmem and retry internally.Brian Behlendorf2009-11-121-9/+9
| | | | | | | | | | | | | | | | | | | | As of 2.6.31 it's clear __GFP_NOFAIL should no longer be used and it may disappear from the kernel at any time. To handle this I have simply added *_nofail wrappers in the kmem implementation which perform the retry for non-atomic allocations. From linux-2.6.31 mm/page_alloc.c:1166 /* * __GFP_NOFAIL is not to be used in new code. * * All __GFP_NOFAIL callers should be fixed so that they * properly detect and handle allocation failures. * * We most definitely don't want callers attempting to * allocate greater than order-1 page units with * __GFP_NOFAIL. */ WARN_ON_ONCE(order > 1);
* Linux 2.6.31 Compatibility UpdatesBrian Behlendorf2009-11-101-3/+3
| | | | | | | | | | SPL_AC_2ARGS_SET_FS_PWD macro updated to explicitly include linux/fs_struct.h which was dropped from linux/sched.h. min_wmark_pages, low_wmark_pages, high_wmark_pages macros introduced in newer kernels. For older kernels mm_compat.h was introduced to define them as needed as direct mappings to per zone min_pages, low_pages, max_pages.
* Autoconf --enable-debug-* cleanupBrian Behlendorf2009-10-301-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleanup the --enable-debug-* configure options, this has been pending for quite some time and I am glad I finally got to it. To summerize: 1) All SPL_AC_DEBUG_* macros were updated to be a more autoconf friendly. This mainly involved shift to the GNU approved usage of AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using an if [ test ] construct. 2) --enable-debug-kmem=yes by default. This simply enabled keeping a running tally of total memory allocated and freed and reporting a memory leak if there was one at module unload. Additionally, it ensure /proc/spl/kmem/slab will exist by default which is handy. The overhead is low for this and it should not impact performance. 3) --enable-debug-kmem-tracking=no by default. This option was added to provide a configure option to enable to detailed memory allocation tracking. This support was always there but you had to know where to turn it on. By default this support is disabled because it is known to badly hurt performence, however it is invaluable when chasing a memory leak. 4) --enable-debug-kstat removed. After further reflection I can't see why you would ever really want to turn this support off. It is now always on which had the nice side effect of simplifying the proc handling code in spl-proc.c. We can now always assume the top level directory will be there. 5) --enable-debug-callb removed. This never really did anything, it was put in provisionally because it might have been needed. It turns out it was not so I am just removing it to prevent confusion.
* Update global_page_state() support for 2.6.29 kernels.Brian Behlendorf2009-07-281-24/+55
| | | | | | | | | | | | | | | | Basically everything we need to monitor the global memory state of the system is now cleanly available via global_page_state(). The problem is that this interface is still fairly recent, and there has been one change in the page state enum which we need to handle. These changes basically boil down to the following: - If global_page_state() is available we should use it. Several autoconf checks have been added to detect the correct enum names. - If global_page_state() is not available check to see if get_zone_counts() symbol is available and use that. - If the get_zone_counts() symbol is not exported we have no choice be to dynamically aquire it at load time. This is an absolute last resort for old kernel which we don't want to patch to cleanly export the symbol.
* SLES10 Fixes (part 7)Brian Behlendorf2009-05-201-1/+1
| | | | | | | | | | | - Initial SLES testing uncovered a long standing bug in the debug tracing. The tcd_for_each() macro expected a NULL to terminate the trace_data[i] array but this was only ever true due to luck. All trace_data[] iterators are now properly capped by TCD_TYPE_MAX. - SPLAT_MAJOR 229 conflicted with a 'hvc' device on my SLES system. Since this was always an arbitrary choice I picked something else. - The HAVE_PGDAT_LIST case should set pgdat_list_addr to the value stored at the address of the memory location returned by kallsyms_lookup_name().
* SLES10 Fixes (part 6)Brian Behlendorf2009-05-201-12/+35
| | | | | | | | | - Prior to 2.6.17 there were no *_pgdat helper functions in mm/mmzone.c. Instead for_each_zone() operated directly on pgdat_list which may or may not have been exported depending on how your kernel was compiled. Now new configure checks determine if you have the helpers or not, and if the needed symbols are exported. If they are not exported then they are dynamically aquired at runtime by kallsyms_lookup_name().
* SLES10 Fixes (part 2):Brian Behlendorf2009-05-201-12/+12
| | | | | | | | | | | | | | | | - Configure check, the div64_64() function was renamed to div64_u64() as of 2.6.26. - Configure check, the global_page_state() fuction was introduced in 2.6.18 kernels. The earlier 2.6.16 based SLES10 must not try and use it, thankfully get_zone_counts() is still available. - To simplify debugging poison all symbols aquired dynamically using spl_kallsyms_lookup_name() with SYMBOL_POISON. - Add console messages when the user mode helpers fail. - spl_kmem_init_globals() use bit shifts instead of division. - When the monotonic clock is unavailable __gethrtime() must perform the HZ division as an 'unsigned long long' because the SPL only implements __udivdi3(), and not __divdi3() for 'long long' division on 32-bit arches.
* FC10/i686 Compatibility Update (2.6.27.19-170.2.35.fc10.i686)Brian Behlendorf2009-03-171-13/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the interests of portability I have added a FC10/i686 box to my list of development platforms. The hope is this will allow me to keep current with upstream kernel API changes, and at the same time ensure I don't accidentally break x86 support. This patch resolves all remaining issues observed under that environment. 1) SPL_AC_ZONE_STAT_ITEM_FIA autoconf check added. As of 2.6.21 the kernel added a clean API for modules to get the global count for free, inactive, and active pages. The SPL attempts to detect if this API is available and directly map spl_global_page_state() to global_page_state(). If the full API is not available then spl_global_page_state() is implemented as a thin layer to get these values via get_zone_counts() if that symbol is available. 2) New kmem:vmem_size regression test added to validate correct vmem_size() functionality. The test case acquires the current global vmem state, allocates from the vmem region, then verifies the allocation is correctly reflected in the vmem_size() stats. 3) Change splat_kmem_cache_thread_test() to always use KMC_KMEM based memory. On x86 systems with limited virtual address space failures resulted due to exhaustig the address space. The tests really need to problem exhausting all memory on the system thus we need to use the physical address space. 4) Change kmem:slab_lock to cap it's memory usage at availrmem instead of using the native linux nr_free_pages(). This provides additional test coverage of the SPL Linux VM integration. 5) Change kmem:slab_overcommit to perform allocation of 256K instead of 1M. On x86 based systems it is not possible to create a kmem backed slab with entires of that size. To compensate for this the number of allocations performed in increased by 4x. 6) Additional autoconf documentation for proposed upstream API changes to make additional symbols available to modules. 7) Console error messages added when spl_kallsyms_lookup_name() fails to locate an expected symbol. This causes the module to fail to load and we need to know exactly which symbol was not available.
* Build system and packaging (RPM support)Brian Behlendorf2009-03-091-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | An update to the build system to properly support all commonly used Makefile targets these include: make all # Build everything make install # Install everything make clean # Clean up build products make distclean # Clean up everything make dist # Create package tarball make srpm # Create package source RPM make rpm # Create package binary RPMs make tags # Create ctags and etags for everything Extra care was taken to ensure that the source RPMs are fully rebuildable against Fedora/RHEL/Chaos kernels. To build binary RPMs from the source RPM for your system simply run: rpmbuild --rebuild spl-x.y.z-1.src.rpm This will produce two binary RPMs with correct 'requires' dependencies for your kernel. One will contain all spl modules and support utilities, the other is a devel package for compiling additional kernel modules which are dependant on the spl. spl-x.y.z-1_<kernel version>.x86_64.rpm spl-devel-x.y.2-1_<kernel version>.x86_64.rpm
* XXX: Temporarily disable vmem_size().Ricardo M. Correia2009-03-051-0/+2
|
* Linux VM Integration CleanupBrian Behlendorf2009-03-041-72/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove all instances of functions being reimplemented in the SPL. When the prototypes are available in the linux headers but the function address itself is not exported use kallsyms_lookup_name() to find the address. The function name itself can them become a define which calls a function pointer. This is preferable to reimplementing the function in the SPL because it ensures we get the correct version of the function for the running kernel. This is actually pretty safe because the prototype is defined in the headers so we know we are calling the function properly. This patch also includes a rhel5 kernel patch we exports the needed symbols so we don't need to use kallsyms_lookup_name(). There are autoconf checks to detect if the symbol is exported and if so to use it directly. We should add patches for stock upstream kernels as needed if for no other reason than so we can easily track which additional symbols we needed exported. Those patches can also be used by anyone willing to rebuild their kernel, but this should not be a requirement. The rhel5 version of the export-symbols patch has been applied to the chaos kernel. Additional fixes: 1) Implement vmem_size() function using get_vmalloc_info() 2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead of $LINUX because Module.symvers is a build product. When $LINUX_OBJ != $LINUX we will not properly detect exported symbols. 3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and $LINUX/include search paths to allow proper compilation when the kernel target build directory is not the source directory.
* kmem slab magazine ageing deadlockBrian Behlendorf2009-02-171-4/+24
| | | | | | | | | | | | | - The previous magazine ageing sceme relied on the on_each_cpu() function to call spl_magazine_age() on each cpu. It turns out this could deadlock with do_flush_tlb_all() which also relies on the IPI based on_each_cpu(). To avoid this problem a per- magazine delayed work item is created and indepentantly scheduled to the correct cpu removing the need for on_each_cpu(). - Additionally two unused fields were removed from the type spl_kmem_cache_t, they were hold overs from previous cleanup. - struct work_struct work - struct timer_list timer
* kmem slab fixesBrian Behlendorf2009-02-131-36/+36
| | | | | | | | | | | - spl_slab_reclaim() 'continue' changed back to 'break' from commit 37db7d8cf9936e6d2851a4329c11efcd9f61305c. The original was correct, I have added a comment to ensure this does not happen again. - spl_slab_reclaim() further optimized by moving the destructor call in spl_slab_free() outside the skc->skc_lock. This minimizes the length of time the spin lock is held, allows the destructors to be invoked concurrently for different objects, and as a bonus makes it safe (although unwise) to sleep in the destructors.
* kmem slab fixesBrian Behlendorf2009-02-121-34/+49
| | | | | | | | | | | | | | | | | | | | | | | | - Default SPL_KMEM_CACHE_DELAY changed to 15 to match Solaris. - Aged out slab checking occurs every SPL_KMEM_CACHE_DELAY / 3. - skc->skc_reap tunable added whichs allows callers of spl_slab_reclaim() to cap the number of slabs reclaimed. On Solaris all eligible slabs are always reclaimed, and this is still the default behavior. However, I suspect that is not always wise for reasons such as in the next comment. - spl_slab_reclaim() added cond_resched() while walking the slab/object free lists. Soft lockups were observed when freeing large numbers of vmalloc'd slabs/objets. - spl_slab_reclaim() 'sks->sks_ref > 0' check changes from incorrect 'break' to 'continue' to ensure all slabs are checked. - spl_cache_age() reworked to avoid a deadlock with do_flush_tlb_all() which occured because we slept waiting for completion in spl_cache_age(). To waiting for magazine reclamation to finish is not required so we no longer wait. - spl_magazine_create() and spl_magazine_destroy() shifted back to using for_each_online_cpu() instead of the spl_on_each_cpu() approach which was of course a bad idea due to memory allocations which Ricardo pointed out.