path: root/include
* Change spl-kmod-devel install path (Brian Behlendorf, 2013-03-14, 11 files, -11/+11)
  Install the common spl kernel development headers under /usr/src/spl-<version>/
  rather than in a kernel specific directory. The kernel specific build products
  such as spl_config.h and Modules.symvers are left installed under
  /usr/src/spl-<version>/<kernel>. This was done to be consistent with where dkms
  expects kernel module source to be packaged. It also allows for a common
  spl-kmod-devel package which includes the headers, and per-kernel
  spl-kmod-devel-<kernel> packages.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.9 compat: Include linux/sched/rt.h (Richard Yao, 2013-03-14, 1 file, -0/+5)
  Linux 3.9 reorganized sched.h, splitting it into numerous files.
  torvalds/linux@8bd75c77b7c6a3954140dd2e20346aef3efe4a35 moved MAX_PRIO and
  MAX_RT_PRIO to linux/sched/rt.h.
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
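  A minimal compat sketch of the kind of shim this implies, assuming a configure
  check defines HAVE_SCHED_RT_HEADER (a hypothetical name) when the new header
  exists:

      #include <linux/sched.h>
      #ifdef HAVE_SCHED_RT_HEADER
      #include <linux/sched/rt.h>   /* MAX_PRIO, MAX_RT_PRIO on 3.9+ */
      #endif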
* Refresh links to web site (Ned Bass, 2013-03-04, 131 files, -131/+131)
  Update links to refer to the official ZFS on Linux website instead of
  @behlendorf's personal fork on github.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Remove custom install-data-local for headers (Brian Behlendorf, 2013-03-01, 11 files, -26/+266)
  Rather than use a custom install target it is cleaner to define a 'kerneldir'
  and set 'kernel_HEADERS' appropriately. This allows us to leverage the
  standard configure install support. Additionally, I took this opportunity to
  add the missing make files to the include subdirectories.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Fix spl_config.h install permissions (Brian Behlendorf, 2013-03-01, 1 file, -1/+1)
  The default permissions used by install are 755. Since this file isn't
  executable 644 is more appropriate.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Define BE_IN16 & BE_IN32 for lz4 compression (Eric Dillmann, 2013-01-29, 1 file, -0/+9)
  The new lz4 compression algorithm, zfsonlinux/zfs@9759c60, requires the
  generic BE_IN16 and BE_IN32 functions. These are added to the SPL for other
  consumers to take advantage of.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add spl_kmem_cache_expire module option (Brian Behlendorf, 2013-01-28, 1 file, -2/+6)
  Cache aging was implemented because it was part of the default Solaris
  kmem_cache behavior. The idea is that per-cpu objects which haven't been
  accessed in several seconds should be returned to the cache. On the other
  hand Linux slabs never move objects back to the slabs unless there is memory
  pressure on the system.

  This behavior is now configurable through the 'spl_kmem_cache_expire' module
  option. The value is a bit mask with the following meaning.

    0x1 - Solaris style cache aging eviction is enabled.
    0x2 - Linux style low memory eviction is enabled.

  Both methods may be safely enabled simultaneously, but by default both are
  disabled. It has never been clear if the kmem cache aging (which has been
  around from day one) actually does any good. It has however been the source
  of numerous bugs so I wouldn't mind retiring it entirely.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes zfsonlinux/zfs#1227
  Closes #210
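  A sketch of how such a module option is typically wired up; the description
  string and static qualifier are illustrative assumptions:

      #include <linux/module.h>

      /* Bit mask: 0x1 = Solaris-style aging, 0x2 = Linux low memory eviction. */
      static unsigned int spl_kmem_cache_expire = 0;
      module_param(spl_kmem_cache_expire, uint, 0644);
      MODULE_PARM_DESC(spl_kmem_cache_expire,
          "By age (0x1) or low memory (0x2)");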
* Remove spl_invalidate_inodes() (Brian Behlendorf, 2013-01-17, 1 file, -28/+0)
  This functionality is no longer required by ZFS, see commit
  zfsonlinux/zfs@7b3e34ba5a7ee8d0fda44d214f6f11eb16cdb26f. Since there are no
  other consumers, and because it adds additional autoconf complexity which
  must be maintained, the spl_invalidate_inodes() function has been removed.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue zfsonlinux/zfs#795
* RHEL 6.4 compat, fallocate() (Brian Behlendorf, 2013-01-08, 1 file, -0/+20)
  In the upstream kernel the FALLOC_FL_PUNCH_HOLE #define was introduced after
  the fallocate() function was moved from the inode_operations to the
  file_operations structure. Therefore, the SPL code assumed that if
  FALLOC_FL_PUNCH_HOLE was defined it was safe to use f_ops->fallocate().
  Unfortunately, the RHEL6.4 kernel has only backported the
  FALLOC_FL_PUNCH_HOLE #define and not the fallocate() change. To address this
  compatibility issue the spl_filp_fallocate() helper function was added to
  properly detect which interface is available.
  Signed-off-by: Brian Behlendorf <[email protected]>
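  A plausible shape for the helper, assuming autoconf tests define
  HAVE_FILE_FALLOCATE or HAVE_INODE_FALLOCATE (hypothetical names) depending on
  where ->fallocate() lives:

      static inline int
      spl_filp_fallocate(struct file *fp, int mode, loff_t offset, loff_t len)
      {
              int error = -EOPNOTSUPP;

      #if defined(HAVE_FILE_FALLOCATE)
              if (fp->f_op->fallocate)
                      error = fp->f_op->fallocate(fp, mode, offset, len);
      #elif defined(HAVE_INODE_FALLOCATE)
              if (fp->f_dentry && fp->f_dentry->d_inode &&
                  fp->f_dentry->d_inode->i_op->fallocate)
                      error = fp->f_dentry->d_inode->i_op->fallocate(
                          fp->f_dentry->d_inode, mode, offset, len);
      #endif

              return (error);
      }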
* Add cv_wait_io() to account for I/O time (Matt Johnston, 2013-01-07, 1 file, -0/+2)
  Under Linux, when a task is waiting on I/O it should call the io_schedule()
  function for proper accounting. The Solaris cv_wait() function provides no
  way to specify what the cv is waiting on, so cv_wait_io() is introduced.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #206
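  A sketch of the underlying difference using standard wait-queue primitives;
  only the io_schedule() call changes, so the sleep is charged as I/O wait:

      #include <linux/sched.h>
      #include <linux/wait.h>

      static inline void
      spl_cv_block(wait_queue_head_t *wq, int io)
      {
              DEFINE_WAIT(wait);

              prepare_to_wait_exclusive(wq, &wait, TASK_UNINTERRUPTIBLE);
              if (io)
                      io_schedule();  /* accounted as I/O wait time */
              else
                      schedule();
              finish_wait(wq, &wait);
      }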
* Fix spl_kmem_init_kallsyms_lookup() panic (Brian Behlendorf, 2012-12-19, 1 file, -0/+1)
  Due to I/O buffering the helper may return successfully before the proc
  handler has a chance to execute. To catch this case, wait up to 1 second to
  verify spl_kallsyms_lookup_name_fn was updated to a non-SYMBOL_POISON value.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes zfsonlinux/zfs#699
  Closes zfsonlinux/zfs#859
* Removed SPL_AC_3ARGS_INIT_WORK check (Brian Behlendorf, 2012-12-12, 2 files, -50/+0)
  All consumers of the kernel delayed work queues have been shifted over to
  rely on the taskq implementation. This compatibility code can now be removed.
  Any new callers which need this functionality should use the taskq interfaces
  for delayed work items.
  Signed-off-by: Brian Behlendorf <[email protected]>
* kmem-cache: Use a taskq for async allocations (Brian Behlendorf, 2012-12-12, 1 file, -1/+1)
  Shift the asynchronous allocations over to use the taskq interfaces. This
  allows us to abandon the kernel's delayed work queue interface and all the
  compatibility code it requires. This code never actually used the delay
  functionality; it was just done this way to leverage the existing
  compatibility code. All that is required is a thread context to perform the
  allocation in. The only thing clever in this change is that we take advantage
  of the preallocated task queue entries to avoid a memory allocation.
  Signed-off-by: Brian Behlendorf <[email protected]>
* kmem-cache: Use taskqs for ageing (Brian Behlendorf, 2012-12-12, 1 file, -2/+2)
  Shift the cache and magazine ageing functionality over to the new delayed
  taskq interfaces. This allows us to abandon the kernel's delayed work queue
  interface and all the compatibility code it requires.

  However, the delayed taskq interface does not allow us to schedule a task for
  a specific cpu so the ageing code was slightly reworked. The magazine ageing
  delay has been directly linked to the cache ageing function. The
  spl_cache_age() function invokes on_each_cpu() in order to run
  spl_magazine_age() on each cpu. It then blocks waiting for them to complete
  and promptly reclaims any free slabs.

  While restructuring the code wasn't the primary goal, I think the new code is
  far more understandable and maintainable. It also should help minimize
  magazine thrashing because free slabs are immediately released after the
  magazine is aged.
  Signed-off-by: Brian Behlendorf <[email protected]>
* taskq delay/cancel functionality (Brian Behlendorf, 2012-12-12, 1 file, -14/+24)
  Add the ability to dispatch a delayed task to a taskq. The desired behavior
  is for the task to be queued but not executed by a worker thread until the
  expiration time is reached. To achieve this two new functions were added
  (a usage sketch follows this entry).

  * taskq_dispatch_delay() - This function behaves exactly like
    taskq_dispatch() however it takes a third 'expire_time' argument. The
    caller should pass the desired time the task should be executed as an
    absolute value in jiffies. The task is guaranteed not to run before this
    time, it may run slightly later if all the worker threads are busy.

  * taskq_cancel_id() - Given a task id, attempt to cancel the task before it
    gets executed. This is primarily useful for canceling delay tasks but can
    be used for canceling any previously dispatched task. There are three
    possible return values.

      0      - The task was found and canceled before it was executed.
      ENOENT - The task was not found, either it was already run or an invalid
               task id was supplied by the caller.
      EBUSY  - The task is currently executing and may not be canceled. This
               function will block until the task has been completed.

  * taskq_wait_all() - The taskq_wait_id() function was renamed
    taskq_wait_all() to more clearly reflect its actual behavior. It is only
    currently used by the splat taskq regression tests.

  * taskq_wait_id() - Historically, the only difference between this function
    and taskq_wait() was that you passed the task id. In both functions you
    would block until ALL lower task ids had executed. This was semantically
    correct but could be very slow, particularly if there were delay tasks
    submitted. To better accommodate the delay tasks this function was
    reimplemented. It will now only block until the passed task id has been
    completed.

  This is actually a fairly low risk change for a few reasons.

  * Only new ZFS callers will make use of the new interfaces and very little
    common code was changed to support the new functions.
  * The existing taskq_wait() implementation was not changed, just slightly
    refactored.
  * Since the newly optimized taskq_wait_id() implementation was never used by
    ZFS we can't accidentally introduce a new bug there.

  NOTE: This functionality does not exist in the Illumos taskqs.
  Signed-off-by: Brian Behlendorf <[email protected]>
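  A hypothetical usage sketch of the two new calls; my_func and the five second
  delay are illustrative, and ddi_get_lbolt() is assumed to return the current
  time in jiffies:

      #include <sys/taskq.h>

      static void
      my_func(void *arg)
      {
              /* deferred work goes here */
      }

      static void
      example(taskq_t *tq)
      {
              taskqid_t id;

              /* Run my_func() no sooner than five seconds from now. */
              id = taskq_dispatch_delay(tq, my_func, NULL, TQ_SLEEP,
                  ddi_get_lbolt() + 5 * HZ);

              /* A return of 0 means it was canceled before it ever ran. */
              if (id != 0 && taskq_cancel_id(tq, id) == 0)
                      return;
      }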
* taskq style, remove #define wrappers (Brian Behlendorf, 2012-12-12, 1 file, -25/+15)
  When the taskq implementation was originally written I wrapped all the API
  functions in #define's. This was done as a preventative measure to ensure
  that a taskq symbol never conflicted with an existing kernel symbol. However,
  in practice the taskq symbols never conflicted. The only major conflicts
  occurred with the kmem cache API. Since this added layer of obfuscation never
  bought us anything for the taskqs I'm removing it.
  Signed-off-by: Brian Behlendorf <[email protected]>
* taskq style, convert spaces to soft tabs (Brian Behlendorf, 2012-12-12, 1 file, -59/+61)
  Update the taskq implementation to conform with the style used throughout the
  rest of the code. There are no functional changes in this commit.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Track emergency objects in rbtree (Brian Behlendorf, 2012-11-06, 1 file, -2/+3)
  In the initial implementation emergency objects were tracked on a per-cache
  list. The assumption was that under normal operation we would never allocate
  more than a handful of these objects. So the cost of walking the list during
  free was expected to be negligible.

  However real world usage has shown that emergency objects tend to be
  allocated in batches. A deadlock will be detected and several thousand
  emergency objects will be allocated before the original blocked slab
  allocation can complete.

  Therefore the original list has been replaced by a red black tree which is
  sorted by the memory address of each allocated object. This bounds the worst
  case insertion and removal time to O(log n) which minimizes contention on the
  associated spin lock.
  Signed-off-by: Brian Behlendorf <[email protected]>
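  A minimal sketch of such an address-keyed tree, assuming each emergency
  object embeds an rb_node; all names here are illustrative:

      #include <linux/rbtree.h>

      struct spl_emergency {
              struct rb_node ske_node;
              void *ske_obj;
      };

      /* Insert keyed by object address; returns 0 on duplicate. */
      static int
      spl_emergency_insert(struct rb_root *root, struct spl_emergency *ske)
      {
              struct rb_node **new = &root->rb_node, *parent = NULL;
              unsigned long addr = (unsigned long)ske->ske_obj;

              while (*new) {
                      struct spl_emergency *this =
                          rb_entry(*new, struct spl_emergency, ske_node);

                      parent = *new;
                      if (addr < (unsigned long)this->ske_obj)
                              new = &((*new)->rb_left);
                      else if (addr > (unsigned long)this->ske_obj)
                              new = &((*new)->rb_right);
                      else
                              return (0);
              }

              rb_link_node(&ske->ske_node, parent, new);
              rb_insert_color(&ske->ske_node, root);
              return (1);
      }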
* Improved vmem cached deadlock detection (Brian Behlendorf, 2012-11-06, 1 file, -0/+3)
  The entire goal of performing the slab allocations asynchronously is to be
  able to detect when a vmalloc() deadlocks. In this case, and only this case,
  do we want to start allocating emergency objects. The trick here is to
  minimize false positives because the overhead of tracking emergency objects
  is far higher than normal slab objects.

  With that goal in mind the code was reworked to be less sensitive to slow
  allocations by increasing the wait time. Once a cache is marked deadlocked
  all subsequent allocations which cannot be satisfied with existing cache
  objects will immediately allocate new emergency objects. This behavior
  persists until the asynchronous allocation completes and clears the
  deadlocked flag.

  The result of these tweaks is that far fewer emergency objects get created,
  which is important because this minimizes the cost of releasing them later in
  kmem_cache_free().
  Signed-off-by: Brian Behlendorf <[email protected]>
* splat: Cleanup headers (Brian Behlendorf, 2012-11-06, 2 files, -1/+2)
  Restructure the SPLAT headers such that each test only includes the minimal
  set of headers it requires.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Condition variable reference counts (Brian Behlendorf, 2012-11-06, 1 file, -1/+2)
  Reference count every entry and exit from the condition variable functions:
  cv_wait(), cv_wait_timeout(), cv_signal(), cv_broadcast(). This allows us to
  safely block in cv_destroy() until all consumers have been scheduled and are
  no longer accessing the condition variable memory.

  In addition, poison the magic value at the start of cv_destroy() to ensure
  there are never any new callers after cv_destroy() is called. The consumer is
  responsible for ensuring this never occurs.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add KSTAT_TYPE_TXG type (Brian Behlendorf, 2012-11-02, 1 file, -1/+22)
  Add a new kstat type for tracking useful statistics about a TXG. The new
  KSTAT_TYPE_TXG type can be used to track the following statistics per-txg
  (a struct sketch follows this entry).

    txg          - Unique txg number
    state        - State (O)pen/(Q)uiescing/(S)yncing/(C)ommitted
    birth        - Creation time
    nread        - Bytes read
    nwritten     - Bytes written
    reads        - IOPs read
    writes       - IOPs write
    open_time    - Length in nanoseconds the txg was open
    quiesce_time - Length in nanoseconds the txg was quiescing
    sync_time    - Length in nanoseconds the txg was syncing

  Signed-off-by: Brian Behlendorf <[email protected]>
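  A sketch of the per-txg record implied by the list above; the field and type
  names are assumptions chosen to match common SPL conventions:

      typedef struct kstat_txg {
              u_longlong_t    ktxg_txg;           /* unique txg number */
              char            ktxg_state;         /* O/Q/S/C */
              hrtime_t        ktxg_birth;         /* creation time */
              u_longlong_t    ktxg_nread;         /* bytes read */
              u_longlong_t    ktxg_nwritten;      /* bytes written */
              uint_t          ktxg_reads;         /* read IOPs */
              uint_t          ktxg_writes;        /* write IOPs */
              hrtime_t        ktxg_open_time;     /* ns open */
              hrtime_t        ktxg_quiesce_time;  /* ns quiescing */
              hrtime_t        ktxg_sync_time;     /* ns syncing */
      } kstat_txg_t;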
* Make kstat.ks_update() callback atomic (Brian Behlendorf, 2012-10-23, 1 file, -1/+2)
  Move the kstat ks_update() callback under the ks_lock. This enables
  dynamically sized kstats without modification to the kstat API
  (a callback sketch follows this entry).

  * Create a kstat with the KSTAT_FLAG_VIRTUAL flag.
  * Register a ->ks_update() callback which does:
    o Frees any existing ks_data buffer.
    o Sets ks_data_size to the kstat array size.
    o Sets ks_data to an allocated buffer of size ks_data_size.
    o Populates the array of buffers with the required data.

  The buffer allocated in the ks_update() callback is guaranteed to remain
  allocated and valid while the proc sequence handler iterates over the buffer.
  The lock will not be dropped until the kstat_seq_stop() function is run,
  making it safe for concurrent access. To allow the ks_update() callback to
  perform memory allocations the lock was changed to a mutex.
  Signed-off-by: Brian Behlendorf <[email protected]>
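  A hedged sketch of such a callback following the steps above; my_count(),
  my_fill(), and my_entry_t are hypothetical helpers:

      static int
      my_kstat_update(kstat_t *ksp, int rw)
      {
              if (rw == KSTAT_WRITE)
                      return (EACCES);

              /* ks_lock is held, so resizing ks_data here is race free. */
              if (ksp->ks_data != NULL)
                      kmem_free(ksp->ks_data, ksp->ks_data_size);

              ksp->ks_ndata = my_count();
              ksp->ks_data_size = ksp->ks_ndata * sizeof (my_entry_t);
              ksp->ks_data = kmem_alloc(ksp->ks_data_size, KM_SLEEP);
              my_fill(ksp->ks_data, ksp->ks_ndata);

              return (0);
      }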
* Linux 3.7 compat, __clear_close_on_exec() removed (Brian Behlendorf, 2012-10-18, 1 file, -4/+0)
  Commit torvalds/linux@b8318b0 moved the __clear_close_on_exec() function out
  of include/linux/fdtable.h and in to fs/file.c making it unavailable to the
  SPL.

  Now as it turns out we only used this function to tear down some test
  infrastructure for the vn_getf()/vn_releasef() SPLAT regression tests. Rather
  than implement even more autoconf compatibility code to handle this we just
  remove the test case. This also allows us to drop three existing autoconf
  tests.

  This does mean the SPLAT tests will no longer verify these functions but
  historically they have never been a problem. And if we feel we absolutely
  need this test coverage I'm sure a more portable version of the test case
  could be added.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #183
* Linux 3.6 compat, kern_path_locked() added (Yuxuan Shui, 2012-10-14, 1 file, -0/+6)
  The kern_path_parent() function was removed from Linux 3.6 because it was
  observed that all the callers just want the parent dentry. The simpler
  kern_path_locked() function replaces kern_path_parent() and does the lookup
  while holding the ->i_mutex lock.

  This is good news for the vn implementation because it removes the need for
  us to handle the locking. However, it makes it harder to implement a single
  readable vn_remove()/vn_rename() function which is usually what we prefer.

  Therefore, we implement a new version of vn_remove()/vn_rename() for Linux
  3.6 and newer kernels. This allows us to leave the existing working
  implementation untouched, and to add a simpler version for newer kernels.

  Long term I would very much like to see all of the vn code removed since what
  this code enabled is generally frowned upon in the kernel. But that can't
  happen until we either abandon the zpool.cache file or implement alternate
  infrastructure to update it correctly in user space.
  Signed-off-by: Yuxuan Shui <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #154
* Add interface for file hole punching. (Etienne Dechamps, 2012-10-04, 1 file, -0/+6)
  This adds an interface to "punch holes" (deallocate space) in VFS files. The
  interface is identical to the Solaris VOP_SPACE interface. This interface is
  necessary for TRIM support on file vdevs.

  This is implemented using Linux fallocate(FALLOC_FL_PUNCH_HOLE), which was
  introduced in 2.6.38. For a brief time before 2.6.38 this was done using the
  truncate_range inode operation, which was quickly deprecated. This patch only
  supports FALLOC_FL_PUNCH_HOLE.

  This adds support for the truncate_range() inode operation to VOP_SPACE() for
  file hole punching. This API is deprecated and removed in 3.5, so it's only
  useful for old kernels.

  On tmpfs, the truncate_range() inode operation translates to
  shmem_truncate_range(). Unfortunately, this function expects the end offset
  to be inclusive and aligned to the end of a page. If it is not, the kernel
  will stop with a BUG_ON(). This patch fixes the issue by adapting to the
  constraints set forth by shmem_truncate_range().
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #168
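  A hypothetical caller-side sketch of punching a hole through the Solaris-style
  VOP_SPACE() interface described above; the argument types and order shown are
  assumptions:

      #include <sys/vnode.h>

      static int
      punch_hole(vnode_t *vp, offset_t off, offset_t len)
      {
              struct flock bf = { 0 };

              bf.l_start = off;
              bf.l_len = len;

              /* F_FREESP deallocates the byte range [off, off + len). */
              return (VOP_SPACE(vp, F_FREESP, &bf, 0, 0, NULL, NULL));
      }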
* Remove TQ_SLEEP -> KM_SLEEP mapping (Brian Behlendorf, 2012-09-12, 1 file, -2/+3)
  When the taskq code was originally written it seemed like a good idea to
  simply map TQ_SLEEP to KM_SLEEP. Unfortunately, this assumed that the TQ_*
  flags would never conflict with any of the Linux GFP_* flags. When adding the
  TQ_PUSHPAGE support in commit cd5ca4b this invariant was accidentally broken.

  Therefore to support TQ_PUSHPAGE, which is needed for Linux, and prevent any
  further confusion I have removed this direct mapping. The TQ_SLEEP,
  TQ_NOSLEEP, and TQ_PUSHPAGE flags are no longer defined in terms of their
  KM_* counterparts. Instead a simple mapping function is introduced to convert
  TQ_* -> KM_* where needed.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #171
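  A minimal sketch of such a translation helper; the function name is an
  assumption:

      static inline int
      tqflags_to_kmflags(uint_t tqflags)
      {
              /* TQ_* values no longer alias GFP_*, so translate explicitly. */
              if (tqflags & TQ_NOSLEEP)
                      return (KM_NOSLEEP);
              if (tqflags & TQ_PUSHPAGE)
                      return (KM_PUSHPAGE);
              return (KM_SLEEP);
      }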
* Revert "Switch KM_SLEEP to KM_PUSHPAGE"Brian Behlendorf2012-09-121-1/+0
| | | | | | | | This reverts commit cd5ca4b2f86a606aa6ed68341a3672fdde1c9856 due to conflicts in the higher TQ_ bits which caused incorrect behavior. Signed-off-by: Brian Behlendorf <[email protected]>
* Add KMC_NOEMERGENCY slab flag (Brian Behlendorf, 2012-09-07, 1 file, -0/+2)
  Provide a flag to disable the use of emergency objects for a specific kmem
  cache. There may be instances where under no circumstances should you
  kmalloc() an emergency object. For example, when your cache contains very
  large objects (>128k).
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add DKIOCTRIM for TRIM support. (Etienne Dechamps, 2012-09-02, 1 file, -0/+1)
  See dechamps/zfs@cc6cd40ad71e1e611591929ad08184516357eaf5 for details. This
  harmless addition was merged to simplify testing the ZFS TRIM support
  patches.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #167
* Switch KM_SLEEP to KM_PUSHPAGE (Brian Behlendorf, 2012-08-27, 1 file, -0/+1)
  Under certain circumstances the following functions may be called in a
  context where KM_SLEEP is unsafe and can result in a deadlocked system. To
  avoid this problem the unconditional KM_SLEEPs are converted to KM_PUSHPAGEs.
  This will prevent them from attempting to initiate any I/O during direct
  reclaim.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Mutex ASSERT on self deadlock (Brian Behlendorf, 2012-08-27, 1 file, -2/+7)
  Generate an assertion if we're going to deadlock the system by attempting to
  acquire a mutex the process is already holding. There are currently no known
  instances of this under normal operation, but it _might_ be possible when
  using a ZVOL as a swap device. I want to ensure we catch this immediately if
  it were to occur.
  Signed-off-by: Brian Behlendorf <[email protected]>
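  A sketch of the check, assuming mutex_owner() returns the task holding the
  mutex and mutex_enter_impl() stands in for the wrapped primitive (both names
  hypothetical):

      #define mutex_enter(mp)                                           \
      do {                                                              \
              /* Catch a thread re-entering a mutex it already owns. */ \
              ASSERT3P(mutex_owner(mp), !=, current);                   \
              mutex_enter_impl(mp);                                     \
      } while (0)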
* Add PF_NOFS debugging flag (Brian Behlendorf, 2012-08-27, 1 file, -0/+49)
  PF_NOFS is a per-process debug flag which is set in current->flags to detect
  when a process is performing an unsafe allocation. All tasks with PF_NOFS set
  must strictly use KM_PUSHPAGE for allocations because if they enter direct
  reclaim and initiate I/O they may deadlock.

  When debugging is disabled, any incorrect usage will be detected and a call
  stack with a warning will be printed to the console. The flags will then be
  automatically corrected to allow for safe execution. If debugging is enabled
  this will be treated as a fatal condition.

  To avoid any risk of conflicting with the existing PF_ flags, the PF_NOFS bit
  shadows the rarely used PF_MUTEX_TESTER bit. Only when CONFIG_RT_MUTEX_TESTER
  is not set, and we know this bit is unused, will the PF_NOFS bit be valid.
  Happily, most existing distributions ship a kernel with
  CONFIG_RT_MUTEX_TESTER disabled.
  Signed-off-by: Brian Behlendorf <[email protected]>
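  A sketch of the bit shadowing and the debug-off fix-up behavior described
  above; sanitize_flags() is an illustrative name:

      #ifndef CONFIG_RT_MUTEX_TESTER
      #define PF_NOFS         PF_MUTEX_TESTER

      static inline void
      sanitize_flags(struct task_struct *p, gfp_t *flags)
      {
              if (unlikely((p->flags & PF_NOFS) &&
                  (*flags & (__GFP_IO | __GFP_FS)))) {
                      WARN_ON_ONCE(1);        /* print a stack, then fix up */
                      *flags &= ~(__GFP_IO | __GFP_FS);
              }
      }
      #else
      #define PF_NOFS         0
      #endif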
* Revert "Add TASKQ_NORECLAIM flag"Brian Behlendorf2012-08-271-1/+0
| | | | | | | | | This reverts commit 372c2572336468cbf60272aa7e735b7ca0c3807c. The use of the PF_MEMALLOC flag was always a hack to work around memory reclaim deadlocks. Those issues are believed to be resolved so this workaround can be safely reverted. Signed-off-by: Brian Behlendorf <[email protected]>
* Emergency slab objects (Brian Behlendorf, 2012-08-27, 1 file, -0/+17)
  This patch is designed to resolve a deadlock which can occur with __vmalloc()
  based slabs. The issue is that the Linux kernel does not honor the flags
  passed to __vmalloc(). This makes it unsafe to use in a writeback context.
  Unfortunately, this is a use case ZFS depends on for correct operation.

  Fixing this issue in the upstream kernel was pursued and patches are
  available which resolve the issue.
  https://bugs.gentoo.org/show_bug.cgi?id=416685
  However, these changes were rejected because upstream felt that using
  __vmalloc() in the context of writeback should never be done. Their solution
  was for us to rewrite parts of ZFS to accommodate the Linux VM.

  While that is probably the right long term solution, and it is something we
  want to pursue, it is not a trivial task and will likely destabilize the
  existing code. This work has been planned for the 0.7.0 release but in the
  meanwhile we want to improve the SPL slab implementation to accommodate this
  expected ZFS usage.

  This is accomplished by performing the __vmalloc() asynchronously in the
  context of a work queue. This doesn't prevent the possibility of the worker
  thread from deadlocking. However, the caller can now safely block on a wait
  queue for the slab allocation to complete.

  Normally this will occur in a reasonable amount of time and the caller will
  be woken up when the new slab is available. The objects will then get cached
  in the per-cpu magazines and everything will proceed as usual.

  However, if the __vmalloc() deadlocks for the reasons described above, or is
  just very slow, then the callers on the wait queues will time out. When this
  rare situation occurs they will attempt to kmalloc() a single minimally sized
  object using the GFP_NOIO flags. This allocation will not deadlock because
  kmalloc() will honor the passed flags and the caller will be able to make
  forward progress.

  As long as forward progress can be maintained then even if the worker thread
  is deadlocked the critical thread will make progress. This will eventually
  allow the deadlocked worker thread to complete and normal operation will
  resume.

  These emergency allocations will likely be slow since they require contiguous
  pages. However, their use should be rare so the impact is expected to be
  minimal. If that turns out not to be the case in practice further
  optimizations are possible.

  One additional concern is if these emergency objects are long lived. Right
  now they are simply tracked on a list which must be walked when an object is
  freed. If they accumulate on a system and the list grows, freeing objects
  will become more expensive. This could be handled relatively easily by using
  a hash instead of a list, but that optimization (if needed) is left for a
  follow up patch.

  Additionally, these emergency objects could be repacked into existing slabs
  as objects are freed if the kmem_cache_set_move() functionality was
  implemented. See issue https://github.com/zfsonlinux/spl/issues/26 for full
  details. This work would also help reduce ZFS's memory fragmentation
  problems.

  The /proc/spl/kmem/slab file has had two new columns added at the end. The
  'emerg' column reports the current number of these emergency objects in use
  for the cache, and the following 'max' column shows the historical worst
  case. These values should give us a good idea of how often these objects are
  needed. Based on these values under real use cases we can tune the default
  behavior.

  Lastly, as a side benefit using a single work queue for the slab allocations
  should reduce cpu contention on the global virtual address space lock. This
  should manifest itself as reduced cpu usage for the system.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Remove autotools products (Brian Behlendorf, 2012-08-27, 11 files, -4724/+0)
  Remove all of the generated autotools products from the repository and update
  the .gitignore files accordingly.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue zfsonlinux/zfs#718
* Add kpreempt_[dis|en]able macros in <sys/disp.h> (Prakash Surya, 2012-08-24, 1 file, -0/+5)
  To support preempt enabled kernels in ZFS on Linux, there are a couple of
  places where the ZFS code needs to disable interrupts. This change adds the
  Solaris preempt functions and maps them to the equivalent Linux functions,
  allowing ZFS to make use of them.
  Signed-off-by: Prakash Surya <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #98
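  The likely mapping, shown as an assumption:

      #include <linux/preempt.h>

      #define kpreempt_disable()      preempt_disable()
      #define kpreempt_enable()       preempt_enable()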
* Avoid calling smp_processor_id in spl_magazine_age (Prakash Surya, 2012-08-24, 1 file, -0/+1)
  The spl_magazine_age function had the implied assumption that it will remain
  on its current cpu through its execution. In order to support preempt enabled
  kernels, this assumption had to be removed.

  The spl_kmem_magazine structure now holds the cpu id of the cpu it is local
  to. This allows spl_magazine_age to use this field when scheduling work to be
  done by the magazine's local cpu.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #98
* Remove Makefile from non-toplevel .gitignore files (Richard Yao, 2012-08-23, 1 file, -1/+0)
  When building SPL support into the kernel, ./copy-builtin will copy
  non-toplevel .gitignore files. These files list /Makefile, which causes
  git-archive to omit ./module/{spl,splat}/Makefile. The absence of these files
  results in build failures when SPL is selected. ZFS is unaffected because it
  puts Makefile in the toplevel .gitignore, which is not copied. We fix SPL by
  emulating that behavior.
  Reported-by: Fabio Erculiani <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #152
* Fix incorrect type in spl_kmem_cache_set_move() parameter (Richard Yao, 2012-08-01, 1 file, -1/+1)
  A preprocessor definition renders this harmless. However, it is a good idea
  to change this to be consistent.
  Signed-off-by: Richard Yao <[email protected]>
* Optimize spl_rwsem_is_locked() (Brian Behlendorf, 2012-07-13, 1 file, -42/+25)
  The spl_rwsem_is_locked() compatibility function has been observed to be a
  hot spot. The root cause of this is that we must check the rwsem activity
  under the rwsem->wait_lock to avoid a race. When the lock is busy significant
  contention can occur.

  The upstream kernel fix for this race had the insight that by using
  spin_trylock_irqsave() this contention could be avoided. When the lock is
  contended it's reasonable to return that it is locked.

  This change updates the SPL's implementation to be like the upstream kernel.
  Since the kernel code has been in use for years now this is a low risk
  change.
  Signed-off-by: Brian Behlendorf <[email protected]>
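  A sketch of the trylock approach; the rwsem activity field varies across
  kernel configurations, so RWSEM_COUNT() here is a hypothetical accessor:

      static inline int
      spl_rwsem_is_locked(struct rw_semaphore *rwsem)
      {
              unsigned long flags;
              int rc = 1;

              /* If the wait_lock is contended, report the rwsem as held. */
              if (spin_trylock_irqsave(&rwsem->wait_lock, flags)) {
                      rc = (RWSEM_COUNT(rwsem) != 0);
                      spin_unlock_irqrestore(&rwsem->wait_lock, flags);
              }

              return (rc);
      }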
* Constify memory management functions (Richard Yao, 2012-07-03, 1 file, -4/+4)
  This prevents warnings in ZFS that were caused by changes necessary to
  support PaX patched kernels. When debugging is enabled, these warnings become
  build failures.
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #131
* PowerPC Compatibility (Brian Behlendorf, 2012-07-02, 2 files, -2/+2)
  Usage of get_current() is not supported across all architectures. The correct
  interface to use is the '#define current' which will map to the appropriate
  function, usually current_thread_info().
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #119
* Linux 3.4 compat, __clear_close_on_exec replaces FD_CLR (Richard Yao, 2012-06-13, 1 file, -0/+4)
  torvalds/linux@1dce27c5aa6770e9d195f2bb7db1db3d4dde5591 introduced
  __clear_close_on_exec() as a replacement for FD_CLR. Further commits appear
  to have removed FD_CLR from the Linux source tree. This causes the following
  failure:

    error: implicit declaration of function '__FD_CLR'
    [-Werror=implicit-function-declaration]

  To correct this we update the code to use the current
  __clear_close_on_exec() interface for readability. Then we introduce an
  autotools check to determine if __clear_close_on_exec() is available. If it
  isn't then we define some compatibility logic which uses the older FD_CLR()
  interface.
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #124
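  A compat sketch, assuming the configure check defines
  HAVE_CLEAR_CLOSE_ON_EXEC (a hypothetical name) when the new interface
  exists:

      #ifndef HAVE_CLEAR_CLOSE_ON_EXEC
      /* Fall back to the pre-3.4 fd_set based interface. */
      #define __clear_close_on_exec(fd, fdt)  FD_CLR(fd, fdt->close_on_exec)
      #endif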
* Remove Solaris module emulation (Brian Behlendorf, 2012-05-18, 1 file, -252/+10)
  Originally I believed that these interfaces would be needed. However, in
  practice it turned out that it was more straightforward and maintainable to
  use the native Linux interfaces. As such, this is all dead code and can be
  safely removed.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #109
* Modify KM_PUSHPAGE to use GFP_NOIO instead of GFP_NOFS (Richard Yao, 2012-05-07, 1 file, -1/+1)
  The resolution of issue #31 made KM_PUSHPAGE imply GFP_NOFS. This was done to
  prevent situations where filesystem operations which are holding locks enter
  direct reclaim and attempt to reacquire those same locks. This clearly will
  result in a deadlock.

  This works for datasets which are implemented in terms of filesystem
  operations. But unfortunately, swapping to a zvol will encounter many of the
  same deadlocks and GFP_NOFS will not prevent this. As such, it is appropriate
  to extend KM_PUSHPAGE to use the broader GFP_NOIO mask to handle these
  non-filesystem cases.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue zfsonlinux/zfs#342
  Closes #105
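  The change amounts to one definition, shown here as a sketch since the
  __GFP_HIGH qualifier is an assumption:

      /* Block I/O initiation in reclaim, not just filesystem re-entry. */
      #define KM_PUSHPAGE     (GFP_NOIO | __GFP_HIGH)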
* Throttle number of freed slabs based on nr_to_scan (Prakash Surya, 2012-05-07, 1 file, -2/+3)
  Previously, the SPL tried to maintain Solaris semantics by freeing all
  available (empty) slabs from its slab caches when the shrinker was called.
  This is not desirable when running on Linux. To make the SPL shrinker more
  Linux friendly, the actual number of freed slabs from each of the slab caches
  is now derived from nr_to_scan and skc_slab_objs.

  Additionally, an accounting bug was fixed in spl_slab_reclaim() which could
  cause us to reclaim one more slab than requested.
  Signed-off-by: Prakash Surya <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #101
* Define the needed ISA types for ARM (Jorgen Lundman, 2012-05-03, 1 file, -1/+18)
  Add the minimum required ISA types to support the ARM architecture.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Remove condition variable names (Brian Behlendorf, 2012-04-06, 1 file, -9/+1)
  Long ago I added support to the spl for condition variable names because I
  thought they might be needed. It turns out they aren't. In fact the official
  Solaris cv_init(9F) man page discourages their use in the kernel.

    cv_init(9F)
      Parameters
        name - Descriptive string. This is obsolete and should be NULL.
               (Non-NULL strings are legal, but they're a waste of kernel
               memory.)

  Therefore, I'm removing them from the spl to reclaim this memory and adding
  an ASSERT() to ensure no new consumers are added which make use of the name.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add SPL_META_RELEASE to module load/unload messages (Brian Behlendorf, 2012-03-23, 1 file, -1/+1)
  Include the SPL_META_RELEASE in the module load/unload messages to more
  clearly indicate exactly what version of the SPL has been loaded.
  Signed-off-by: Brian Behlendorf <[email protected]>