summaryrefslogtreecommitdiffstats
path: root/module/spl
Commit message (Collapse)AuthorAgeFilesLines
* Proxmox VE kernel compat, invalidate_inodes()Brian Behlendorf2011-12-211-4/+4
| | | | | | | | | | | | The Proxmox VE kernel contains a patch which renames the function invalidate_inodes() to invalidate_inodes_check(). In the process it adds a 'check' argument and a '#define invalidate_inodes(x)' compatibility wrapper for legacy callers. Therefore, if either of these functions are exported invalidate_inodes() can be safely used. Signed-off-by: Brian Behlendorf <[email protected]> Closes #58
* Store copy of tqent_flags prior to servicing taskPrakash Surya2011-12-161-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A preallocated taskq_ent_t's tqent_flags must be checked prior to servicing the taskq_ent_t. Once a preallocated taskq entry is serviced, the ownership of the entry is handed back to the caller of taskq_dispatch, thus the entry's contents can potentially be mangled. In particular, this is a problem in the case where a preallocated taskq entry is serviced, and the caller clears it's tqent_flags field. Thus, when the function returns and task_done is called, it looks as though the entry is **not** a preallocated task (when in fact it **is** a preallocated task). In this situation, task_done will place the preallocated taskq_ent_t structure onto the taskq_t's free list. This is a **huge** mistake. If the taskq_ent_t is then freed by the caller of taskq_dispatch, the taskq_t's free list will hold a pointer to garbage data. Even worse, if nothing has over written the freed memory before the pointer is dereferenced, it may still look as though it points to a valid list_head belonging to a taskq_ent_t structure. Thus, the task entry's flags are now copied prior to servicing the task. This copy is then checked to see if it is a preallocated task, and determine if the entry needs to be passed down to the task_done function. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #71
* Swap taskq_ent_t with taskqid_t in taskq_thread_tPrakash Surya2011-12-161-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The taskq_t's active thread list is sorted based on its tqt_ent->tqent_id field. The list is kept sorted solely by inserting new taskq_thread_t's in their correct sorted location; no other means is used. This means that once inserted, if a taskq_thread_t's tqt_ent->tqent_id field changes, the list runs the risk of no longer being sorted. Prior to the introduction of the taskq_dispatch_prealloc() interface, this was not a problem as a taskq_ent_t actively being serviced under the old interface should always have a static tqent_id field. Thus, once the taskq_thread_t is added to the taskq_t's active thread list, the taskq_thread_t's tqt_ent->tqent_id field would remain constant. Now, this is no longer the case. Currently, if using the taskq_dispatch_prealloc() interface, any given taskq_ent_t actively being serviced _may_ have its tqent_id value incremented. This happens when the preallocated taskq_ent_t structure is recursively dispatched. Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id field silently modified from under its feet. If this were to happen to a taskq_thread_t on a taskq_t's active thread list, this would compromise the integrity of the order of the list (as the list _may_ no longer be sorted). To get around this, the taskq_thread_t's taskq_ent_t pointer was replaced with its own static copy of the tqent_id. So, as a taskq_ent_t is pulled off of the taskq_t's pending list, a static copy of its tqent_id is made and this copy is used to sort the active thread list. Using a static copy is key in ensuring the integrity of the order of the active thread list. Even if the underlying taskq_ent_t is recursively dispatched (as has its tqent_id modified), this static copy stored inside the taskq_thread_t will remain constant. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #71
* Implement taskq_dispatch_prealloc() interfacePrakash Surya2011-12-131-6/+86
| | | | | | | | | | | | | | | | | | | This patch implements the taskq_dispatch_prealloc() interface which was introduced by the following illumos-gate commit. It allows for a preallocated taskq_ent_t to be used when dispatching items to a taskq. This eliminates a memory allocation which helps minimize lock contention in the taskq when dispatching functions. commit 5aeb94743e3be0c51e86f73096334611ae3a058e Author: Garrett D'Amore <[email protected]> Date: Wed Jul 27 07:13:44 2011 -0700 734 taskq_dispatch_prealloc() desired 943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #65
* Replace tq_work_list and tq_threads in taskq_tPrakash Surya2011-12-131-55/+77
| | | | | | | | | | | | | | | | | | | | | | | | To lay the ground work for introducing the taskq_dispatch_prealloc() interface, the tq_work_list and tq_threads fields had to be replaced with new alternatives in the taskq_t structure. The tq_threads field was replaced with tq_thread_list. Rather than storing the pointers to the taskq's kernel threads in an array, they are now stored as a list. In addition to laying the ground work for the taskq_dispatch_prealloc() interface, this change could also enable taskq threads to be dynamically created and destroyed as threads can now be added and removed to this list relatively easily. The tq_work_list field was replaced with tq_active_list. Instead of keeping a list of taskq_ent_t's which are currently being serviced, a list of taskq_threads currently servicing a taskq_ent_t is kept. This frees up the taskq_ent_t's tqent_list field when it is being serviced (i.e. now when a taskq_ent_t is being serviced, it's tqent_list field will be empty). Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #65
* Replace struct spl_task with struct taskq_entPrakash Surya2011-12-131-63/+62
| | | | | | | | | | | | The spl_task structure was renamed to taskq_ent, and all of its fields were renamed to have a prefix of 'tqent' rather than 't'. This was to align with the naming convention which the ZFS code assumes. Previously these fields were private so the name never mattered. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #65
* Prepend spl_ to all init/fini functionsBrian Behlendorf2011-11-116-25/+29
| | | | | | | | | | This is a bit of cleanup I'd been meaning to get to for a while to reduce the chance of a type conflict. Well that conflict finally occurred with the kstat_init() function which conflicts with a function in the 2.6.32-6-pve kernel. Signed-off-by: Brian Behlendorf <[email protected]> Closes #56
* Linux 3.1 compat, shrink_*cache_memoryBrian Behlendorf2011-11-091-10/+4
| | | | | | | | | | | | | | | | | | | | | | As of Linux 3.1 the shrink_dcache_memory and shrink_icache_memory functions have been removed. This same task is now accomplished more cleanly with per super block shrinkers. This unfortunately leaves us no easy way to support the dnlc_reduce_cache() function. This support has always been entirely optional. So when no reasonable interface is available allow the dnlc_reduce_cache() function to effectively become a no-op. The downside of this change is that it will prevent the zfs arc meta data limts from being enforced. However, the current zfs implementation in this regard is already flawed and needs to be reworked. If the arc needs to enfore a meta data limit it will need to be extended to coordinate directly with the zpl. This will allow us to drop all this compatibility code and get more fine grained control over the cache management. Signed-off-by: Brian Behlendorf <[email protected]> Issue #52
* Linux 3.1 compat, kern_path_parent()Brian Behlendorf2011-11-092-5/+31
| | | | | | | | | | Prior to Linux 3.1 the kern_path_parent symbol was exported for use by kernel modules. As of Linux 3.1 it is now longer easily available. To handle this case the spl will now dynamically look up address of the missing symbol at module load time. Signed-off-by: Brian Behlendorf <[email protected]> Issue #52
* Fix NULL deref in balance_pgdat()Brian Behlendorf2011-11-031-5/+8
| | | | | | | | | | | | | | | Be careful not to unconditionally clear the PF_MEMALLOC bit in the task structure. It may have already been set when entering kv_alloc() in which case it must remain set on exit. In particular the kswapd thread will have PF_MEMALLOC set in order to prevent it from entering direct reclaim. By clearing it we allow the following NULL deref to potentially occur. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8109c7ab>] balance_pgdat+0x25b/0x4ff Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS issue #287
* vn_rdwr() didn't properly advance the file positionGunnar Beutner2011-10-181-0/+1
| | | | | | | | | This would cause problems when using 'zfs send' with a file as the target (rather than a pipe or a socket as is usually the case) as for each write the destination offset in the file would be 0. Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS issue #391
* Fix various typos in commentsBrian Behlendorf2011-10-111-27/+27
| | | | | | | Just clean up some of the typos and spelling mistakes in the comments of spl-kmem.c. Signed-off-by: Brian Behlendorf <[email protected]>
* Fixed typo in spl_slab_alloc()Gunnar Beutner2011-10-111-1/+1
| | | | | | | | The typo did not have any effect (apart from a negligible performance impact) because skc->skc_flags * KMC_OFFSLAB is always non-null when at least one bit in skc->skc_flags is set. Signed-off-by: Brian Behlendorf <[email protected]>
* Properly destroy work items in spl_kmem_cache_destroy()Gunnar Beutner2011-10-111-3/+3
| | | | | | | | | | | | | In a non-debug build the ASSERT() would be optimized away which could cause pending work items to not be cancelled. We must also use cancel_delayed_work_sync() rather than just cancel_delayed_work() to actually wait until work items have completed. Otherwise they might accidentally access free'd memory. Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS bugs #279, #62, #363, #418
* Fixed invalid resource re-use in file_find()Gunnar Beutner2011-10-111-1/+2
| | | | | | | | | File descriptors are a per-process resource. The same descriptor in different processes can refer to different files. find_file() incorrectly assumed that file descriptors are globally unique. Signed-off-by: Brian Behlendorf <[email protected]> Closes ZFS issue #386
* Remove /etc/hostid missing warningBrian Behlendorf2011-10-061-5/+1
| | | | | | | | | | | No longer print the following warning to the console when the /etc/hostid file is missing. This is the expected default behavior. Keeping the hostid in sync with the initramfs is now accomplished by creating the /etc/hostid in the initramfs not on the system. SPL: The /etc/hostid file is not found. Signed-off-by: Brian Behlendorf <[email protected]>
* Read the /etc/hostid file directly.Darik Horn2011-06-241-5/+105
| | | | | | | | | | | | Deprecate the /usr/bin/hostid call by reading the /etc/hostid file directly. Add the spl_hostid_path parameter to override the default /etc/hostid path. Rename the set_hostid() function to hostid_exec() to better reflect actual behavior and complement the new hostid_read() function. Use HW_INVALID_HOSTID as the spl_hostid sentinel value because zero seems to be a valid gethostid() result on Linux.
* Linux 3.0: Shrinker compatibilityBrian Behlendorf2011-06-211-6/+13
| | | | | | | | Update the the wrapper macros for the memory shrinker to handle this 4th API change. The callback function now takes a shrink_control structure. This is certainly a step in the right direction but it's annoying to have to accomidate yet another version of the API.
* Add TASKQ_NORECLAIM flagBrian Behlendorf2011-05-061-0/+4
| | | | | | | It has become necessary to be able to optionally disable direct memory reclaim for certain taskqs. To support this the TASKQ_NORECLAIM flags has been added which sets the PF_MEMALLOC bit for all threads in the taskq.
* Correct typos in the spl proc handler.Darik Horn2011-04-241-3/+3
| | | | | Correct a format typo that causes /proc/sys/kernel/spl/hostid to return a decimal number instead of a hexadecimal number.
* Make the SPL kernel messages consistent with ZFS.Darik Horn2011-04-211-3/+3
| | | | | | | Change the SPL kernel messages for module loading and module unloading so that they are similar to the ZFS kernel messages. Signed-off-by: Brian Behlendorf <[email protected]>
* Remove the gawk dependency.Darik Horn2011-04-211-11/+15
| | | | | | | | | | | | | This reverts commit 1814251453c8140f50170ad29d9105c1273d7e08. Demote the gawk call back to awk and ensure that stderr is attached. GNU gawk tolerates a missing stderr handle, but many utilities do not, which could be why a regular awk call was unexplainably failing on some systems. Use argv[0] instead of sh_path for consistency internally and with other Linux drivers. Signed-off-by: Brian Behlendorf <[email protected]>
* Import spl_hostid as a module parameter.Darik Horn2011-04-212-12/+17
| | | | | | | | | | | | | | | | | | Provide a call_usermodehelper() alternative by letting the hostid be passed as a module parameter like this: $ modprobe spl spl_hostid=0x12345678 Internally change the spl_hostid variable to unsigned long because that is the type that the coreutils /usr/bin/hostid returns. Move the hostid command into GET_HOSTID_CMD for consistency with the similar GET_KALLSYMS_ADDR_CMD invocation. Use argv[0] instead of sh_path for consistency internally and with other Linux drivers. Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 2.6.39 compat, zlib_deflate_workspacesize()Brian Behlendorf2011-04-201-2/+6
| | | | | | | | | | | | | The function zlib_deflate_workspacesize() now take 2 arguments. This was done to avoid always having to allocate the maximum size workspace (268K). The caller can now specific the windowBits and memLevel compression parameters to get a smaller workspace. For our purposes we introduce a spl_zlib_deflate_workspacesize() wrapper which accepts both arguments. When the two argument version of zlib_deflate_workspacesize() is available the arguments are passed through. When it's not we assume the worst case and a maximally sized workspace is used.
* Linux 2.6.39 compat, kern_path_parent()Brian Behlendorf2011-04-201-3/+3
| | | | | | | The path_lookup() function has been renamed to kern_path_parent() and the flags argument has been removed. The only behavior now offered is that of LOOKUP_PARENT. The spl already always passed this flag so dropping the flag does not impact us.
* Linux 2.6.39 compat, DEFINE_SPINLOCK()Brian Behlendorf2011-04-203-4/+4
| | | | | | | | | | This is a long over due compatibility change. Way, way, way back in 2007 there was a push to remove all consumers of SPIN_LOCK_UNLOCKED. Finally, in 2011 with 2.6.39 all the consumers have been updated and SPIN_LOCK_UNLOCKED was removed. It's about time we use the new API as well, this change does exactly that. DEFINE_SPINLOCK() was available as far back as 2.6.12 so there doesn't need to be any additional autoconf-foo for this change.
* Fix unused variableBrian Behlendorf2011-04-191-3/+0
| | | | | | Flagged by the default -Wunused-but-set-variable gcc option when running under Fedora 15. Since it's correct this variable is entirely unused this commit removes it.
* Linux 2.6.39 compat, invalidate_inodes()Brian Behlendorf2011-04-191-1/+1
| | | | | | | | To resolve a potiential filesystem corruption issue a second argument was added to invalidate_inodes(). This argument controls whether dirty inodes are dropped or treated as busy when invalidating a super block. When only the legacy API is available the second argument will be dropped for compatibility.
* Add dnlc_reduce_cache() supportBrian Behlendorf2011-04-061-0/+28
| | | | | | | | | | | | | Provide the dnlc_reduce_cache() function which attempts to prune cached entries from the dcache and icache. After the entries are pruned any slabs which they may have been using are reaped. Note the API takes a reclaim percentage but we don't have easy access to the total number of cache entries to calculate the reclaim count. However, in practice this doesn't need to be exactly correct. We simply need to reclaim some useful fraction (but not all) of the cache. The caller can determine if more needs to be done.
* Add slab usage summeries to /procBrian Behlendorf2011-04-061-1/+119
| | | | | | | | | | | | | | | | | | One of the most common things you want to know when looking at the slab is how much memory is being used. This information was available in /proc/spl/kmem/slab but only on a per-slab basis. This commit adds the following /proc/sys/kernel/spl/kmem/slab* entries to make total slab usage easily available at a glance. slab_kmem_total - Total kmem slab size slab_kmem_avail - Alloc'd kmem slab size slab_kmem_max - Max observed kmem slab size slab_vmem_total - Total vmem slab size slab_vmem_avail - Alloc'd vmem slab size slab_vmem_max - Max observed vmem slab size NOTE: The slab_*_max values are expected to over report because they show maximum values since boot, not current values.
* Update /proc/spl/kmem/slab outputBrian Behlendorf2011-04-061-22/+24
| | | | | | | | | | | | | The 'slab_fail', 'slab_create', and 'slab_destroy' columns in the slab output have been removed because they are virtually always zero and not very useful. The much more useful 'size' and 'alloc' columns have been added which show the total slab size and how much of the total size has been allocated to objects. Finally, the formatting has been updated to be much more human readable while still being friendly for tool like awk to parse.
* Linux shrinker compatBrian Behlendorf2011-04-061-41/+9
| | | | | | | | | | | | | | The Linux shrinker has gone through three API changes since 2.6.22. Rather than force every caller to understand all three APIs this change consolidates the compatibility code in to the mm-compat.h header. The caller then can then use a single spl provided shrinker API which does the right thing for your kernel. SPL_SHRINKER_CALLBACK_PROTO(shrinker_callback, cb, nr_to_scan, gfp_mask); SPL_SHRINKER_DECLARE(shrinker_struct, shrinker_callback, seeks); spl_register_shrinker(&shrinker_struct); spl_unregister_shrinker(&&shrinker_struct); spl_exec_shrinker(&shrinker_struct, nr_to_scan, gfp_mask);
* Add crgetfsuid()/crgetfsgid() helpersBrian Behlendorf2011-03-221-78/+58
| | | | | | | | | | | | Solaris credentials don't have an fsuid/fsguid field but Linux credentials do. To handle this case the Solaris API is being modestly extended to include the crgetfsuid()/crgetfsgid() helper functions. Addititionally, because the crget*() helpers are implemented identically regardless of HAVE_CRED_STRUCT they have been moved outside the #ifdef to common code. This simplification means we only have one version of the helper to keep to to date.
* Disable vmalloc() direct reclaimBrian Behlendorf2011-03-201-2/+22
| | | | | | | | | | | | | | | | | | | As part of vmalloc() a __pte_alloc_kernel() allocation may occur. This internal allocation does not honor the gfp flags passed to vmalloc(). This means even when vmalloc(GFP_NOFS) is called it is possible that a synchronous reclaim will occur. This reclaim can trigger file IO which can result in a deadlock. This issue can be avoided by explicitly setting PF_MEMALLOC on the process to subvert synchronous reclaim when vmalloc() is called with !__GFP_FS. An example stack of the deadlock can be found here (1), along with the upstream kernel bug (2), and the original bug discussion on the linux-mm mailing list (3). This code can be properly autoconf'ed when the upstream bug is fixed. 1) http://github.com/behlendorf/zfs/issues/labels/Vmalloc#issue/133 2) http://bugzilla.kernel.org/show_bug.cgi?id=30702 3) http://marc.info/?l=linux-mm&m=128942194520631&w=4
* Remove xvattr supportBrian Behlendorf2011-03-021-2/+2
| | | | | | | | | | | | | | | | | | | The xvattr support in the spl has always simply consisted of defining a couple structures and a few #defines. This was enough to enable compilation of code which just passed xvattr types around but not enough to effectively manipulate them. This change removes even this minimal support leaving it up to packages which leverage the spl to prove the full xvattr support. By removing it from the spl we ensure not conflict with the higher level packages. This just leaves minimal vnode support for basical manipulation of files. This code is does have the proper support functions in the spl and a set of regression tests. Additionally, this change removed the unused 'caller_context_t *' type and replaces it with a 'void *'.
* Fix zlib compressionBrian Behlendorf2011-02-254-3/+230
| | | | | | | | | | | | | | | | While portions of the code needed to support z_compress_level() and z_uncompress() where in place. In reality the current implementation was non-functional, it just was compilable. The critical missing component was to setup a workspace for the compress/uncompress stream structures to use. A kmem_cache was added for the workspace area because we require a large chunk of memory. This avoids to need to continually alloc/free this memory and vmap() the pages which is very slow. Several objects will reside in the per-cpu kmem_cache making them quick to acquire and release. A further optimization would be to adjust the implementation to additional ensure the memory is local to the cpu. Currently that may not be the case.
* Linux compat 2.6.37, invalidate_inodes()Brian Behlendorf2011-02-231-0/+14
| | | | | | | | | | | | | | | | | | In the 2.6.37 kernel the function invalidate_inodes() is no longer exported for use by modules. This memory management functionality is needed to invalidate the inodes attached to a super block without unmounting the filesystem. Because this function still exists in the kernel and the prototype is available is a common header all we strictly need is the symbol address. The address is obtained using spl_kallsyms_lookup_name() and assigned to the variable invalidate_inodes_fn. Then a #define is used to replace all instances of invalidate_inodes() with a call to the acquired address. All the complexity is hidden behind HAVE_INVALIDATE_INODES and invalidate_inodes() can be used as usual. Long term we should try to get this, or another, interface made available to modules again.
* Block in cv_destroy() on all waitersBrian Behlendorf2011-02-041-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we would ASSERT in cv_destroy() if it was ever called with active waiters. However, I've now seen several instances in OpenSolaris code where they do the following: cv_broadcast(); cv_destroy(); This leaves no time for active waiters to be woken up and scheduled and we trip the ASSERT. This has not been observed to be an issue on OpenSolaris because their cv_destroy() basically does nothing. They still do run the risk of the memory being free'd after the cv_destroy() and hitting a bad paging request. But in practice this race is so small and unlikely it either doesn't happen, or is so unlikely when it does happen the root cause has not yet been identified. Rather than risk the same issue in our code this change updates cv_destroy() to block until all waiters have been woken and scheduled. This may take some time because each waiter must acquire the mutex. This change may have an impact on performance for frequently created and destroyed condition variables. That however is a price worth paying it avoid crashing your system. If performance issues are observed they can be addressed by the caller.
* Remove VN_HOLD/VN_RELE/VOP_PUTPAGEBrian Behlendorf2011-01-121-1/+0
| | | | | | Previously these were defined to noops but rather than give the misleading impression that these are actually implemented I'm removing the type entirely for clarity.
* Make vn_cache|vn_file_cache kmem cachesBrian Behlendorf2011-01-121-2/+2
| | | | | | | Both of these caches were previously allowed to be either a vmem or kmem cache based on the size of the object involved. Since we know the object won't be to large and performce is much better for a kmem cache for them to be kmem backed.
* Clean vattr_t and vsecattr_t typesBrian Behlendorf2011-01-121-9/+6
| | | | Minor cleanup for the vattr_t and vsecattr_t types.
* Add vn_mode_to_vtype/vn_vtype to_mode helpersBrian Behlendorf2011-01-121-6/+35
| | | | | Add simple helpers to convert a vnode->v_type to a inode->i_mode. These should be used sparingly but they are handy to have.
* Add cv_timedwait_interruptible() functionNeependra Khare2011-01-111-4/+17
| | | | | | | | | | | | | | | | | | The cv_timedwait() function by definition must wait unconditionally for cv_signal()/cv_broadcast() before waking. This causes processes to go in the D state which increases the load average. The load average is the summation of processes in D state and run queue. To avoid this it can be desirable to sleep interruptibly. These processes do not count against the load average but may be woken by a signal. It is up to the caller to determine why the process was woken it may be for one of three reasons. 1) cv_signal()/cv_broadcast() 2) the timeout expired 3) a signal was received Signed-off-by: Brian Behlendorf <[email protected]>
* Linux Compat: inode->i_mutex/i_semBrian Behlendorf2011-01-111-10/+3
| | | | | | | | Create spl_inode_lock/spl_inode_unlock compability macros to simply access to the inode mutex/sem. This avoids the need to have to ugly up the code with the required #define's at every call site. At the moment the SPL only uses this in one place but higher layers can benefit from the macro.
* Add Thread Specific Data (TSD) ImplementationBrian Behlendorf2010-12-075-3/+653
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thread specific data has implemented using a hash table, this avoids the need to add a member to the task structure and allows maximum portability between kernels. This implementation has been optimized to keep the tsd_set() and tsd_get() times as small as possible. The majority of the entries in the hash table are for specific tsd entries. These entries are hashed by the product of their key and pid because by design the key and pid are guaranteed to be unique. Their product also has the desirable properly that it will be uniformly distributed over the hash bins providing neither the pid nor key is zero. Under linux the zero pid is always the init process and thus won't be used, and this implementation is careful to never to assign a zero key. By default the hash table is sized to 512 bins which is expected to be sufficient for light to moderate usage of thread specific data. The hash table contains two additional type of entries. They first type is entry is called a 'key' entry and it is added to the hash during tsd_create(). It is used to store the address of the destructor function and it is used as an anchor point. All tsd entries which use the same key will be linked to this entry. This is used during tsd_destory() to quickly call the destructor function for all tsd associated with the key. The 'key' entry may be looked up with tsd_hash_search() by passing the key you wish to lookup and DTOR_PID constant as the pid. The second type of entry is called a 'pid' entry and it is added to the hash the first time a process set a key. The 'pid' entry is also used as an anchor and all tsd for the process will be linked to it. This list is using during tsd_exit() to ensure all registered destructors are run for the process. The 'pid' entry may be looked up with tsd_hash_search() by passing the PID_KEY constant as the key, and the process pid. Note that tsd_exit() is called by thread_exit() so if your using the Solaris thread API you should not need to call tsd_exit() directly.
* Clear cv->cv_mutex when not in useBrian Behlendorf2010-11-291-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | For debugging purposes the condition varaibles keep track of the mutex used during a wait. The idea is to validate that all callers always use the same mutex. Unfortunately, we have seen cases where the caller reuses the condition variable with a different mutex but in a way which is known to be safe. My reading of the man pages suggests you should not do this and always cv_destroy()/cv_init() a new mutex. However, there is overhead in doing this and it does appear to be allowed under Solaris. To accomidate this behavior cv_wait_common() and __cv_timedwait() have been modified to clear the associated mutex when the last waiter is dropped. This ensures that while the condition variable is in use the incorrect mutex case is detected. It also allows the condition variable to be safely recycled without requiring the overhead of a cv_destroy()/cv_init() as long as it isn't currently in use. Finally, spin lock cv->cv_lock was removed because it is not required. When the condition variable is used properly the caller will always be holding the mutex so the spin lock is redundant. The lock was originally added because I expected to need to protect more than just the cv->cv_mutex. It turns out that was not the case. Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 2.6.36 compat, use fops->unlocked_ioctl()Brian Behlendorf2010-11-101-5/+6
| | | | | | | | | As of linux-2.6.36 the last in-tree consumer of fops->ioctl() has been removed and thus fops()->ioctl() has also been removed. The replacement hook is fops->unlocked_ioctl() which has existed in kernel since 2.6.12. Since the SPL only contains support back to 2.6.18 vintage kernels, I'm not adding an autoconf check for this and simply moving everything to use fops->unlocked_ioctl().
* Linux 2.6.36 compat, fs_struct->lock type changeBrian Behlendorf2010-11-091-10/+18
| | | | | | | | | | In the linux-2.6.36 kernel the fs_struct lock was changed from a rwlock_t to a spinlock_t. If the kernel would export the set_fs_pwd() symbol by default this would not have caused us any issues, but they don't. So we're forced to add a new autoconf check which sets the HAVE_FS_STRUCT_SPINLOCK define when a spinlock_t is used. We can then correctly use either spin_lock or write_lock in our custom set_fs_pwd() implementation.
* Fix 2.6.35 shrinker callback API changeBrian Behlendorf2010-10-221-3/+18
| | | | | | | | | | | | | As of linux-2.6.35 the shrinker callback API now takes an additional argument. The shrinker struct is passed to the callback so that users can embed the shrinker structure in private data and use container_of() to access it. This removes the need to always use global state for the shrinker. To handle this we add the SPL_AC_3ARGS_SHRINKER_CALLBACK autoconf check to properly detect the API. Then we simply setup a callback function with the correct number of arguments. For now we do not make use of the new 3rd argument.
* Support custom build directoriesBrian Behlendorf2010-09-051-19/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/spl/spl-x.y.z.tar.gz tar -xzf spl-x.y.z.tar.gz cd spl-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This is something the project has almost supported for a long time but finishing this support should save me lots of time.