Commit message (Author, Age, Files, Lines)
* Prefer /lib/modules/$(uname -r)/ links (Brian Behlendorf, 2011-02-10, 2 files, -9/+24)
| Preferentially use the /lib/modules/$(uname -r)/source and /lib/modules/$(uname -r)/build links. Only if neither of these links exists, fall back to alternate methods for deducing which kernel to build with. This removes the need to manually specify --with-linux= and --with-linux-obj= on Debian systems.
* Update META to 0.6.0 (Brian Behlendorf, 2011-02-07, 1 file, -1/+1)
| Roll the version forward to 0.6.0. While no major changes really warrant this, I want to keep the version in step with ZFS for now, which is the only SPL consumer.
* Block in cv_destroy() on all waiters (Brian Behlendorf, 2011-02-04, 2 files, -3/+24)
| Previously we would ASSERT in cv_destroy() if it was ever called with active waiters. However, I've now seen several instances in OpenSolaris code where they do the following:
|
|   cv_broadcast();
|   cv_destroy();
|
| This leaves no time for active waiters to be woken up and scheduled, and we trip the ASSERT. This has not been observed to be an issue on OpenSolaris because their cv_destroy() basically does nothing. They still run the risk of the memory being freed after the cv_destroy() and hitting a bad paging request. But in practice this race is so small and unlikely that it either doesn't happen, or happens so rarely that the root cause has not yet been identified.
|
| Rather than risk the same issue in our code, this change updates cv_destroy() to block until all waiters have been woken and scheduled. This may take some time because each waiter must acquire the mutex.
|
| This change may have an impact on performance for frequently created and destroyed condition variables. That, however, is a price worth paying to avoid crashing your system. If performance issues are observed they can be addressed by the caller.
* Minor policy interface (Brian Behlendorf, 2011-01-27, 1 file, -5/+18)
| Simply add the policy function wrappers. They are completely non-functional and always return that everything is OK, but once again they simplify compilation of dependent packages for now. These can/should be removed once the security policy of the dependent application is completely understood and integrated as appropriate with Linux.
* Add missing headers (Brian Behlendorf, 2011-01-27, 4 files, -0/+99)
| Dependent packages require the following missing headers to simplify compilation. The headers are basically just stubbed out with the minimal content required.
* Add VSA_ACE_* and MAX_ACL_ENTRIES defines (Brian Behlendorf, 2011-01-27, 1 file, -0/+7)
| The following flags are used to get the proper mask when getting and setting ACLs. I'm hopeful this can all largely go away at some point. We also add a define for the maximum number of ACL entries. MAX_ACL_ENTRIES is used as the maximum number of entries for each type.
* Add MAXUID define (Brian Behlendorf, 2011-01-27, 1 file, -0/+2)
| For Linux the maximum uid can vary depending on how your kernel is built. The Linux kernel can still be compiled with 16-bit uids and gids, although I'm not aware of a major distribution which does this (maybe an embedded one?). Given that caveat, it is reasonably safe to define MAXUID as 2147483647.
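A minimal sketch of the define described above, assuming a user-space C environment. `MAXUID` and its value come from the commit message; the `ex_uid_is_valid()` helper is hypothetical and only shows how the bound might be used.

```c
#include <stdint.h>

/* Value from the commit message: 2^31 - 1, the largest signed 32-bit value. */
#define MAXUID 2147483647

/* Hypothetical helper (not part of the SPL): 1 if id lies in [0, MAXUID]. */
static int ex_uid_is_valid(int64_t id)
{
	return id >= 0 && id <= MAXUID;
}
```

A kernel built with 16-bit uids would only ever produce ids up to 65535, which still fall inside this range.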
* Add FIGNORECASE define (Brian Behlendorf, 2011-01-27, 1 file, -1/+2)
| The FIGNORECASE define is now needed; place it with the related flags.
* Add ksid_index_t and ksid_t types (Brian Behlendorf, 2011-01-27, 1 file, -0/+9)
| Add the ksid_index_t enum and ksid_t type for use. These types are now used by packages which depend on the SPL.
* Minimal VFS additions (Brian Behlendorf, 2011-01-27, 1 file, -4/+10)
| This patch simply removes the placeholder vfs_t type and includes some generic Linux VFS headers. It also makes some minor fid_t additions for compatibility.
* Remove VN_HOLD/VN_RELE/VOP_PUTPAGE (Brian Behlendorf, 2011-01-12, 3 files, -22/+1)
| Previously these were defined as no-ops, but rather than give the misleading impression that they are actually implemented, I'm removing them entirely for clarity.
* Add a few additional vnode #defines (Brian Behlendorf, 2011-01-12, 1 file, -0/+8)
| These additional constants now have users in dependent packages.
* Make vn_cache|vn_file_cache kmem caches (Brian Behlendorf, 2011-01-12, 1 file, -2/+2)
| Both of these caches were previously allowed to be either a vmem or kmem cache based on the size of the object involved. Since we know the objects won't be too large and performance is much better with a kmem cache, make both of them kmem backed.
* Clean vattr_t and vsecattr_t types (Brian Behlendorf, 2011-01-12, 2 files, -18/+16)
| Minor cleanup for the vattr_t and vsecattr_t types.
* FRSYNC Should Use O_SYNC (Brian Behlendorf, 2011-01-12, 1 file, -1/+1)
| The Solaris FRSYNC maps most logically to the Linux O_SYNC. There is no O_RSYNC on Linux but this wasn't noticed until just recently.
* Add vn_mode_to_vtype/vn_vtype_to_mode helpers (Brian Behlendorf, 2011-01-12, 3 files, -6/+41)
| Add simple helpers to convert between a vnode->v_type and an inode->i_mode. These should be used sparingly but they are handy to have.
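The conversion these helpers perform can be sketched in user space with the standard file type macros. This is an illustration under stated assumptions, not the SPL code: the `ex_vtype_t` enum and both function names are hypothetical, and only a subset of the Solaris vnode types is modelled.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose S_IFLNK and S_IFSOCK on strict builds */
#endif
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the vnode type enum. */
typedef enum {
	EX_VNON, EX_VREG, EX_VDIR, EX_VBLK, EX_VCHR,
	EX_VLNK, EX_VFIFO, EX_VSOCK
} ex_vtype_t;

/* Map the file type bits of a mode to a vnode type. */
static ex_vtype_t ex_mode_to_vtype(mode_t mode)
{
	if (S_ISREG(mode))  return EX_VREG;
	if (S_ISDIR(mode))  return EX_VDIR;
	if (S_ISCHR(mode))  return EX_VCHR;
	if (S_ISBLK(mode))  return EX_VBLK;
	if (S_ISFIFO(mode)) return EX_VFIFO;
	if (S_ISLNK(mode))  return EX_VLNK;
	if (S_ISSOCK(mode)) return EX_VSOCK;
	return EX_VNON;
}

/* Map a vnode type back to the file type bits; permission bits are not set. */
static mode_t ex_vtype_to_mode(ex_vtype_t vtype)
{
	switch (vtype) {
	case EX_VREG:  return S_IFREG;
	case EX_VDIR:  return S_IFDIR;
	case EX_VCHR:  return S_IFCHR;
	case EX_VBLK:  return S_IFBLK;
	case EX_VFIFO: return S_IFIFO;
	case EX_VLNK:  return S_IFLNK;
	case EX_VSOCK: return S_IFSOCK;
	default:       return 0;
	}
}
```

The round trip loses the permission bits, which is one reason such helpers are best used sparingly.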
* Add cv_timedwait_interruptible() function (Neependra Khare, 2011-01-11, 2 files, -16/+33)
| The cv_timedwait() function by definition must wait unconditionally for cv_signal()/cv_broadcast() before waking. This causes processes to enter the D state, which increases the load average. The load average is the sum of the processes in the D state and on the run queue.
|
| To avoid this it can be desirable to sleep interruptibly. These processes do not count against the load average but may be woken by a signal. It is up to the caller to determine why the process was woken; it may be for one of three reasons:
|
|   1) cv_signal()/cv_broadcast()
|   2) the timeout expired
|   3) a signal was received
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* Linux Compat: inode->i_mutex/i_sem (Brian Behlendorf, 2011-01-11, 2 files, -11/+14)
| Create spl_inode_lock/spl_inode_unlock compatibility macros to simplify access to the inode mutex/sem. This avoids the need to ugly up the code with the required #define's at every call site. At the moment the SPL only uses this in one place but higher layers can benefit from the macro.
* Add Thread Specific Data (TSD) Regression Test (Brian Behlendorf, 2010-12-07, 1 file, -1/+184)
| To validate the correct behavior of the TSD interfaces it's important that we add a regression test. This test is designed to minimally exercise the fundamental TSD behavior; it does not attempt to validate all potential corner cases.
|
| The test will first create 32 keys via tsd_create() and register a common destructor. Next, 16 wait threads will be created, each of which sets/verifies a random value for all 32 keys, then blocks waiting to be released by the control thread. Meanwhile the control thread verifies that none of the destructors have been run prematurely.
|
| The next phase of the test is to create 16 exit threads which set/verify a random value for all 32 keys. They then immediately exit. This is designed to verify tsd_exit(), which will be called via thread_exit(). This must result in all registered destructors being run and the memory for the tsd being freed.
|
| After this, tsd_destroy() is verified by destroying all 32 keys. Once again we must see the expected number of destructors run and the tsd memory freed. At this point the blocked threads are released and they exit calling tsd_exit(), which should do very little since all the tsd has already been destroyed.
|
| If this all goes off without a hitch the test passes. To ensure no memory has been leaked, I have manually verified that after spl module unload no memory is reported leaked.
* Add Thread Specific Data (TSD) Implementation (Brian Behlendorf, 2010-12-07, 8 files, -3/+700)
| Thread specific data is implemented using a hash table. This avoids the need to add a member to the task structure and allows maximum portability between kernels. This implementation has been optimized to keep the tsd_set() and tsd_get() times as small as possible.
|
| The majority of the entries in the hash table are for specific tsd entries. These entries are hashed by the product of their key and pid because by design the key and pid are guaranteed to be unique. Their product also has the desirable property that it will be uniformly distributed over the hash bins, provided neither the pid nor the key is zero. Under Linux the zero pid is always the idle task and thus won't be used, and this implementation is careful never to assign a zero key. By default the hash table is sized to 512 bins, which is expected to be sufficient for light to moderate usage of thread specific data.
|
| The hash table contains two additional types of entries. The first type of entry is called a 'key' entry and it is added to the hash during tsd_create(). It is used to store the address of the destructor function and it is used as an anchor point. All tsd entries which use the same key will be linked to this entry. This is used during tsd_destroy() to quickly call the destructor function for all tsd associated with the key. The 'key' entry may be looked up with tsd_hash_search() by passing the key you wish to look up and the DTOR_PID constant as the pid.
|
| The second type of entry is called a 'pid' entry and it is added to the hash the first time a process sets a key. The 'pid' entry is also used as an anchor and all tsd for the process will be linked to it. This list is used during tsd_exit() to ensure all registered destructors are run for the process. The 'pid' entry may be looked up with tsd_hash_search() by passing the PID_KEY constant as the key, and the process pid.
|
| Note that tsd_exit() is called by thread_exit(), so if you're using the Solaris thread API you should not need to call tsd_exit() directly.
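The hashing scheme described above can be sketched as follows. This is a user-space illustration, not the SPL implementation: every `ex_`/`EX_` name is hypothetical, and the multiplicative mixing step is an assumption, since the commit message only says that entries are hashed by the product of key and pid into 512 bins by default.

```c
#include <stdint.h>

#define EX_TSD_HASH_BITS 9u				/* 2^9 = 512 bins, the stated default */
#define EX_TSD_HASH_BINS (1u << EX_TSD_HASH_BITS)

/* Hypothetical tsd entry: identified by the (key, pid) pair. */
struct ex_tsd_entry {
	unsigned key;
	unsigned pid;
	void *value;
};

/* Bin index derived from the product of key and pid; both must be non-zero. */
static unsigned ex_tsd_hash(unsigned key, unsigned pid)
{
	uint32_t product = (uint32_t)key * (uint32_t)pid;

	/* Assumed mixing step: multiplicative hash keeping the top bits. */
	return (uint32_t)(product * 2654435761u) >> (32u - EX_TSD_HASH_BITS);
}

/* Entries that share a bin are told apart by comparing both fields. */
static int ex_tsd_match(const struct ex_tsd_entry *e, unsigned key, unsigned pid)
{
	return e->key == key && e->pid == pid;
}
```

A zero key or pid would make every product zero and pile those entries into one bin, which is why the message stresses that neither value is ever zero.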
* Refresh autogen.sh products (Brian Behlendorf, 2010-11-30, 2 files, -6/+6)
| Refresh the autogen.sh products based on the versions which are installed by default in the GA RHEL6.0 release.
|
|   autoconf (GNU Autoconf) 2.63
|   automake (GNU automake) 1.11.1
|   ltmain.sh (GNU libtool) 2.2.6b
* Make kmutex_t typesafe in all cases. (Ricardo M. Correia, 2010-11-29, 1 file, -13/+19)
| When HAVE_MUTEX_OWNER and CONFIG_SMP are defined, kmutex_t is just a typedef for struct mutex.
|
| This is generally OK but has the downside that it can let mistakes such as mutex_lock(&kmutex_var) pass unnoticed until someone compiles the code without HAVE_MUTEX_OWNER or CONFIG_SMP (in which case kmutex_t is a real struct). Note that the correct API to call would have been mutex_enter() rather than mutex_lock().
|
| We prevent this kind of mistake by making kmutex_t a real structure with only one field. This makes kmutex_t typesafe and it shouldn't have any impact on the generated assembly code.
|
| Signed-off-by: Ricardo M. Correia <[email protected]>
| Signed-off-by: Brian Behlendorf <[email protected]>
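The single-member wrapper technique can be sketched in plain C. This is a user-space model under stated assumptions: `struct ex_native_mutex` stands in for the kernel's `struct mutex`, and all `ex_` names are hypothetical rather than taken from the SPL.

```c
#include <assert.h>

/* Stand-in for the native lock type and its API. */
struct ex_native_mutex { int locked; };

static void ex_native_lock(struct ex_native_mutex *m)
{
	assert(!m->locked);
	m->locked = 1;
}

static void ex_native_unlock(struct ex_native_mutex *m)
{
	assert(m->locked);
	m->locked = 0;
}

/* One-member struct: same layout as the native type, but a distinct type. */
typedef struct { struct ex_native_mutex m; } ex_kmutex_t;

static void ex_kmutex_enter(ex_kmutex_t *mp) { ex_native_lock(&mp->m); }
static void ex_kmutex_exit(ex_kmutex_t *mp)  { ex_native_unlock(&mp->m); }
static int  ex_kmutex_held(const ex_kmutex_t *mp) { return mp->m.locked; }

/*
 * With a plain typedef, ex_native_lock(&kmutex_var) would compile silently.
 * With the wrapper it draws an incompatible pointer type diagnostic.
 */
```

Because the wrapper adds no members, the compiler is expected to generate the same code as for the bare type, which matches the claim in the commit message.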
* Clear cv->cv_mutex when not in use (Brian Behlendorf, 2010-11-29, 2 files, -11/+10)
| For debugging purposes the condition variables keep track of the mutex used during a wait. The idea is to validate that all callers always use the same mutex. Unfortunately, we have seen cases where the caller reuses the condition variable with a different mutex but in a way which is known to be safe. My reading of the man pages suggests you should not do this and should always cv_destroy()/cv_init() when switching to a new mutex. However, there is overhead in doing this and it does appear to be allowed under Solaris.
|
| To accommodate this behavior, cv_wait_common() and __cv_timedwait() have been modified to clear the associated mutex when the last waiter is dropped. This ensures that while the condition variable is in use the incorrect mutex case is detected. It also allows the condition variable to be safely recycled without requiring the overhead of a cv_destroy()/cv_init(), as long as it isn't currently in use.
|
| Finally, the spin lock cv->cv_lock was removed because it is not required. When the condition variable is used properly the caller will always be holding the mutex, so the spin lock is redundant. The lock was originally added because I expected to need to protect more than just the cv->cv_mutex. It turns out that was not the case.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* Give ENOTSUP a valid user space error value (Ned Bass, 2010-11-10, 1 file, -1/+1)
| The ZFS module returns ENOTSUP for several error conditions where an operation is not (yet) supported. The SPL defined ENOTSUP in terms of ENOTSUPP, but that is an internal Linux kernel error code that should not be seen by user programs. As a result the zfs utilities print a confusing error message if an unsupported operation is attempted:
|
|   internal error: Unknown error 524
|   Aborted
|
| This change defines ENOTSUP in terms of EOPNOTSUPP, which is consistent with user space.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
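The remapping can be sketched as below, assuming a user-space build. The `EX_` macros and the helper are hypothetical stand-ins; the value 524 is taken from the error message quoted above.

```c
#include <errno.h>

/* Kernel-internal ENOTSUPP value, as seen in the quoted error message. */
#define EX_KERNEL_ENOTSUPP 524

/* Stand-in for the define after the change: use the user-space error code. */
#define EX_ENOTSUP EOPNOTSUPP

/* Hypothetical check: 1 if err is the code user programs know how to print. */
static int ex_is_user_visible_notsup(int err)
{
	return err == EOPNOTSUPP;
}
```

With this mapping a tool calling strerror() on the returned value gets the C library's text for EOPNOTSUPP instead of an "Unknown error" string.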
* Linux 2.6.36 compat, use fops->unlocked_ioctl() (Brian Behlendorf, 2010-11-10, 2 files, -10/+10)
| As of linux-2.6.36 the last in-tree consumer of fops->ioctl() has been removed and thus fops->ioctl() has also been removed. The replacement hook is fops->unlocked_ioctl(), which has existed in the kernel since 2.6.12. Since the SPL only contains support back to 2.6.18 vintage kernels, I'm not adding an autoconf check for this and am simply moving everything to use fops->unlocked_ioctl().
* Linux 2.6.36 compat, fs_struct->lock type change (Brian Behlendorf, 2010-11-09, 4 files, -10/+178)
| In the linux-2.6.36 kernel the fs_struct lock was changed from a rwlock_t to a spinlock_t. If the kernel exported the set_fs_pwd() symbol by default this would not have caused us any issues, but it doesn't. So we're forced to add a new autoconf check which sets the HAVE_FS_STRUCT_SPINLOCK define when a spinlock_t is used. We can then correctly use either spin_lock or write_lock in our custom set_fs_pwd() implementation.
* Linux 2.6.36 compat, wrap RLIM64_INFINITY (Brian Behlendorf, 2010-11-09, 1 file, -2/+3)
| As of linux-2.6.36 RLIM64_INFINITY is defined in linux/resource.h. This is handled by conditionally defining RLIM64_INFINITY in the SPL only when the kernel does not provide it.
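The conditional define pattern looks roughly like this in user space. The fallback value `(~0ULL)` and the `ex_is_unlimited()` helper are assumptions made for illustration; the commit message does not state the value the SPL uses.

```c
#include <sys/resource.h>

/* Define the constant only when the included headers did not provide it. */
#ifndef RLIM64_INFINITY
#define RLIM64_INFINITY (~0ULL)
#endif

/* Hypothetical helper: 1 if a 64-bit limit means "no limit". */
static int ex_is_unlimited(unsigned long long limit)
{
	return limit == RLIM64_INFINITY;
}
```

The guard keeps the build warning-free on both sides of the change, since a header that already defines the macro simply wins.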
* Fix incorrect krw_type_t type (Brian Behlendorf, 2010-11-09, 1 file, -1/+1)
| Flagged by the default compile options on archlinux 2010.05, we should be using the krw_t type, not the krw_type_t type, in the private data.
|
|   module/splat/splat-rwlock.c: In function ‘splat_rwlock_test4_func’:
|   module/splat/splat-rwlock.c:432:6: warning: case value ‘1’ not in enumerated type ‘krw_type_t’
* Prep for 0.5.2 tag (Brian Behlendorf, 2010-11-05, 1 file, -1/+1)
| Update META file to prep for 0.5.2 tag.
* Clear owner after dropping mutex (Brian Behlendorf, 2010-11-05, 1 file, -2/+2)
| It's important to clear mp->owner after calling mutex_unlock() because when CONFIG_DEBUG_MUTEXES is defined the mutex owner is verified in mutex_unlock(). If we set it to NULL before unlocking, this check fails and the lockdep support is immediately disabled.
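The ordering issue can be modelled in user space as below. This is only a sketch: `struct ex_owned_mutex`, the failure counter, and all function names are hypothetical, and the debug check is a simplified stand-in for what a debug-enabled mutex_unlock() verifies.

```c
#include <stddef.h>

struct ex_owned_mutex {
	const void *owner;	/* thread holding the lock, NULL when free */
	int locked;
	int debug_failures;	/* counts failed owner checks */
};

/* Models a debug-enabled unlock that verifies the caller owns the mutex. */
static void ex_debug_unlock(struct ex_owned_mutex *m, const void *self)
{
	if (m->owner != self)
		m->debug_failures++;
	m->locked = 0;
}

static void ex_owned_enter(struct ex_owned_mutex *m, const void *self)
{
	m->locked = 1;
	m->owner = self;
}

/* Old order: the owner is cleared before the check runs, so the check fails. */
static void ex_exit_clear_first(struct ex_owned_mutex *m, const void *self)
{
	m->owner = NULL;
	ex_debug_unlock(m, self);
}

/* New order: unlock while the owner is still set, then clear it. */
static void ex_exit_unlock_first(struct ex_owned_mutex *m, const void *self)
{
	ex_debug_unlock(m, self);
	m->owner = NULL;
}
```

In the model only the first variant increments the failure counter, which mirrors the failed check the commit message describes.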
* Fix 2.6.35 shrinker callback API change (Brian Behlendorf, 2010-10-22, 4 files, -3/+196)
| As of linux-2.6.35 the shrinker callback API takes an additional argument. The shrinker struct is passed to the callback so that users can embed the shrinker structure in private data and use container_of() to access it. This removes the need to always use global state for the shrinker.
|
| To handle this we add the SPL_AC_3ARGS_SHRINKER_CALLBACK autoconf check to properly detect the API. Then we simply set up a callback function with the correct number of arguments. For now we do not make use of the new third argument.
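The embedding pattern that the three-argument callback enables can be sketched in user space. Everything here is a model under stated assumptions: `ex_container_of`, `struct ex_shrinker`, and `struct ex_cache` are hypothetical stand-ins for the kernel's container_of(), struct shrinker, and a caller's private data.

```c
#include <stddef.h>

/* User-space equivalent of the container_of() idea. */
#define ex_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Model of a shrinker whose callback receives the shrinker itself. */
struct ex_shrinker {
	int (*shrink)(struct ex_shrinker *s, int nr_to_scan, unsigned gfp_mask);
};

/* Private data with the shrinker embedded, so no global state is needed. */
struct ex_cache {
	int objects;
	struct ex_shrinker shrinker;
};

/* Callback: recover the enclosing cache, free up to nr_to_scan objects. */
static int ex_cache_shrink(struct ex_shrinker *s, int nr_to_scan,
    unsigned gfp_mask)
{
	struct ex_cache *c = ex_container_of(s, struct ex_cache, shrinker);

	(void)gfp_mask;		/* unused in this model */
	if (nr_to_scan > c->objects)
		nr_to_scan = c->objects;
	c->objects -= nr_to_scan;
	return c->objects;	/* objects remaining */
}
```

Each cache can carry its own shrinker this way, whereas a two-argument callback has to find its state through a global.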
* atomic_*_*_nv() functions need to return the new value atomically. (Ricardo M. Correia, 2010-09-17, 1 file, -12/+32)
| A local variable must be used for the return value to avoid a potential race once the spin lock is dropped.
|
| Signed-off-by: Ricardo M. Correia <[email protected]>
| Signed-off-by: Brian Behlendorf <[email protected]>
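The race and its fix can be sketched in user space with a pthread mutex standing in for the spin lock. The names `ex_atomic_lock` and `ex_atomic_inc_32_nv()` are hypothetical; only the idea of capturing the new value in a local variable comes from the commit message.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t ex_atomic_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increment *target and return the value this call produced. */
static uint32_t ex_atomic_inc_32_nv(volatile uint32_t *target)
{
	uint32_t nv;

	pthread_mutex_lock(&ex_atomic_lock);
	nv = ++(*target);	/* capture the new value while the lock is held */
	pthread_mutex_unlock(&ex_atomic_lock);

	/*
	 * Returning *target here instead would re-read the variable after the
	 * lock is dropped, when another thread may already have changed it.
	 */
	return nv;
}
```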
* Fix markdown rendering (Brian Behlendorf, 2010-09-15, 1 file, -2/+2)
| These two lines were being rendered incorrectly on the GitHub site. To fix the issue there needs to be leading whitespace before each line to ensure each command is rendered on its own line.
|
|   $ ./configure
|   $ make pkg
* Reference new zfsonlinux.org website (Brian Behlendorf, 2010-09-14, 1 file, -1/+1)
| The wiki contents have been converted to HTML and made available at their new home http://zfsonlinux.org. The wiki has also been disabled; the HTML pages are now the official documentation.
* Support custom build directories (Brian Behlendorf, 2010-09-05, 16 files, -88/+177)
| One of the neat tricks an autoconf style project is capable of is allowing configuration/building in a directory other than the source directory. The major advantage to this is that you can build the project in various different ways while making changes in a single source tree.
|
| For example, this project is designed to work on various different Linux distributions, each of which works slightly differently. This means that changes need to be verified on each of those supported distributions, preferably before the change is committed to the public git repo.
|
| Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems, each running a supported distribution. When I make a change to the source base which I suspect may break things, I can concurrently build from the same source on all the systems, each in its own subdirectory.
|
|   wget -c http://github.com/downloads/behlendorf/spl/spl-x.y.z.tar.gz
|   tar -xzf spl-x.y.z.tar.gz
|   cd spl-x-y-z
|
|   ------------------------- run concurrently ----------------------
|   <ubuntu system>  <fedora system>  <debian system>  <rhel6 system>
|   mkdir ubuntu     mkdir fedora     mkdir debian     mkdir rhel6
|   cd ubuntu        cd fedora        cd debian        cd rhel6
|   ../configure     ../configure     ../configure     ../configure
|   make             make             make             make
|   make check       make check       make check       make check
|
| This is something the project has almost supported for a long time, but finishing this support should save me lots of time.
* Remove spl-x.y.z.zip creation in 'make dist' (Brian Behlendorf, 2010-09-02, 2 files, -5/+4)
| Do not create a spl-x.y.z.zip file as part of 'make dist'. Simply create the standard spl-x.y.z.tar.gz file.
* Move vendor check to spl-build.m4 (Brian Behlendorf, 2010-09-02, 12 files, -728/+1371)
| This check was previously done with a hack in config.guess. However, since a new config.guess is copied into place when forcing a full autoreconf, the change was easily lost, and the hack was never a good idea. This commit also updates all of the autoconf style support scripts in config.
* Prep for spl-0.5.1 tag (Brian Behlendorf, 2010-09-01, 1 file, -1/+1)
|
* Add quick build instructions (Brian Behlendorf, 2010-09-01, 1 file, -2/+7)
| Full up-to-date build information will stay on the wiki for now, but there is no harm in adding the bare bones instructions to the README. They shouldn't change and are a reasonable quick start.
* Add list_link_replace() function (Brian Behlendorf, 2010-08-27, 1 file, -0/+13)
| The list_link_replace() function will swap a new item into the place of an old item in a list. It is the caller's responsibility to ensure all lists involved are locked properly.
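A replace operation of this kind can be sketched on a circular doubly linked list. This is a user-space illustration with hypothetical `ex_` names, not the SPL's list implementation, and it leaves locking to the caller as the commit message requires.

```c
#include <assert.h>
#include <stddef.h>

struct ex_node {
	struct ex_node *prev;
	struct ex_node *next;
};

/* An empty circular list is a head that points at itself. */
static void ex_list_init(struct ex_node *head)
{
	head->prev = head->next = head;
}

static void ex_list_add_tail(struct ex_node *head, struct ex_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Put new_node where old_node was; old_node is left unlinked. */
static void ex_list_link_replace(struct ex_node *old_node,
    struct ex_node *new_node)
{
	assert(old_node->prev != NULL && old_node->next != NULL);
	assert(new_node->prev == NULL && new_node->next == NULL);

	new_node->prev = old_node->prev;
	new_node->next = old_node->next;
	old_node->prev->next = new_node;
	old_node->next->prev = new_node;
	old_node->prev = old_node->next = NULL;
}
```

The swap is a handful of pointer writes with no traversal, so the item keeps its position in the list.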
* Add MUTEX_NOT_HELD() function (Brian Behlendorf, 2010-08-27, 1 file, -1/+3)
| Simply implement the missing MUTEX_NOT_HELD() function using the !MUTEX_HELD construct.
* Stub out kmem cache defrag API (Brian Behlendorf, 2010-08-27, 2 files, -5/+30)
| At some point we are going to need to implement the kmem cache move callbacks to allow for kmem cache defragmentation. This commit simply lays a small part of the API groundwork; it does not actually implement any of this feature. This is safe for now because the move callbacks are just an optimization. Even if they are registered we don't ever really have to call them.
* Add missing atomic functions (Brian Behlendorf, 2010-08-27, 1 file, -0/+44)
| These functions were not previously needed so they were not added. Now they are, so add the full set.
|
|   atomic_inc_32_nv()
|   atomic_dec_32_nv()
|   atomic_inc_64_nv()
|   atomic_dec_64_nv()
* Prep for spl-0.5.0 tag (Brian Behlendorf, 2010-08-13, 1 file, -0/+7)
|
* Fix stack overflow in vn_rdwr() due to memory reclaim (Li Wei, 2010-08-12, 2 files, -0/+7)
| Unless __GFP_IO and __GFP_FS are removed from the file mapping gfp mask we may enter memory reclaim during IO. In this case shrink_slab() entered another file system which is notoriously hungry for stack. This additional stack usage may cause a stack overflow. This patch removes __GFP_IO and __GFP_FS from the mapping gfp mask of each file during vn_open() to avoid any reclaim in the vn_rdwr() IO path. The original mask is then restored at vn_close() time. Hats off to the loop driver, which does something similar for the same reason.
|
|   [...]
|   shrink_slab+0xdc/0x153
|   try_to_free_pages+0x1da/0x2d7
|   __alloc_pages+0x1d7/0x2da
|   do_generic_mapping_read+0x2c9/0x36f
|   file_read_actor+0x0/0x145
|   __generic_file_aio_read+0x14f/0x19b
|   generic_file_aio_read+0x34/0x39
|   do_sync_read+0xc7/0x104
|   vfs_read+0xcb/0x171
|   :spl:vn_rdwr+0x2b8/0x402
|   :zfs:vdev_file_io_start+0xad/0xe1
|   [...]
|
| Signed-off-by: Brian Behlendorf <[email protected]>
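The save, mask, and restore sequence can be modelled in user space as below. The flag values and every `ex_`/`EX_` name are hypothetical; the kernel's real gfp flag values and mapping accessors are not reproduced here.

```c
/* Hypothetical flag values for this model only. */
#define EX_GFP_WAIT 0x1u
#define EX_GFP_IO   0x2u
#define EX_GFP_FS   0x4u

struct ex_file {
	unsigned mapping_gfp_mask;	/* models the file mapping's gfp mask */
	unsigned saved_gfp_mask;	/* original mask, restored on close */
};

/* On open: remember the mask, then forbid IO and FS reclaim. */
static void ex_vn_open(struct ex_file *fp)
{
	fp->saved_gfp_mask = fp->mapping_gfp_mask;
	fp->mapping_gfp_mask &= ~(EX_GFP_IO | EX_GFP_FS);
}

/* On close: put the original mask back. */
static void ex_vn_close(struct ex_file *fp)
{
	fp->mapping_gfp_mask = fp->saved_gfp_mask;
}
```

Allocations made on behalf of the file between open and close then cannot recurse into another file system's reclaim path, which is what shortens the stack in the trace above.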
* Correctly handle rwsem_is_locked() behavior (Ned Bass, 2010-08-10, 5 files, -31/+172)
| A race condition in rwsem_is_locked() was fixed in Linux 2.6.33 and the fix was backported to RHEL5 as of kernel 2.6.18-190.el5. Details can be found here:
|
|   https://bugzilla.redhat.com/show_bug.cgi?id=526092
|
| The race condition was fixed in the kernel by acquiring the semaphore's wait_lock inside rwsem_is_locked(). The SPL worked around the race condition by acquiring the wait_lock before calling that function, but with the fix in place it must not do that.
|
| This commit implements an autoconf test to detect whether the fixed version of rwsem_is_locked() is present. The previous version of rwsem_is_locked() was an inline static function while the new version is exported as a symbol which we can check for in Module.symvers. Depending on the result we correctly implement the needed compatibility macros for proper spinlock handling.
|
| Finally, we do the right thing with spin locks in RW_*_HELD() by using the new compatibility macros. We only acquire the semaphore's wait_lock when calling a rwsem_is_locked() that does not itself try to acquire the lock.
|
| Some new overhead and a small harmless race are introduced by this change. This is because RW_READ_HELD() and RW_WRITE_HELD() now acquire and release the wait_lock twice: once for the call to rwsem_is_locked() and once for the call to rw_owner(). This can't be avoided when calling a rwsem_is_locked() that takes the wait_lock, as it will in more recent kernels.
|
| The other case, which only occurs in legacy kernels, could be optimized by taking the lock only once, as was done prior to this commit. However, I decided that the performance gain probably wasn't significant enough to justify the messy special cases required.
|
| The function spl_rw_get_owner() was only used to enable the aforementioned optimization. Since it is no longer used, I removed it.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* Correctly detect atomic64_cmpxchg support (Ned Bass, 2010-08-08, 2 files, -0/+3)
| The RHEL5 2.6.18-194.7.1.el5 kernel added atomic64_cmpxchg to asm-x86_64/atomic.h. That macro is defined in terms of cmpxchg, which is provided by asm/system.h. However, asm/system.h is not #included by atomic.h in this kernel nor by the autoconf test for atomic64_cmpxchg, so the test failed with "implicit declaration of function 'cmpxchg'". This leads the build system to erroneously conclude that the kernel does not define atomic64_cmpxchg and to enable the built-in definition. This in turn produces a '"atomic64_cmpxchg" redefined' build warning which is fatal when building with --enable-debug.
|
| This commit fixes this by including asm/system.h in the autoconf test.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* Fix taskq code to not drop tasks when TQ_SLEEP is used. (Ricardo M. Correia, 2010-08-02, 2 files, -28/+48)
| When TQ_SLEEP is used, taskq_dispatch() should always succeed even if the number of pending tasks is above tq->tq_maxalloc. This semantic is similar to KM_SLEEP in kmem allocations, which also always succeed.
|
| However, we cannot block forever, otherwise there is a risk of deadlock. Therefore, we still allow the number of pending tasks to go above tq->tq_maxalloc with TQ_SLEEP, but we may sleep up to 1 second per task dispatch, thereby throttling the task dispatch rate.
|
| One of the existing splat tests was also augmented to test for this scenario. The test would fail with the previous implementation but now it succeeds.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* Strfree() should call kfree() not kmem_free() (Brian Behlendorf, 2010-07-30, 1 file, -1/+1)
| Using kmem_free() results in deducting X bytes from the memory accounting when --enable-debug is set. Unfortunately, the counterpart kmem_asprintf() and friends currently do not account for the memory they allocate, so we must skip the accounting on free as well. If we don't, we end up with a negative number of lost bytes reported when the module is unloaded.
|
| A better long term fix would be to add the accounting to the allocation side, but that's a project for another day.
* Add uninstall Makefile targets (Brian Behlendorf, 2010-07-28, 5 files, -6/+26)
| Extend the Makefiles with an uninstall target to cleanly remove a package which was installed with 'make install'. Additionally, ensure a 'depmod -a' is run as part of the install to update the module dependency information.