Commit message (Collapse)AuthorAgeFilesLines
...
* Linux 2.6.39 compat, invalidate_inodes()Brian Behlendorf2011-04-195-3/+181
| | | | | | | | To resolve a potential filesystem corruption issue a second argument was added to invalidate_inodes(). This argument controls whether dirty inodes are dropped or treated as busy when invalidating a super block. When only the legacy API is available the second argument will be dropped for compatibility.
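The compatibility shim described above can be sketched in user-space C. This is only an illustration under stated assumptions: the `HAVE_2ARGS_INVALIDATE_INODES` switch, the `spl_invalidate_inodes()` wrapper name, the stand-in `struct super_block`, and both mock `invalidate_inodes()` bodies are hypothetical and are not taken from the commit.

```c
#include <stdbool.h>

/* Stand-in for the kernel's super block; hypothetical, for illustration. */
struct super_block {
	int busy_inodes;
	int dirty_inodes;
};

#ifdef HAVE_2ARGS_INVALIDATE_INODES
/* Mock of the newer API: kill_dirty decides whether dirty inodes are
 * dropped or counted as busy. Returns the number of inodes still busy. */
static int invalidate_inodes(struct super_block *sb, bool kill_dirty)
{
	if (kill_dirty)
		sb->dirty_inodes = 0;
	return sb->busy_inodes + sb->dirty_inodes;
}
#define spl_invalidate_inodes(sb, kd)	invalidate_inodes(sb, kd)
#else
/* Mock of the legacy one-argument API. */
static int invalidate_inodes(struct super_block *sb)
{
	return sb->busy_inodes;
}
/* The second argument is evaluated, then dropped for compatibility. */
#define spl_invalidate_inodes(sb, kd)	((void)(kd), invalidate_inodes(sb))
#endif
```

Callers always pass two arguments to the wrapper, so the call sites do not change when the kernel API does.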
* Fix rebuildable RPMs for el6/ch5Brian Behlendorf2011-04-081-2/+6
| | | | | | | When rebuilding the source RPM under el5 you need to append the target_cpu. However, under el6/ch5 things are packaged correctly and the arch is already part of kver. For this reason it also needs to be stripped from kver when setting kverpkg.
* Prep spl-0.6.0-rc3 tagBrian Behlendorf2011-04-061-1/+1
| | | | Create the third 0.6.0 release candidate tag (rc3).
* Add dnlc_reduce_cache() supportBrian Behlendorf2011-04-066-1/+283
| | | | | | | | | | | | | Provide the dnlc_reduce_cache() function which attempts to prune cached entries from the dcache and icache. After the entries are pruned any slabs which they may have been using are reaped. Note the API takes a reclaim percentage but we don't have easy access to the total number of cache entries to calculate the reclaim count. However, in practice this doesn't need to be exactly correct. We simply need to reclaim some useful fraction (but not all) of the cache. The caller can determine if more needs to be done.
* Decrease target objects per slabBrian Behlendorf2011-04-061-1/+1
| | | | | | | | By decreasing the number of target objects per slab we increase the likelihood that a slab can be freed. This reduces the level of fragmentation in the slab which has been observed to be a problem for certain workloads. The penalty for this is that we also decrease the speed at which new objects can be allocated.
* Add slab usage summaries to /procBrian Behlendorf2011-04-062-1/+125
| | | | | | | | | | | | | | | | | | One of the most common things you want to know when looking at the slab is how much memory is being used. This information was available in /proc/spl/kmem/slab but only on a per-slab basis. This commit adds the following /proc/sys/kernel/spl/kmem/slab* entries to make total slab usage easily available at a glance.
|
|   slab_kmem_total - Total kmem slab size
|   slab_kmem_avail - Alloc'd kmem slab size
|   slab_kmem_max   - Max observed kmem slab size
|   slab_vmem_total - Total vmem slab size
|   slab_vmem_avail - Alloc'd vmem slab size
|   slab_vmem_max   - Max observed vmem slab size
|
| NOTE: The slab_*_max values are expected to over report because they show maximum values since boot, not current values.
* Update /proc/spl/kmem/slab outputBrian Behlendorf2011-04-061-22/+24
| | | | | | | | | | | | | The 'slab_fail', 'slab_create', and 'slab_destroy' columns in the slab output have been removed because they are virtually always zero and not very useful. The much more useful 'size' and 'alloc' columns have been added which show the total slab size and how much of the total size has been allocated to objects. Finally, the formatting has been updated to be much more human readable while still being friendly for tools like awk to parse.
* Linux shrinker compatBrian Behlendorf2011-04-062-41/+55
| | | | | | | | | | | | | | The Linux shrinker has gone through three API changes since 2.6.22. Rather than force every caller to understand all three APIs this change consolidates the compatibility code into the mm-compat.h header. The caller can then use a single spl provided shrinker API which does the right thing for your kernel.
|
|   SPL_SHRINKER_CALLBACK_PROTO(shrinker_callback, cb, nr_to_scan, gfp_mask);
|   SPL_SHRINKER_DECLARE(shrinker_struct, shrinker_callback, seeks);
|   spl_register_shrinker(&shrinker_struct);
|   spl_unregister_shrinker(&shrinker_struct);
|   spl_exec_shrinker(&shrinker_struct, nr_to_scan, gfp_mask);
* Add .va_dentry helperBrian Behlendorf2011-04-061-0/+1
| | | | | | While this extra structure memory does not exist under Solaris it is needed under Linux to pass the dentry. This allows the dentry to be easily instantiated before the inode is unlocked.
* Update CHAOS 5 PackagingBrian Behlendorf2011-03-311-4/+4
| | | | | | The CHAOS 5 kernels are now packaged identically to the RHEL6 kernels. Therefore we can simply use the RHEL6 rules in the spec file when building packages.
* Add crgetfsuid()/crgetfsgid() helpersBrian Behlendorf2011-03-222-78/+60
| | | | | | | | | | | | Solaris credentials don't have an fsuid/fsgid field but Linux credentials do. To handle this case the Solaris API is being modestly extended to include the crgetfsuid()/crgetfsgid() helper functions. Additionally, because the crget*() helpers are implemented identically regardless of HAVE_CRED_STRUCT they have been moved outside the #ifdef to common code. This simplification means we only have one version of the helper to keep up to date.
* Load zlib_inflate.koBrian Behlendorf2011-03-221-0/+1
| | | | | | | Certain stock kernels (Debian Lenny) are built with zlib_inflate.ko as a kernel module. To ensure 'make check' works in-tree load this module before loading the spl module. This is now required for the zlib splat regression test.
* Disable vmalloc() direct reclaimBrian Behlendorf2011-03-201-2/+22
| | | | | | | | | | | | | | | | | | | As part of vmalloc() a __pte_alloc_kernel() allocation may occur. This internal allocation does not honor the gfp flags passed to vmalloc(). This means even when vmalloc(GFP_NOFS) is called it is possible that a synchronous reclaim will occur. This reclaim can trigger file IO which can result in a deadlock. This issue can be avoided by explicitly setting PF_MEMALLOC on the process to subvert synchronous reclaim when vmalloc() is called with !__GFP_FS. An example stack of the deadlock can be found here (1), along with the upstream kernel bug (2), and the original bug discussion on the linux-mm mailing list (3). This code can be properly autoconf'ed when the upstream bug is fixed.
|
|   1) http://github.com/behlendorf/zfs/issues/labels/Vmalloc#issue/133
|   2) http://bugzilla.kernel.org/show_bug.cgi?id=30702
|   3) http://marc.info/?l=linux-mm&m=128942194520631&w=4
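The save/set/restore pattern behind this workaround can be sketched in user-space C. Everything here is a mock under stated assumptions: `MOCK_PF_MEMALLOC`, `MOCK_GFP_FS`, `task_flags`, `mock_vmalloc()` and `vmalloc_nofs_safe()` are hypothetical names standing in for the kernel's task flags and allocator, and the flag values are arbitrary.

```c
#include <stdlib.h>

/* Mock flag bits; the names echo the kernel's but the values are arbitrary. */
#define MOCK_PF_MEMALLOC	0x1u	/* task must not enter reclaim */
#define MOCK_GFP_FS		0x2u	/* allocation may call into the filesystem */

static unsigned int task_flags;		/* stand-in for current->flags */
static unsigned int flags_seen;		/* task flags observed inside the allocator */

/* Stand-in for vmalloc(); records the task flags in effect during the call. */
static void *mock_vmalloc(size_t size)
{
	flags_seen = task_flags;
	return malloc(size);
}

static void *vmalloc_nofs_safe(size_t size, unsigned int gfp)
{
	/* Set the flag only for !FS allocations, and only if not already set. */
	int set_flag = !(gfp & MOCK_GFP_FS) && !(task_flags & MOCK_PF_MEMALLOC);
	void *ptr;

	if (set_flag)
		task_flags |= MOCK_PF_MEMALLOC;
	ptr = mock_vmalloc(size);
	if (set_flag)
		task_flags &= ~MOCK_PF_MEMALLOC;	/* restore the caller's state */
	return ptr;
}
```

Checking whether the flag was already set before touching it keeps the wrapper from clearing a flag that an outer caller depends on.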
* Remove default GFP_NOFS allocationsBrian Behlendorf2011-03-191-7/+6
| | | | | | | | | | | | | | | As originally described in commit 82b8c8fa64737edfb203156b245b48840139d2c1 this was done to prevent certain deadlocks from occurring in the system. However, as suspected the price for doing this proved to be too high. The VM is having a hard time effectively reclaiming memory thus we are reverting this change. However, we still need to fundamentally handle the issue. Under Solaris the KM_PUSHPAGE mask is used commonly in I/O paths to ensure a memory allocation will succeed. We leverage this fact and redefine KM_PUSHPAGE to include GFP_NOFS. This ensures that in these common I/O paths we don't trigger additional reclaim. This minimizes the change to the Solaris code.
* Prep spl-0.6.0-rc2 tagBrian Behlendorf2011-03-091-1/+1
| | | | Create the second 0.6.0 release candidate tag (rc2).
* Linux 2.6.31 compat, include linux/seq_file.hBrian Behlendorf2011-03-071-0/+1
| | | | | | Explicitly include the linux/seq_file.h header in vfs.h. This header is required for the sequence handlers and is included indirectly in newer kernels.
* Make Missing Modules.symvers FatalBrian Behlendorf2011-03-072-0/+36
| | | | | | | | | Detect early on in configure if the Modules.symvers file is missing. Without this file there will be build failures later and it's best to catch this early and provide a useful error. In this case the most likely problem is the kernel-devel packages are not installed. It may also be possible that they are using an unbuilt custom kernel in which case they must build the kernel first.
* Make CONFIG_PREEMPT FatalBrian Behlendorf2011-03-072-0/+154
| | | | | | Until support is added for preemptible kernels detect this at configure time and make it fatal. Otherwise, it is possible to have a successful build and kernel modules with flaky behavior.
* Remove xvattr supportBrian Behlendorf2011-03-022-72/+28
| | | | | | | | | | | | | | | | | | | The xvattr support in the spl has always simply consisted of defining a couple structures and a few #defines. This was enough to enable compilation of code which just passed xvattr types around but not enough to effectively manipulate them. This change removes even this minimal support leaving it up to packages which leverage the spl to provide the full xvattr support. By removing it from the spl we ensure no conflict with the higher level packages. This just leaves minimal vnode support for basic manipulation of files. This code does have the proper support functions in the spl and a set of regression tests. Additionally, this change removes the unused 'caller_context_t *' type and replaces it with a 'void *'.
* Add TIMESPEC_OVERFLOW helperBrian Behlendorf2011-03-021-0/+3
| | | | | Add the TIMESPEC_OVERFLOW helper macro to allow easy checking of timespec overflow.
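The commit does not show the macro body, so the following is only a guess at its shape: a range check of `tv_sec` against signed 32-bit bounds. The `TIME32_MAX` and `TIME32_MIN` names and their values are assumptions made for this sketch.

```c
#include <stdint.h>
#include <time.h>

/* Assumed bounds: the range of a signed 32-bit time value. */
#define TIME32_MAX	INT32_MAX
#define TIME32_MIN	INT32_MIN

/* True when the seconds field does not fit in the assumed 32-bit range. */
#define TIMESPEC_OVERFLOW(ts) \
	((ts)->tv_sec < TIME32_MIN || (ts)->tv_sec > TIME32_MAX)
```

On a platform with a 64-bit `time_t`, a timestamp one second past `INT32_MAX` would be flagged; with a 32-bit `time_t` the check can never fire.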
* Add zlib regression testBrian Behlendorf2011-02-256-0/+172
| | | | | | | | | A zlib regression test has been added to verify the correct behavior of z_compress_level() and z_uncompress(). The test case simply takes a 128k buffer, compresses it, then uncompresses it, and finally compares the buffers after the transform. If the buffers match then everything is fine and no data was lost. It performs this test for all 9 zlib compression levels.
* Fix zlib compressionBrian Behlendorf2011-02-256-100/+238
| | | | | | | | | | | | | | | | While portions of the code needed to support z_compress_level() and z_uncompress() were in place, the existing implementation was non-functional; it merely compiled. The critical missing component was to set up a workspace for the compress/uncompress stream structures to use. A kmem_cache was added for the workspace area because we require a large chunk of memory. This avoids the need to continually alloc/free this memory and vmap() the pages which is very slow. Several objects will reside in the per-cpu kmem_cache making them quick to acquire and release. A further optimization would be to adjust the implementation to additionally ensure the memory is local to the cpu. Currently that may not be the case.
* Use Linux flock structBrian Behlendorf2011-02-231-7/+5
| | | | | | | Rather than defining our own structure, which will conflict with Linux's version when building 32-bit, simply set up a typedef to always use the correct Linux version for both 32 and 64-bit builds.
* Linux compat 2.6.37, invalidate_inodes()Brian Behlendorf2011-02-235-0/+128
| | | | | | | | | | | | | | | | | | In the 2.6.37 kernel the function invalidate_inodes() is no longer exported for use by modules. This memory management functionality is needed to invalidate the inodes attached to a super block without unmounting the filesystem. Because this function still exists in the kernel and the prototype is available in a common header all we strictly need is the symbol address. The address is obtained using spl_kallsyms_lookup_name() and assigned to the variable invalidate_inodes_fn. Then a #define is used to replace all instances of invalidate_inodes() with a call to the acquired address. All the complexity is hidden behind HAVE_INVALIDATE_INODES and invalidate_inodes() can be used as usual. Long term we should try to get this, or another, interface made available to modules again.
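The lookup-then-#define technique described above can be illustrated in user-space C. This is a sketch, not the SPL code: `mock_lookup_name()` stands in for spl_kallsyms_lookup_name(), and `hidden_invalidate_inodes()`, `resolve_symbols()`, `generic_fn_t` and the stand-in `struct super_block` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct super_block { int inodes; };	/* stand-in type for illustration */

typedef void (*generic_fn_t)(void);
typedef int (*invalidate_inodes_t)(struct super_block *);

/* Stand-in for the kernel function that is no longer exported. */
static int hidden_invalidate_inodes(struct super_block *sb)
{
	sb->inodes = 0;
	return 0;
}

/* Mock symbol lookup: maps a name to an address, or NULL if unknown. */
static generic_fn_t mock_lookup_name(const char *name)
{
	if (strcmp(name, "invalidate_inodes") == 0)
		return (generic_fn_t)hidden_invalidate_inodes;
	return NULL;
}

static invalidate_inodes_t invalidate_inodes_fn;

/* Resolve the address once, for example at module load time. */
static int resolve_symbols(void)
{
	invalidate_inodes_fn =
	    (invalidate_inodes_t)mock_lookup_name("invalidate_inodes");
	return invalidate_inodes_fn ? 0 : -1;
}

/* Callers keep writing invalidate_inodes(sb) as usual. */
#define invalidate_inodes(sb)	invalidate_inodes_fn(sb)
```

The lookup has to succeed before the first call, which is why the sketch reports failure from `resolve_symbols()` rather than deferring the check to the call site.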
* Prep spl-0.6.0-rc1 tagBrian Behlendorf2011-02-181-1/+1
| | | | Create the first 0.6.0 release candidate tag (rc1).
* Prefer /lib/modules/$(uname -r)/ linksBrian Behlendorf2011-02-102-9/+24
| | | | | | | | Preferentially use the /lib/modules/$(uname -r)/source and /lib/modules/$(uname -r)/build links. Only if neither of these links exist fall back to alternate methods for deducing which kernel to build with. This resolves the need to manually specify --with-linux= and --with-linux-obj= on Debian systems.
* Update META to 0.6.0Brian Behlendorf2011-02-071-1/+1
| | | | | | Roll the version forward to 0.6.0. While no major changes really warrant this I want to keep the version in step with ZFS for now which is the only SPL consumer.
* Block in cv_destroy() on all waitersBrian Behlendorf2011-02-042-3/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we would ASSERT in cv_destroy() if it was ever called with active waiters. However, I've now seen several instances in OpenSolaris code where they do the following:
|
|   cv_broadcast();
|   cv_destroy();
|
| This leaves no time for active waiters to be woken up and scheduled and we trip the ASSERT. This has not been observed to be an issue on OpenSolaris because their cv_destroy() basically does nothing. They still do run the risk of the memory being free'd after the cv_destroy() and hitting a bad paging request. But in practice this race is so small and unlikely it either doesn't happen, or is so unlikely when it does happen the root cause has not yet been identified.
|
| Rather than risk the same issue in our code this change updates cv_destroy() to block until all waiters have been woken and scheduled. This may take some time because each waiter must acquire the mutex.
|
| This change may have an impact on performance for frequently created and destroyed condition variables. That however is a price worth paying to avoid crashing your system. If performance issues are observed they can be addressed by the caller.
* Minor policy interfaceBrian Behlendorf2011-01-271-5/+18
| | | | | | | | | Simply add the policy function wrappers. They are completely non-functional and always return that everything is OK, but once again they simplify compilation of dependent packages for now. These can/should be removed once the security policy of the dependent application is completely understood and integrated as appropriate with Linux.
* Add missing headersBrian Behlendorf2011-01-274-0/+99
| | | | | | Dependent packages require the following missing headers to simplify compilation. The headers are basically just stubbed out with minimal content required.
* Add VSA_ACE_* and MAX_ACL_ENTRIES definesBrian Behlendorf2011-01-271-0/+7
| | | | | | | | | | The following flags are used to get the proper mask when getting and setting ACLs. I'm hopeful this can all largely go away at some point. We also add a define for the maximum number of ACL entries. MAX_ACL_ENTRIES is used as the maximum number of entries for each type.
* Add MAXUID defineBrian Behlendorf2011-01-271-0/+2
| | | | | | | | For Linux the maximum uid can vary depending on how your kernel is built. The Linux kernel still can be compiled with 16 bit uids and gids, although I'm not aware of a major distribution which does this (maybe an embedded one?). Given that caveat it is reasonably safe to define the MAXUID as 2147483647.
* Add FIGNORECASE defineBrian Behlendorf2011-01-271-1/+2
| | | | | The FIGNORECASE define is now needed, place it with the related flags.
* Add ksid_index_t and ksid_t typesBrian Behlendorf2011-01-271-0/+9
| | | | | Add the ksid_index_t enum and ksid_t type for use. These types are now used by packages which depend on the SPL.
* Minimal VFS additionsBrian Behlendorf2011-01-271-4/+10
| | | | | | This patch simply removes the place holder vfs_t type and includes some generic Linux VFS headers. It also makes some minor fid_t additions for compatibility.
* Remove VN_HOLD/VN_RELE/VOP_PUTPAGEBrian Behlendorf2011-01-123-22/+1
| | | | | | Previously these were defined to noops but rather than give the misleading impression that these are actually implemented I'm removing them entirely for clarity.
* Add a few additional vnode #definesBrian Behlendorf2011-01-121-0/+8
| | | | These additional constants now have users in dependent packages.
* Make vn_cache|vn_file_cache kmem cachesBrian Behlendorf2011-01-121-2/+2
| | | | | | | Both of these caches were previously allowed to be either a vmem or kmem cache based on the size of the object involved. Since we know the object won't be too large, and performance is much better for a kmem cache, make them kmem backed.
* Clean vattr_t and vsecattr_t typesBrian Behlendorf2011-01-122-18/+16
| | | | Minor cleanup for the vattr_t and vsecattr_t types.
* FRSYNC Should Use O_SYNCBrian Behlendorf2011-01-121-1/+1
| | | | | The Solaris FRSYNC maps most logically to the Linux O_SYNC. There is no O_RSYNC on Linux but this wasn't noticed until just recently.
* Add vn_mode_to_vtype/vn_vtype_to_mode helpersBrian Behlendorf2011-01-123-6/+41
| | | | | Add simple helpers to convert between a vnode->v_type and an inode->i_mode. These should be used sparingly but they are handy to have.
* Add cv_timedwait_interruptible() functionNeependra Khare2011-01-112-16/+33
| | | | | | | | | | | | | | | | | | The cv_timedwait() function by definition must wait unconditionally for cv_signal()/cv_broadcast() before waking. This causes processes to go in the D state which increases the load average. The load average is the summation of processes in D state and run queue. To avoid this it can be desirable to sleep interruptibly. These processes do not count against the load average but may be woken by a signal. It is up to the caller to determine why the process was woken; it may be for one of three reasons.
|
|   1) cv_signal()/cv_broadcast()
|   2) the timeout expired
|   3) a signal was received
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* Linux Compat: inode->i_mutex/i_semBrian Behlendorf2011-01-112-11/+14
| | | | | | | | Create spl_inode_lock/spl_inode_unlock compatibility macros to simplify access to the inode mutex/sem. This avoids having to ugly up the code with the required #define's at every call site. At the moment the SPL only uses this in one place but higher layers can benefit from the macro.
* Add Thread Specific Data (TSD) Regression TestBrian Behlendorf2010-12-071-1/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To validate the correct behavior of the TSD interfaces it's important that we add a regression test. This test is designed to minimally exercise the fundamental TSD behavior, it does not attempt to validate all potential corner cases.
|
| The test will first create 32 keys via tsd_create() and register a common destructor. Next 16 wait threads will be created each of which set/verify a random value for all 32 keys, then block waiting to be released by the control thread. Meanwhile the control thread verifies that none of the destructors have been run prematurely.
|
| The next phase of the test is to create 16 exit threads which set/verify a random value for all 32 keys. They then immediately exit. This is designed to verify tsd_exit() which will be called via thread_exit(). This must result in all registered destructors being run and the memory for the tsd being free'd.
|
| After this tsd_destroy() is verified by destroying all 32 keys. Once again we must see the expected number of destructors run and the tsd memory free'd.
|
| At this point the blocked threads are released and they exit calling tsd_exit() which should do very little since all the tsd has already been destroyed. If this all goes off without a hitch the test passes.
|
| To ensure no memory has been leaked, I have manually verified that after spl module unload no memory is reported leaked.
* Add Thread Specific Data (TSD) ImplementationBrian Behlendorf2010-12-078-3/+700
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thread specific data has been implemented using a hash table; this avoids the need to add a member to the task structure and allows maximum portability between kernels. This implementation has been optimized to keep the tsd_set() and tsd_get() times as small as possible.
|
| The majority of the entries in the hash table are for specific tsd entries. These entries are hashed by the product of their key and pid because by design the key and pid are guaranteed to be unique. Their product also has the desirable property that it will be uniformly distributed over the hash bins providing neither the pid nor key is zero. Under linux the zero pid is always the init process and thus won't be used, and this implementation is careful never to assign a zero key. By default the hash table is sized to 512 bins which is expected to be sufficient for light to moderate usage of thread specific data.
|
| The hash table contains two additional types of entries. The first type of entry is called a 'key' entry and it is added to the hash during tsd_create(). It is used to store the address of the destructor function and it is used as an anchor point. All tsd entries which use the same key will be linked to this entry. This is used during tsd_destroy() to quickly call the destructor function for all tsd associated with the key. The 'key' entry may be looked up with tsd_hash_search() by passing the key you wish to look up and the DTOR_PID constant as the pid.
|
| The second type of entry is called a 'pid' entry and it is added to the hash the first time a process sets a key. The 'pid' entry is also used as an anchor and all tsd for the process will be linked to it. This list is used during tsd_exit() to ensure all registered destructors are run for the process. The 'pid' entry may be looked up with tsd_hash_search() by passing the PID_KEY constant as the key, and the process pid.
|
| Note that tsd_exit() is called by thread_exit() so if you're using the Solaris thread API you should not need to call tsd_exit() directly.
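The bin selection described above (hash the product of key and pid into 512 bins) might look roughly like the following user-space sketch. The `tsd_hash_bin()` name, the `TSD_HASH_*` constants, and the multiplicative hash step with `HASH_MULT` are assumptions for illustration; the commit does not say which hash function is applied to the product.

```c
#include <stdint.h>

#define TSD_HASH_BITS	9			/* 2^9 = 512 bins, per the text */
#define TSD_HASH_BINS	(1u << TSD_HASH_BITS)

/* Classic 32-bit golden-ratio multiplier; an assumed stand-in hash. */
#define HASH_MULT	0x9e3779b9u

/* Pick a bin from the product of key and pid. */
static unsigned int tsd_hash_bin(unsigned int key, unsigned int pid)
{
	uint32_t product = (uint32_t)key * (uint32_t)pid;

	/* Keep the top TSD_HASH_BITS bits of the scrambled product. */
	return (uint32_t)(product * HASH_MULT) >> (32 - TSD_HASH_BITS);
}
```

If either input is zero the product is zero and every such entry lands in the same bin, which illustrates why the text insists on non-zero keys and pids.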
* Refresh autogen.sh productsBrian Behlendorf2010-11-302-6/+6
| | | | | | | | | Refresh the autogen.sh products based on the versions which are installed by default in the GA RHEL6.0 release.
|
|   autoconf (GNU Autoconf) 2.63
|   automake (GNU automake) 1.11.1
|   ltmain.sh (GNU libtool) 2.2.6b
* Make kmutex_t typesafe in all cases.Ricardo M. Correia2010-11-291-13/+19
| | | | | | | | | | | | | | | | | | When HAVE_MUTEX_OWNER and CONFIG_SMP are defined, kmutex_t is just a typedef for struct mutex. This is generally OK but has the downside that it can let mistakes such as mutex_lock(&kmutex_var) pass unnoticed until someone compiles the code without HAVE_MUTEX_OWNER or CONFIG_SMP (in which case kmutex_t is a real struct). Note that the correct API to call should have been mutex_enter() rather than mutex_lock(). We prevent this kind of mistake by making kmutex_t a real structure with only one field. This makes kmutex_t typesafe and it shouldn't have any impact on the generated assembly code.
|
| Signed-off-by: Ricardo M. Correia <[email protected]>
| Signed-off-by: Brian Behlendorf <[email protected]>
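The one-field wrapper idea can be demonstrated in user-space C. The `struct mutex` below is a hypothetical stand-in for the kernel type, and the function bodies are mocks; only the shape of the wrapper reflects what the commit describes.

```c
/* Stand-in for the kernel's struct mutex; hypothetical. */
struct mutex { int locked; };

/* Mock of the native Linux-style API, which takes a struct mutex. */
static void mutex_lock(struct mutex *m)   { m->locked = 1; }
static void mutex_unlock(struct mutex *m) { m->locked = 0; }

/* Wrapping the mutex in a one-field struct makes kmutex_t a distinct type. */
typedef struct kmutex {
	struct mutex m;
} kmutex_t;

/* Solaris-style API, which only accepts the wrapper type. */
static void mutex_enter(kmutex_t *mp)     { mutex_lock(&mp->m); }
static void mutex_exit(kmutex_t *mp)      { mutex_unlock(&mp->m); }
static int  mutex_held(const kmutex_t *mp) { return mp->m.locked; }

/* The wrapper is expected to add no storage on common compilers. */
_Static_assert(sizeof(kmutex_t) == sizeof(struct mutex),
    "one-field wrapper should not change the size");
```

With this layout, passing a `kmutex_t *` to `mutex_lock()` draws an incompatible-pointer diagnostic from the compiler instead of compiling silently.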
* Clear cv->cv_mutex when not in useBrian Behlendorf2010-11-292-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | For debugging purposes the condition variables keep track of the mutex used during a wait. The idea is to validate that all callers always use the same mutex. Unfortunately, we have seen cases where the caller reuses the condition variable with a different mutex but in a way which is known to be safe. My reading of the man pages suggests you should not do this and should instead cv_destroy()/cv_init() the condition variable before using it with a new mutex. However, there is overhead in doing this and it does appear to be allowed under Solaris.
|
| To accommodate this behavior cv_wait_common() and __cv_timedwait() have been modified to clear the associated mutex when the last waiter is dropped. This ensures that while the condition variable is in use the incorrect mutex case is detected. It also allows the condition variable to be safely recycled without requiring the overhead of a cv_destroy()/cv_init() as long as it isn't currently in use.
|
| Finally, spin lock cv->cv_lock was removed because it is not required. When the condition variable is used properly the caller will always be holding the mutex so the spin lock is redundant. The lock was originally added because I expected to need to protect more than just the cv->cv_mutex. It turns out that was not the case.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
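The bookkeeping described above can be modeled without any real blocking. In this single-threaded sketch the `cv_waiter_enter()` and `cv_waiter_exit()` helpers, the `cv_waiters` field, and the stand-in `kmutex_t` are hypothetical; they only model how the tracked mutex is set by the first waiter and cleared by the last.

```c
#include <stddef.h>

typedef struct kmutex { int id; } kmutex_t;	/* stand-in type */

typedef struct kcondvar {
	kmutex_t *cv_mutex;	/* mutex used by current waiters, or NULL */
	int cv_waiters;		/* number of threads currently waiting */
} kcondvar_t;

/* Called as a waiter begins waiting; returns -1 on a mutex mismatch. */
static int cv_waiter_enter(kcondvar_t *cv, kmutex_t *mp)
{
	if (cv->cv_mutex == NULL)
		cv->cv_mutex = mp;	/* first waiter records the mutex */
	else if (cv->cv_mutex != mp)
		return -1;		/* in use with a different mutex */
	cv->cv_waiters++;
	return 0;
}

/* Called as a waiter wakes; the last one out clears the tracked mutex. */
static void cv_waiter_exit(kcondvar_t *cv)
{
	if (--cv->cv_waiters == 0)
		cv->cv_mutex = NULL;
}
```

Because the caller is assumed to hold the mutex around both helpers, the sketch needs no extra lock of its own, mirroring the reasoning for dropping cv->cv_lock.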
* Give ENOTSUP a valid user space error valueNed Bass2010-11-101-1/+1
| | | | | | | | | | | | | | | | The ZFS module returns ENOTSUP for several error conditions where an operation is not (yet) supported. The SPL defined ENOTSUP in terms of ENOTSUPP, but that is an internal Linux kernel error code that should not be seen by user programs. As a result the zfs utilities print a confusing error message if an unsupported operation is attempted:
|
|   internal error: Unknown error 524
|   Aborted
|
| This change defines ENOTSUP in terms of EOPNOTSUPP which is consistent with user space.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
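The effect of the remapping can be seen from user space with `strerror()`. In this sketch the `SPL_ENOTSUP_OLD`, `SPL_ENOTSUP_NEW` and `KERNEL_ENOTSUPP` names and the `describe_error()` helper are hypothetical; the value 524 is the code quoted in the error message above.

```c
#include <errno.h>
#include <string.h>

#define KERNEL_ENOTSUPP	524	/* kernel-internal code from the message above */

/* Before: ENOTSUP mapped onto the kernel-only value. */
#define SPL_ENOTSUP_OLD	KERNEL_ENOTSUPP
/* After: ENOTSUP mapped onto EOPNOTSUPP, which user space knows. */
#define SPL_ENOTSUP_NEW	EOPNOTSUPP

/* Return the C library's description of an error code. */
static const char *describe_error(int err)
{
	return strerror(err);
}
```

Calling `describe_error(SPL_ENOTSUP_NEW)` should yield a readable message from the C library, whereas the old value has no user-space description, which is where the "Unknown error 524" text came from.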
* Linux 2.6.36 compat, use fops->unlocked_ioctl()Brian Behlendorf2010-11-102-10/+10
| | | | | | | | | As of linux-2.6.36 the last in-tree consumer of fops->ioctl() has been removed and thus fops->ioctl() has also been removed. The replacement hook is fops->unlocked_ioctl() which has existed in the kernel since 2.6.12. Since the SPL only contains support back to 2.6.18 vintage kernels, I'm not adding an autoconf check for this and simply moving everything to use fops->unlocked_ioctl().