aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Switch KM_SLEEP to KM_PUSHPAGEBrian Behlendorf2012-08-273-7/+6
| | | | | | | | | | Under certain circumstances the following functions may be called in a context where KM_SLEEP is unsafe and can result in a deadlocked system. To avoid this problem the unconditional KM_SLEEPs are converted to KM_PUSHPAGEs. This will prevent them from attempting to initiate any I/O during direct reclaim. Signed-off-by: Brian Behlendorf <[email protected]>
* Mutex ASSERT on self deadlockBrian Behlendorf2012-08-271-2/+7
| | | | | | | | | | | | Generate an assertion if we're going to deadlock the system by attempting to acquire a mutex the process is already holding. There are currently no known instances of this under normal operation, but it _might_ be possible when using a ZVOL as a swap device. I want to ensure we catch this immediately if it were to occur. Signed-off-by: Brian Behlendorf <[email protected]>
* Add PF_NOFS debugging flagBrian Behlendorf2012-08-271-0/+49
| | | | | | | | | | | | | | | | | | | | PF_NOFS is a per-process debug flag which is set in current->flags to detect when a process is performing an unsafe allocation. All tasks with PF_NOFS set must strictly use KM_PUSHPAGE for allocations because if they enter direct reclaim and initiate I/O they may deadlock. When debugging is disabled, any incorrect usage will be detected and a call stack with a warning will be printed to the console. The flags will then be automatically corrected to allow for safe execution. If debugging is enabled this will be treated as a fatal condition. To avoid any risk of conflicting with the existing PF_ flags. The PF_NOFS bit shadows the rarely used PF_MUTEX_TESTER bit. Only when CONFIG_RT_MUTEX_TESTER is not set, and we know this bit is unused, will the PF_NOFS bit be valid. Happily, most existing distributions ship a kernel with CONFIG_RT_MUTEX_TESTER disabled. Signed-off-by: Brian Behlendorf <[email protected]>
* Revert "Disable vmalloc() direct reclaim"Brian Behlendorf2012-08-271-22/+2
| | | | | | | | | This reverts commit 2092cf68d89a51eb0d6193aeadabb579dfc4b4a0. The use of the PF_MEMALLOC flag was always a hack to work around memory reclaim deadlocks. Those issues are believed to be resolved so this workaround can be safely reverted. Signed-off-by: Brian Behlendorf <[email protected]>
* Revert "Fix NULL deref in balance_pgdat()"Brian Behlendorf2012-08-271-8/+5
| | | | | | | | | This reverts commit b8b6e4c453929596b630fa1cca1ee26a532a2ab4. The use of the PF_MEMALLOC flag was always a hack to work around memory reclaim deadlocks. Those issues are believed to be resolved so this workaround can be safely reverted. Signed-off-by: Brian Behlendorf <[email protected]>
* Revert "Detect kernels that honor gfp flags passed to vmalloc()"Brian Behlendorf2012-08-273-43/+0
| | | | | | | | | | This reverts commit 36811b4430aaea8c8b91bbe7d812a26799865499. Which is no longer required because there is now SPL code in place to safely handle the deadlocks the kernel patch was designed to address. Therefore we can unconditionally use vmalloc() and drop all the PF_MEMALLOC code. Signed-off-by: Brian Behlendorf <[email protected]>
* Revert "Add TASKQ_NORECLAIM flag"Brian Behlendorf2012-08-272-5/+0
| | | | | | | | | This reverts commit 372c2572336468cbf60272aa7e735b7ca0c3807c. The use of the PF_MEMALLOC flag was always a hack to work around memory reclaim deadlocks. Those issues are believed to be resolved so this workaround can be safely reverted. Signed-off-by: Brian Behlendorf <[email protected]>
* Emergency slab objectsBrian Behlendorf2012-08-273-44/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is designed to resolve a deadlock which can occur with __vmalloc() based slabs. The issue is that the Linux kernel does not honor the flags passed to __vmalloc(). This makes it unsafe to use in a writeback context. Unfortunately, this is a use case ZFS depends on for correct operation. Fixing this issue in the upstream kernel was pursued and patches are available which resolve the issue. https://bugs.gentoo.org/show_bug.cgi?id=416685 However, these changes were rejected because upstream felt that using __vmalloc() in the context of writeback should never be done. Their solution was for us to rewrite parts of ZFS to accomidate the Linux VM. While that is probably the right long term solution, and it is something we want to pursue, it is not a trivial task and will likely destabilize the existing code. This work has been planned for the 0.7.0 release but in the meanwhile we want to improve the SPL slab implementation to accomidate this expected ZFS usage. This is accomplished by performing the __vmalloc() asynchronously in the context of a work queue. This doesn't prevent the posibility of the worker thread from deadlocking. However, the caller can now safely block on a wait queue for the slab allocation to complete. Normally this will occur in a reasonable amount of time and the caller will be woken up when the new slab is available,. The objects will then get cached in the per-cpu magazines and everything will proceed as usual. However, if the __vmalloc() deadlocks for the reasons described above, or is just very slow, then the callers on the wait queues will timeout out. When this rare situation occurs they will attempt to kmalloc() a single minimally sized object using the GFP_NOIO flags. This allocation will not deadlock because kmalloc() will honor the passed flags and the caller will be able to make forward progress. As long as forward progress can be maintained then even if the worker thread is deadlocked the critical thread will make progress. This will eventually allow the deadlocked worker thread to complete and normal operation will resume. These emergency allocations will likely be slow since they require contiguous pages. However, their use should be rare so the impact is expected to be minimal. If that turns out not to be the case in practice further optimizations are possible. One additional concern is if these emergency objects are long lived. Right now they are simply tracked on a list which must be walked when an object is freed. Is they accumulate on a system and the list grows freeing objects will become more expensive. This could be handled relatively easily by using a hash instead of a list, but that optimization (if needed) is left for a follow up patch. Additionally, these emeregency objects could be repacked in to existing slabs as objects are freed if the kmem_cache_set_move() functionality was implemented. See issue https://github.com/zfsonlinux/spl/issues/26 for full details. This work would also help reduce ZFS's memory fragmentation problems. The /proc/spl/kmem/slab file has had two new columns added at the end. The 'emerg' column reports the current number of these emergency objects in use for the cache, and the following 'max' column shows the historical worst case. These value should give us a good idea of how often these objects are needed. Based on these values under real use cases we can tune the default behavior. Lastly, as a side benefit using a single work queue for the slab allocations should reduce cpu contention on the global virtual address space lock. This should manifest itself as reduced cpu usage for the system. Signed-off-by: Brian Behlendorf <[email protected]>
* Remove SPL_LINUX_CONFIG autoconf macroPrakash Surya2012-08-271-20/+0
| | | | | | | | | | Since removing the check for CONFIG_PREEMPT, there are no consumers of the SPL_LINUX_CONFIG macro. As such, there is no reason to keep it around. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #164
* Revert "Make CONFIG_PREEMPT Fatal"Prakash Surya2012-08-271-8/+0
| | | | | | | This reverts commit 7731d46b69bd893d515c55e87ffa8a9bd2ddfb38. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Remove autotools productsBrian Behlendorf2012-08-2725-47064/+14
| | | | | | | | Remove all of the generated autotools products from the repository and update the .gitignore files accordingly. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#718
* Add kpreempt_[dis|en]able macros in <sys/disp.h>Prakash Surya2012-08-241-0/+5
| | | | | | | | | | | To support preempt enabled kernels in ZFS on Linux, there are a couple places where the ZFS code needs to disable interrupts. This change adds the Solaris preempt functions and maps them to the equivalent ZFS functions, allowing the ZFS to make use of them. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #98
* Avoid calling smp_processor_id in spl_magazine_agePrakash Surya2012-08-242-6/+7
| | | | | | | | | | | | | The spl_magazine_age function had the implied assumption that it will remain on its current cpu through its execution. In order to support preempt enabled kernels, this assumption had to be removed. The spl_kmem_magazine structure now holds the cpu id of the cpu it is local to. This allows spl_magazine_age to use this field when scheduling work to be done by the magazine's local cpu. Signed-off-by: Brian Behlendorf <[email protected]> Issue #98
* Remove Makefile from non-toplevel .gitignore filesRichard Yao2012-08-238-7/+1
| | | | | | | | | | | | | | When building SPL support into the kernel, ./copy-builtin will copy non-toplevel .gitignore files. These files list /Makefile, which causes git-archive to omit ./module/{spl,splat}/Makefile. The absence of these files result in build failures when SPL is selected. ZFS is unaffected because it puts Makefile in the toplevel .gitignore, which is not copied. We fix SPL by emulating that behavior. Reported-by: Fabio Erculiani <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #152
* Wrap trace_set_debug_header in trace_[get|put]_tcdPrakash Surya2012-08-231-2/+1
| | | | | | | | | | | | To properly support CONFIG_PREEMPT enabled kernels, we must refrain from using a CPU index when preemption is enabled. As a result, this change moves the trace_set_debug_header call (which calls smp_processor_id) within trace_get_tcd and trace_put_tcd (which disable and enable preemption respectively). Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #160
* Add copy-builtin to EXTRA_DISTBrian Behlendorf2012-08-232-2/+2
| | | | | | | | The copy-builtin script was accidentally not being included in the tarballs. Signed-off-by: Brian Behlendorf <[email protected]> Closes #159
* SPL 0.6.0-rc10Brian Behlendorf2012-08-131-1/+1
|
* Cleanly remove spl-modules-devel headersBrian Behlendorf2012-08-131-3/+11
| | | | | | | | | | | | | | Add the /usr/src/spl-<version>-<release>/<kernel> directory to the spl-modules-devel package. This ensures that this directory will be removed when the package is removed. We do not include the higher level /usr/src/spl-<version>-<release> directory since there may be builds for multiple kernels. Instead, a %postun rmdir is added which attempts to remove this directory. It will only succeed when the last spl-modules-devel-* package for this specific release is removed. Signed-off-by: Brian Behlendorf <[email protected]>
* Support building a spl-modules-dkms sub packagePrakash Surya2012-08-088-10/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds support for building a spl-modules-dkms sub package built around Dynamic Kernel Module Support. This is to allow building packages using the DKMS infrastructure which is intended to ease the burden of kernel version changes, upgrades, etc. By default spl-modules-dkms-* sub package will be built as part of the 'make rpm' target. Alternately, you can build only the DKMS module package using the 'make rpm-dkms' target. Examples: # To build packaged binaries as well as a dkms packages $ ./configure && make rpm # To build only the packaged binary utilities and dkms packages $ ./configure && make rpm-utils rpm-dkms Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are supported for building the dkms sub package. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#535
* Handle any invalidate_inodes_check prototype.Etienne Dechamps2012-08-062-7/+10
| | | | | | | | | | | | | | | In the comments of commit 723aa3b0c2eed070f7eeadd2ce2d87f46da6d0f8, mmatuska reported that the test for invalidate_inodes_check() is broken if invalidate_inodes() takes two arguments. This patch fixes the issue by resorting to another approach for detecting invalidate_inodes_check(): is simply checks if invalidate_inodes is defined as a macro. If it is, then it concludes that invalidate_inodes_check() is available. This will continue to work even if the prototype of invalidate_inodes_check() changes over time. Signed-off-by: Brian Behlendorf <[email protected]> Closes #148
* Fix incorrect type in spl_kmem_cache_set_move() parameterRichard Yao2012-08-012-2/+2
| | | | | | | A preprocessor definition renders this harmless. However, it is a good idea to change this to be consistent. Signed-off-by: Richard Yao <[email protected]>
* Merge branch 'builtin-clean'Brian Behlendorf2012-07-2615-1315/+4897
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support in-tree builtin module building. These commits add support for compiling the SPL module as a built-in kernel module by copying the module code into the kernel source tree. Here's the procedure: - Create your kernel configuration (`.config` file) as usual. This has to be done first so that SPL's configure script is able to detect kernel features correctly. - Run `make prepare scripts` inside the kernel source tree. - Run `./configure --enable-linux-builtin --with-linux=/usr/src/linux-...` inside the SPL directory. - Run `./copy-builtin /usr/src/linux-...` inside the SPL directory. - In the kernel source tree, enable the `CONFIG_SPL` option (e.g. using `make menuconfig`). - Build the kernel as usual. SPL module parameters can be set at boot time using the following syntax on the kernel command line: `spl.parameter_name=parameter_value`. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * Determine the hostid on demand.Etienne Dechamps2012-07-261-13/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the SPL tries to determine the hostid at module load. The hostid is usually determined by running the userland program "hostid" during module initialization. Unfortunately, when the module initializes, it may be way too soon to be able to run any userland programs. This is especially true when the module is compiled directly inside the kernel (built-in); in that case, the SPL would try to run hostid when the kernel is still initializing, which of course is doomed to fail. This patch fixes the issue by deferring hostid generation until something actually needs the hostid (that is, when zone_get_hostid() is called), thus switching to a "on-initialization" model to a "on-demand" (lazy loading) model. ZFS only needs the hostid when some pool operations are requested, and this always happens way after the kernel has finished initialization, thus solving the problem. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * Add script for builtin module building.Etienne Dechamps2012-07-265-9/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces a "copy-builtin" script designed to prepare a kernel source tree for building SPL as a builtin module. The script makes a full copy of all needed files, thus making the kernel source tree fully independent of the spl source package. To achieve that, some compilation flags (-include, -I) have been moved to module/Makefile. This Makefile is only used when compiling external modules; when compiling builtin modules, a Kbuild file generated by the configure-builtin script is used instead. This makes sure Makefiles inside the kernel source tree does not contain references to the spl source package. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * When checking for symbol exports, try compiling.Etienne Dechamps2012-07-263-956/+3994
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new autoconf function: SPL_LINUX_TRY_COMPILE_SYMBOL. This new function does the following: - Call LINUX_TRY_COMPILE with the specified parameters. - If unsuccessful, return false. - If successful and we're configuring with --enable-linux-builtin, return true. - Else, call CHECK_SYMBOL_EXPORT with the specified parameters and return the result. All calls to CHECK_SYMBOL_EXPORT are converted to LINUX_TRY_COMPILE_SYMBOL so that the tests work even when configuring for builtin on a kernel which doesn't have loadable module support, or hasn't been built yet. The only exception are: - AC_GET_VMALLOC_INFO, because we don't even have a public header to include in the test case, but that's okay considering this symbol can be ignored just fine. - SPL_AC_DEVICE_CREATE, which is legacy API for 2.6.18 kernels. Since kernels this old are no longer supported it should arguably just be removed entirely from the build system. Note that we're also checking for the correct prototype with an actual call, which was not the case with CHECK_SYMBOL_EXPORT. However, for "complicated" test cases like with multiple symbol versions (e.g. vfs_fsync), we stick with the original behavior and only check for the function's existence. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * Fake modpost stage for LINUX_COMPILE.Etienne Dechamps2012-07-262-200/+398
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when building a test case, we're compiling an entire Linux module from beginning to end. This includes the MODPOST stage, which generates a "conftest.mod.c" file with some boilerplate module declaration code. This poses a problem when configuring for built-in on kernels which have loadable module support disabled. In this case conftest.mod.c is referencing disabled code, resulting in a compilation failure, thus breaking the tests. This patch fixes the issue by faking the modpost stage when the --enable-linux-builtin option is provided. It does so by forcing the modpost command to be /bin/true, and using an empty conftest.mod.c file. The test module still compiles fine, although the result isn't loadable, but we don't really care at this point. Note it is important to preserve the modpost stage when building out of tree. This allows for the posibility of configure checks to leverage this phase to identify GPL-only symbols. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * Make configure builtin-aware.Etienne Dechamps2012-07-262-64/+259
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new option to configure: --enable-linux-builtin. When this option is used, the following happens: - Compilation of kernel modules is disabled. - A failure to find UTS_RELEASE is followed by a suggestion to run "make prepare" on the kernel source tree. This patch also adds a new test which tries to compile an empty module as a basic toolchain sanity test. If it fails and the option was specified, the error is followed by a suggestion to run "make scripts" on the kernel source tree. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * Fix undefined reference on spl_mutex_spin_max().Etienne Dechamps2012-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3160d4f56bf35492e9c400094f8c1ff2066d4459 changed the set of conditions under which spl_mutex_spin_max would be implemented as a function by changing an #if in sys/mutex.h. The corresponding implementation file spl-mutex.c, however, has not been updated to reflect the change. This results in undefined reference errors on spl_mutex_spin_max under the following condition: ((!CONFIG_SMP || CONFIG_DEBUG_MUTEXES) && HAVE_MUTEX_OWNER && HAVE_TASK_CURR) This patch fixes the issue by using the same #if in sys/mutex.h and spl-mutex.c. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
| * Don't build packages that haven't been selected.Etienne Dechamps2012-07-265-38/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when configure --with-config is used, selective compilation is only effective for the simple "make" case. Package builders (e.g. make rpm) still build everything (utils and modules). This patch fixes that. This patch also drops the duplicate rpm-modules build target. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Issue zfsonlinux/zfs#851
| * Use MODULE variable in module Makefile like zfs.Etienne Dechamps2012-07-262-40/+40
|/ | | | | | | | | | | | | | In zfs, each module Makefile contains a MODULE variable which contains the name of the module, and the following declarations reference this variable. In spl, there is a MODULES variable which is never used. Rename it to MODULE and use it like in zfs. This improves consistency between the two build systems. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#851
* 32-bit compat, hostid_read()Brian Behlendorf2012-07-201-2/+2
| | | | | | | | | | | Explicitly cast the sizeof in hostid_read() to prevent the following compiler warning on 32-bit systems. module/spl/spl-generic.c:490:10: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Werror=format] Signed-off-by: Brian Behlendorf <[email protected]>
* Optimize spl_rwsem_is_locked()Brian Behlendorf2012-07-131-42/+25
| | | | | | | | | | | | | | | | | The spl_rwsem_is_locked() compatibility function has been observed to be a hot spot. The root cause of this is that we must check the rwsem activity under the rwsem->wait_lock to avoid a race. When the lock is busy significant contention can occur. The upstream kernel fix for this race had the insight that by using spin_trylock_irqsave() this contention could be avoided. When the lock is contended it's reasonable to return that it is locked. This change updates the SPLs implemention to be like the upstream kernel. Since the kernel code has been in use for years now this a low risk change. Signed-off-by: Brian Behlendorf <[email protected]>
* Move spl.release generation to configure stepPrakash Surya2012-07-125-11/+14
| | | | | | | | | | | | | | | Previously, the spl.release file was created at 'make install' time. This is slightly problematic when the file is needed without running 'make install'. Because of this, the step creating the file was removed from 'make install' and replaced with a more appropriate spl.release.in file. As a result, the spl.release file will now be created earlier as part of the 'configure' step as opposed to the 'make install' step. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #135
* Detect kernels that honor gfp flags passed to vmalloc()Richard Yao2012-07-114-0/+199
| | | | | | | | | | | | | | | | | | | | | | zfsonlinux/spl@2092cf68d89a51eb0d6193aeadabb579dfc4b4a0 used PF_MEMALLOC to workaround a bug in the Linux kernel where allocations did not honor the gfp flags passed to vmalloc(). Unfortunately, PF_MEMALLOC has the side effect of permitting allocations to allocate pages outside of ZONE_NORMAL. This has been observed to result in the depletion of ZONE_DMA32. A kernel patch is available in the Gentoo bug tracker for this issue. https://bugs.gentoo.org/show_bug.cgi?id=416685 This negates any benefit PF_MEMALLOC provides, so we introduce an autotools check to disable the use of PF_MEMALLOC on systems with patched kernels. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #126
* Constify memory management functionsRichard Yao2012-07-032-9/+9
| | | | | | | | | | This prevents warnings in ZFS that were caused by changes necessary to support PaX patched kernels. When debugging is enabled, these warnings become build failures. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #131
* Remove Chaos 4.x RPM supportBrian Behlendorf2012-07-021-41/+0
| | | | | | | The Chaos 4.x distribution is based on RHEL 5.x which is no longer supported by ZoL since it uses a 2.6.18 kernel. Signed-off-by: Brian Behlendorf <[email protected]>
* Support debug and debug-devel sub packagesPrakash Surya2012-07-021-62/+290
| | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds support for building debug and debug-devel sub packages of the spl-modules main package. This is to allow building packages which are built against a debug kernel. By default, only packages are built against a regular non-debug kernel. This can be toggled by passing the '--with kernel-debug' parameter to rpmbuild. Examples: # To build packages against only the non-debug kernel $ rpmbuild --rebuild --with kernel --without kernel-debug $SRPM # To build packages against only the debug kernel $ rpmbuild --rebuild --without kernel --with kernel-debug $SRPM # To build packages against debug and non-debug kernel $ rpmbuild --rebuild --with kernel --with kernel-debug $SRPM Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are supported for building the debug and debug-devel packages. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #115
* PowerPC CompatibilityBrian Behlendorf2012-07-023-3/+3
| | | | | | | | | Usage of get_current() is not supported across all architectures. The correct interface to use is the '#define current' which will map to the appropriate function, usually current_thread_info(). Signed-off-by: Brian Behlendorf <[email protected]> Closes #119
* SPL 0.6.0-rc9Brian Behlendorf2012-06-141-1/+1
|
* Linux 3.4 compat, __clear_close_on_exec replaces FD_CLRRichard Yao2012-06-135-1/+167
| | | | | | | | | | | | | | | | | | | | torvalds/linux@1dce27c5aa6770e9d195f2bb7db1db3d4dde5591 introduced __clear_close_on_exec() as a replacement for FD_CLR. Further commits appear to have removed FD_CLR from the Linux source tree. This causes the following failure: error: implicit declaration of function '__FD_CLR' [-Werror=implicit-function-declaration] To correct this we update the code to use the current __clear_close_on_exec() interface for readability. Then we introduce an autotools check to determine if __clear_close_on_exec() is available. If it isn't then we define some compatibility logic which used the older FD_CLR() interface. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #124
* Fix uninit variable in slab reclaim testBrian Behlendorf2012-06-131-1/+1
| | | | | | | | Gcc version 4.7.0 reports the delta.tv_sec in the slab reclaim test as potentially unitialized. In practice this will never occur but to keep gcc happy we initialize the variable to zero. Signed-off-by: Brian Behlendorf <behlendo@fedora-17-amd64.(none)>
* Fix invalid context bugBrian Behlendorf2012-06-111-2/+1
| | | | | | | | | | | | | | In the module unload path the vm_file_cache was being destroyed under a spin lock. Because this operation might sleep it was possible, although very very unlikely, that this could result in a deadlock. This issue was indentified by using a Linux debug kernel and has been fixed by moving the kmem_cache_destroy() out from under the spin lock. There is no need to lock this operation here. Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/zfs#771
* Fix ARM 64-bit divisionJorgen Lundman2012-05-221-8/+43
| | | | | | | | | | | | Correctly implementating 64-bit division for ARM requires more than just providing the __aeabi_uldivmod() and __aeabi_ldivmod() symbols. They are need to be implemented is such a way that the quotient and remainder and left in specific registers after the division operation completes. This change updates the wrapper functions to accomplish this according to the official ARM Run-time ABI. Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/zfs#706
* Remove Solaris module emulationBrian Behlendorf2012-05-183-628/+10
| | | | | | | | | | Originally I believed that these interfaces would be needed. However, in practice it turned out that it was more straight forward and maintainable to use the native Linux interfaces. As such, this is all dead code and can be safely removed. Signed-off-by: Brian Behlendorf <[email protected]> Closes #109
* Modify KM_PUSHPAGE to use GFP_NOIO instead of GFP_NOFSRichard Yao2012-05-071-1/+1
| | | | | | | | | | | | | | | | | The resolution of issue #31 made KM_PUSHPAGE imply GFP_NOFS. This was done to prevent situations where filesystem operations which are holding locks enter direct reclaim and attempt to reaquire those same locks. This clearly will result in a deadlock. This works for datasets which are implemented in terms for filesystem operations. But unfortunately, swapping to a zvol will encounter many of the same deadlocks and GFP_NOFS will not prevent this. As such, it is appropriate to extend KM_PUSHPAGE to use the broader GFP_NOIO mask to handle these non-filesystem cases. Signed-off-by: Brian Behlendorf <[email protected]> Issue zfsonlinux/zfs#342 Closes #105
* Add SPLAT test to exercise slab direct reclaimPrakash Surya2012-05-071-30/+165
| | | | | | | | | | | | | | | | | | | | | | This test is designed to verify that direct reclaim is functioning as expected. We allocate a large number of objects thus creating a large number of slabs. We then apply memory pressure and expect that the direct reclaim path can easily recover those slabs. The registered reclaim function will free the objects and the slab shrinker will call it repeatedly until at least a single slab can be freed. Note it may not be possible to reclaim every last slab via direct reclaim without a failure because the shrinker_rwsem may be contended. For this reason, quickly reclaiming 3/4 of the slabs is considered a success. This should all be possible within 10 seconds. For reference, on a system with 2G of memory this test takes roughly 0.2 seconds to run. It may take longer on larger memory systems but should still easily complete in the alloted 10 seconds. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #107
* Ensure a minimum of one slab is reclaimedBrian Behlendorf2012-05-071-2/+32
| | | | | | | | | | | | | | | | | To minimize the chance of triggering an OOM during direct reclaim. The kmem caches have been improved to make a best effort to reclaim at least one slab when a reclaim function is registered. This helps avoid the case where objects are released but they are spread over multiple slabs so no memory gets reclaimed. Care has been taken to avoid deadlocking if the reclaim function is unable to make forward progress. Additionally, the reclaim function may be skipped entirely if there are already free slabs which can be safely reaped. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #107
* Ensure direct reclaim forward progressBrian Behlendorf2012-05-071-0/+10
| | | | | | | | | | | | | | | | | | The Linux direct reclaim path uses this out of band value to determine if forward progress is being made. Normally this is incremented by kmem_freepages() which is part of the various Linux slab implementations. However, since we are using none of that infrastructure we're responsible for incrementing this count. If no forward progress is detected and a subsequent allocation fails the OOM killer will be invoked. If there was forward progress additional reclaim will be attempted via the page cache and registerd shrinker until the allocation succeeds. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #107
* Ignore slab cache age and delay in direct reclaimPrakash Surya2012-05-071-1/+2
| | | | | | | | | | | | | | | When memory pressure triggers direct memory reclaim, a slabs age and delay should not prevent it from being freed. This patch ensures these values are ignored, allowing an empty slab to be freed in this code path no matter the value of its age and delay. This prevents needless scanning of the partial slabs and has been observed to significantly reduce the total cpu usage. In addition, it should allow for snappier reclaim under memory pressure. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #102
* Throttle number of freed slabs based on nr_to_scanPrakash Surya2012-05-072-8/+11
| | | | | | | | | | | | | | | | Previously, the SPL tried to maintain Solaris semantics by freeing all available (empty) slabs from its slab caches when the shrinker was called. This is not desirable when running on Linux. To make the SPL shrinker more Linux friendly, the actual number of freed slabs from each of the slab caches is now derived from nr_to_scan and skc_slab_objs. Additionally, an accounting bug was fixed in spl_slab_reclaim() which could cause us to reclaim one more slab than requested. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #101