summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Change KM_PUSHPAGE -> KM_SLEEPBrian Behlendorf2015-01-1665-281/+269
| | | | | | | | | | | | | | | By marking DMU transaction processing contexts with PF_FSTRANS we can revert the KM_PUSHPAGE -> KM_SLEEP changes. This brings us back in line with upstream. In some cases this means simply swapping the flags back. For others fnvlist_alloc() was replaced by nvlist_alloc(..., KM_PUSHPAGE) and must be reverted back to fnvlist_alloc() which assumes KM_SLEEP. The one place KM_PUSHPAGE is kept is when allocating ARC buffers which allows us to dip in to reserved memory. This is again the same as upstream. Signed-off-by: Brian Behlendorf <[email protected]>
* Retire KM_NODEBUGBrian Behlendorf2015-01-1617-30/+26
| | | | | | | | | | | | | | | Callers of kmem_alloc() which passed the KM_NODEBUG flag to suppress the large allocation warning have been replaced by vmem_alloc() as appropriate. The updated vmem_alloc() call will not print a warning regardless of the size of the allocation. A careful reader will notice that not all callers have been changed to vmem_alloc(). Some have only had the KM_NODEBUG flag removed. This was possible because the default warning threshold has been increased to 32k. This is desirable because it minimizes the need for Linux specific code changes. Signed-off-by: Brian Behlendorf <[email protected]>
* Use is_vmalloc_addr() in vdev_disk.cRichard Yao2015-01-161-1/+1
| | | | | | | | | | | | The initial port of ZFS to Linux required a way to identify virtual memory to make IO to virtual memory backed slabs work, so kmem_virt() was created. Linux 2.6.25 introduced is_vmalloc_addr(), which is logically equivalent to kmem_virt(). Support for kernels before 2.6.26 was later dropped and more recently, support for kernels before Linux 2.6.32 has been dropped. We retire kmem_virt() in favor of is_vmalloc_addr() to cleanup the code. Signed-off-by: Brian Behlendorf <[email protected]>
* Mark IO pipeline with PF_FSTRANSBrian Behlendorf2015-01-167-45/+69
| | | | | | | | | | In order to avoid deadlocking in the IO pipeline it is critical that pageout be avoided during direct memory reclaim. This ensures that the pipeline threads can always make forward progress and never end up blocking on a DMU transaction. For this very reason Linux now provides the PF_FSTRANS flag which may be set in the process context. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix zfs_putpage() lock inversion (again)Brian Behlendorf2015-01-081-2/+15
| | | | | | | | | | | | | | | This is a follow up commit to 74328ee which correctly resolved a lock inversion between zfs_putpage() and zfs_free_range(). Unfortunately, in the process it accidentally introduced another inversion between zfs_putpage() and zfs_read(). The page must be unlocked before taking the range lock. This patch corrects that issue. In addition, because the locking rules here are subtle a block comment has been added clearly explaining why the ordering here is critical. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Issue #2976
* Document zfs_flags module parameterNed Bass2015-01-072-3/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a table describing the debugging flags that can be set in the zfs_flags module parameter. Also change the module_param type to 'uint' so users aren't shown a negative value. The updated man page text is reproduced below for convenience. zfs_flags (int) Set additional debugging flags. The following flags may be bitwise-or'd together. +-------------------------------------------------------+ |Value Symbolic Name | | Description | +-------------------------------------------------------+ | 1 ZFS_DEBUG_DPRINTF | | Enable dprintf entries in the debug log. | +-------------------------------------------------------+ | 2 ZFS_DEBUG_DBUF_VERIFY * | | Enable extra dbuf verifications. | +-------------------------------------------------------+ | 4 ZFS_DEBUG_DNODE_VERIFY * | | Enable extra dnode verifications. | +-------------------------------------------------------+ | 8 ZFS_DEBUG_SNAPNAMES | | Enable snapshot name verification. | +-------------------------------------------------------+ | 16 ZFS_DEBUG_MODIFY | | Check for illegally modified ARC buffers. | +-------------------------------------------------------+ | 32 ZFS_DEBUG_SPA | | Enable spa_dbgmsg entries in the debug log. | +-------------------------------------------------------+ | 64 ZFS_DEBUG_ZIO_FREE | | Enable verification of block frees. | +-------------------------------------------------------+ | 128 ZFS_DEBUG_HISTOGRAM_VERIFY | | Enable extra spacemap histogram verifications. | +-------------------------------------------------------+ * Requires debug build. Default value: 0. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2988
* Don't use AC_LANG_SOURCE for conftest.h sourceNed Bass2015-01-061-1/+1
| | | | | | | | | | | | | | | Using AC_LANG_SOURCE with some versions of autoconf is problematic if the given source is to be written to a header file. Such versions assume the contents are to be written to conftest.c and generate shell code to that effect. The contents of the test program to detect support for Linux tracepoints were consequently malformed (containing the source for conftest.h) so the build system incorrectly disabled tracepoints support. Fix this in ZFS_LINUX_TRY_COMPILE_HEADER by passing the header source directly to ZFS_LINUX_COMPILE_IFELSE. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2953
* Remove duplicate typedefs from trace.hNed Bass2015-01-0622-984/+1383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older versions of GCC (e.g. GCC 4.4.7 on RHEL6) do not allow duplicate typedef declarations with the same type. The trace.h header contains some typedefs to avoid 'unknown type' errors for C files that haven't declared the type in question. But this causes build failures for C files that have already declared the type. Newer versions of GCC (e.g. v4.6) allow duplicate typedefs with the same type unless pedantic error checking is in force. To support the older versions we need to remove the duplicate typedefs. Removal of the typedefs means we can't built tracepoints code using those types unless the required headers have been included. To facilitate this, all tracepoint event declarations have been moved out of trace.h into separate headers. Each new header is explicitly included from the C file that uses the events defined therein. The trace.h header is still indirectly included form zfs_context.h and provides the implementation of the dprintf(), dbgmsg(), and SET_ERROR() interfaces. This makes those interfaces readily available throughout the code base. The macros that redefine DTRACE_PROBE* to use Linux tracepoints are also still provided by trace.h, so it is a prerequisite for the other trace_*.h headers. These new Linux implementation-specific headers do introduce a small divergence from upstream ZFS in several core C files, but this should not present a significant maintenance burden. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2953
* Fix zfs_putpage() lock inversionBrian Behlendorf2014-12-221-3/+2
| | | | | | | | | | | | | | | | | | | | | There exists a lock inversions involving the zfs range lock and the individual page writeback bits which can result in a deadlock. To prevent this we must always manipulate the writeback bit while holding the range lock. The exact deadlock is as follows: ------ Process A ------ ------ Process B ------ zpl_writepages zpl_fallocate write_cache_pages zpl_fallocate_common zpl_putpage zfs_space zfs_putpage (set bit) zfs_freesp zfs_range_lock (wait on lock) zfs_free_range (take lock) [has not yet initiated I/O, truncate_inode_pages_range the bit will not be cleared] wait_on_page_writeback (wait on bit) Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #2976
* vdev_id: use mawk-compatible regular expressionNed Bass2014-12-191-1/+1
| | | | | | | | | | | | | Slot mapping in vdev_id doesn't work on systems using mawk as the 'awk' alternative. A regular expression in map_slot() contains an unquoted empty string following the alternation (|) operator, which results in an "missing operand" error with mawk. The solution is to rearrange the expression so the alternation has two operands. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/pkg-zfs#136 Closes zfsonlinux/zfs#2965
* Fix cstyle issue from c66989bBrian Behlendorf2014-12-191-2/+2
| | | | | | | Commit c66989b accidentally introduced a cstyle issue which went unnoticed. This tiny patch corrects that oversight. Signed-off-by: Brian Behlendorf <[email protected]>
* Correct error returns to unify cross-pool operation error handlingBoris Protopopov2014-12-191-2/+2
| | | | | | Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2911
* Fix typo in %post scriptlet linesAndy Bakun2014-12-181-4/+4
| | | | | | | | | Missing space made the %post directive be part of the package %description and not have a %post scriptlet defined. Signed-off-by: Andy Bakun <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2961
* zpool upgrade return errors to stderr instead of stdoutJacek FefliƄski2014-12-181-2/+2
| | | | | | Signed-off-by: Jacek Feflinski <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2955
* Improve systemd script to not leave stale sharetabDan Swartzendruber2014-12-181-0/+1
| | | | | | | | | | The systemd script zfs-share.service does 'zfs share -a' to share any required datasets. Unfortunately, /etc/dfs/sharetab is stale from the previous boot. Delete it before we share. Signed-off-by: Dan Swartzendruber <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2883
* Fix snapshots with dirty inodesBrian Behlendorf2014-11-201-0/+3
| | | | | | | | | | Filesystems which are mounted read-only or are immutable because they are snapshots must not be allowed to dirty and inode. This will result in a write which will correctly cause a kernel panic because these filesystem are (and must be) immutable. Signed-off-by: Brian Behlendorf <[email protected]> Closes #2812
* Fix systemd config for zfs-share.serviceDan Swartzendruber2014-11-191-0/+2
| | | | | | | | | | The zfs-share.service rule needs to be modified to ensure that it does not execute before zfs-mount.service. Signed-off-by: Dan Swartzendruber <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ralf Ertzinger <[email protected]> Closes #2893
* bio_alloc() with __GFP_WAIT never returns NULLIsaac Huang2014-11-191-3/+5
| | | | | | | | | | | | Mark the error handling branch as unlikely() because the current kernel interface can never return NULL. However, we want to keep the error handling in case this behavior changes in the futre. Plus fix a small style issue. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Isaac Huang <[email protected]> Closes #2703
* Explicitly include SPL compat headersNed Bass2014-11-198-4/+12
| | | | | | | | | | | Inclusion of SPL compatibility headers was moved out of the public header sys/types.h to avoid conflicts with external packages. Include a few compatiblity headers explicitly to cope with that change. Also, sort some linux-specific inclusions alphabetically. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2898
* Fix improper null-byte termination handlingNed Bass2014-11-172-16/+6
| | | | | | | | | | | | | | | | | | | | | | | Fix a few cases where null-byte termination of strings was done unnecessarily or incorrectly. - The snprintf() function always produces a null-byte terminated string for non-negative return values, so it is not necessary to write out a null-byte as a separate step. - Also, it is unsafe to use the return value of snprintf() as an offset for placing a null-byte, because if the output was truncated the return value is the number of bytes that _would_ have been written had enough space been available. Therefore the return value may index beyond the array boundaries. - Finally, snprintf() accounts for the null-byte when limiting its output size, so there is no need to pass it a size parameter that is one less than the buffer size. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2875
* Prevent ZFS leaking pool free spacesmh2014-11-171-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When processing async destroys ZFS would leak space every txg timeout (5 seconds by default), if no writes occurred, until the pool is totally full. At this point it would be unfixable without a pool recreation. In addition if the machine was rebooted with the pool in this situation would fail to import on boot, hanging indefinitely, as the import process requires the ability to write data to the pool. Any attempts to query the pool status during the hung import would not return as the import holds the pool lock. The only way to import such a pool would be to specify -o readonly=on to the zpool import. zdb -bb <pool> can be used to check for "deferred free" size which is where this lost space will be counted. References: https://github.com/freebsd/freebsd/commit/48431b7 http://svnweb.freebsd.org/base?view=revision&revision=273158 https://reviews.csiden.org/r/132/ Porting notes: This issue was filed as illumos 5347 and a more comprehensive fix is under review. Once that change is finalized it will be integrated, in the meanwhile the FreeBSD fix has been merged to prevent the issue. Ported by: Tim Chase <[email protected]> Signed-off-by: Matthew Ahrens [email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #2896
* Undirty freed spill blocks.Tim Chase2014-11-171-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a spill block's dbuf hasn't yet been written when a spill block is freed, the unwritten version will still be written. This patch handles the case in which a spill block's dbuf is freed and undirties it to prevent it from being written. The most common case in which this could happen is when xattr=sa is being used and a long xattr is immediately replaced by a short xattr as in: setfattr -n user.test -v very_very_very..._long_value <file> setfattr -n user.test -v short_value <file> The first value must be sufficiently long that a spill block is generated and the second value must be short enough to not require a spill block. In practice, this would typically happen due to internal xattr operations as a result of setting acltype=posixacl. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2663 Closes #2700 Closes #2701 Closes #2717 Closes #2863 Closes #2884
* Merge branch 'b_tracepoints'Brian Behlendorf2014-11-1722-285/+1530
|\ | | | | | | | | | | | | Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2874
| * Swap DTRACE_PROBE* with Linux tracepointsPrakash Surya2014-11-1715-161/+1361
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch leverages Linux tracepoints from within the ZFS on Linux code base. It also refactors the debug code to bring it back in sync with Illumos. The information exported via tracepoints can be used for a variety of reasons (e.g. debugging, tuning, general exploration/understanding, etc). It is advantageous to use Linux tracepoints as the mechanism to export this kind of information (as opposed to something else) for a number of reasons: * A number of external tools can make use of our tracepoints "automatically" (e.g. perf, systemtap) * Tracepoints are designed to be extremely cheap when disabled * It's one of the "accepted" ways to export this kind of information; many other kernel subsystems use tracepoints too. Unfortunately, though, there are a few caveats as well: * Linux tracepoints appear to only be available to GPL licensed modules due to the way certain kernel functions are exported. Thus, to actually make use of the tracepoints introduced by this patch, one might have to patch and re-compile the kernel; exporting the necessary functions to non-GPL modules. * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux tracepoints are not available for unsigned kernel modules (tracepoints will get disabled due to the module's 'F' taint). Thus, one either has to sign the zfs kernel module prior to loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or newer. Assuming the above two requirements are satisfied, lets look at an example of how this patch can be used and what information it exposes (all commands run as 'root'): # list all zfs tracepoints available $ ls /sys/kernel/debug/tracing/events/zfs enable filter zfs_arc__delete zfs_arc__evict zfs_arc__hit zfs_arc__miss zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone zfs_l2arc__miss zfs_l2arc__read zfs_l2arc__write zfs_new_state__mfu zfs_new_state__mru # enable all zfs tracepoints, clear the tracepoint ring buffer $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable $ echo 0 > /sys/kernel/debug/tracing/trace # import zpool called 'tank', inspect tracepoint data (each line was # truncated, they're too long for a commit message otherwise) $ zpool import tank $ cat /sys/kernel/debug/tracing/trace | head -n35 # tracer: nop # # entries-in-buffer/entries-written: 1219/1219 #P:8 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr... z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr... z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr... z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr... z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr... z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr... z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr... z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ... To highlight the kind of detailed information that is being exported using this infrastructure, I've taken the first tracepoint line from the output above and reformatted it such that it fits in 80 columns: lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr { dva 0x1:0x40082 birth 15491 cksum0 0x163edbff3a flags 0x640 datacnt 1 type 1 size 2048 spa 3133524293419867460 state_type 0 access 0 mru_hits 0 mru_ghost_hits 0 mfu_hits 0 mfu_ghost_hits 0 l2_hits 0 refcount 1 } bp { dva0 0x1:0x40082 dva1 0x1:0x3000e5 dva2 0x1:0x5a006e cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00 lsize 2048 } zb { objset 0 object 0 level -1 blkid 0 } For the specific tracepoint shown here, 'zfs_arc__miss', data is exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!). This kind of precise and detailed information can be extremely valuable when trying to answer certain kinds of questions. For anybody unfamiliar but looking to build on this, I found the XFS source code along with the following three web links to be extremely helpful: * http://lwn.net/Articles/379903/ * http://lwn.net/Articles/381064/ * http://lwn.net/Articles/383362/ I should also node the more "boring" aspects of this patch: * The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to support a sixth paramter. This parameter is used to populate the contents of the new conftest.h file. If no sixth parameter is provided, conftest.h will be empty. * The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced. This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro, except it has support for a fifth option that is then passed as the sixth parameter to ZFS_LINUX_COMPILE_IFELSE. These autoconf changes were needed to test the availability of the Linux tracepoint macros. Due to the odd nature of the Linux tracepoint macro API, a separate ".h" must be created (the path and filename is used internally by the kernel's define_trace.h file). * The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This is to determine if we can safely enable the Linux tracepoint functionality. We need to selectively disable the tracepoint code due to the kernel exporting certain functions as GPL only. Without this check, the build process will fail at link time. In addition, the SET_ERROR macro was modified into a tracepoint as well. To do this, the 'sdt.h' file was moved into the 'include/sys' directory and now contains a userspace portion and a kernel space portion. The dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as well. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * cstyle: allow right paren on its own lineNed Bass2014-11-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the style checker script accept right parentheses on their own lines. This is motivated by the Linux tracepoints macro DECLARE_EVENT_CLASS. The code within TP_fast_assign() (a parameter of DECLARE_EVENT_CLASS) is normal C assignments terminated by semicolons. But the style checker forbids us from following a semicolon with a non-blank and from preceding a right parenthesis with white space. Therefore the closing parenthesis must go on the next line, yet the style checker foribs us from indenting it for readability. Relaxing the no-non-blank-after-semicolon rule would open the door to too many bad style practices. So instead we relax the no-white-space-before-right-paren rule if the parenthesis is on its own line. The relaxation is overriden with the -p option so we still have a way to catch misuse of this style. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Fix dprintf format specifiersNed Bass2014-11-175-6/+7
| | | | | | | | | | | | | | | | | | | | | | Fix a few dprintf format specifiers that disagreed with their argument types. These came to light as compiler errors when converting dprintf to use the Linux trace buffer. Previously this wasn't a problem, presumably because the SPL debug logging uses vsnprintf which must perform automatic type conversion. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Move a few internal ARC strucutres to arc_impl.hNed Bass2014-11-173-116/+159
|/ | | | | | | | | | | Add a new file named arc_impl.h and move a few internal ARC structure definitions into this file. This is needed in order to allow the Linux tracepoint functions to grub around in the internals of these structures. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Fix small spelling mistakeRandall Mason2014-11-141-1/+1
| | | | | | | | recieve becomes receive Signed-off-by: Randall Mason <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2877
* Illumos 5213 - panic in metaslab_init due to space_map_open returning ENXIOPrakash Surya2014-11-143-26/+40
| | | | | | | | | | | | | | | | | | 5213 panic in metaslab_init due to space_map_open returning ENXIO Reviewed by: Matthew Ahrens [email protected] Reviewed by: George Wilson [email protected] References: https://www.illumos.org/issues/5213 https://reviews.csiden.org/r/110 Porting notes: For the Linux port, KM_SLEEP was replaced with KM_PUSHPAGE. Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2745
* Print header properly when terminal resizesIsaac Huang2014-11-141-6/+15
| | | | | | | | | Added a handler for SIGWINCH, so that one header is printed per screen even when the terminal resizes. Signed-off-by: Isaac Huang <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2847
* Fix inaccurate field descriptionsIsaac Huang2014-11-141-5/+5
| | | | | | | | | | The field descriptions from arcstat.py -v for the demand accesses are inaccurate. They all begin with "Demand Data" yet the fields actually covered both demand data and demand meta-data accesses. Signed-off-by: Isaac Huang <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2842
* Reduce buf/dbuf mutex contentionChris Wedgwood2014-11-142-2/+2
| | | | | | | | | | | | | | | | | Due to evidence of contention both the buf_hash_table and the dbuf_hash_table sizes have been increased from 256 to 8192. This increase in hash table size adds approximating 0.5M to our fixed memory footprint. This relatively small increase is not expected to cause problems even on low memory machines. This footprint will also become dynamic when the persistent L2ARC support is finalized. In the meanwhile, this small change significantly reduces contention for certain workloads. Signed-off-by: Chris Wedgwood <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Closes #1291
* Export symbols for ZIL interfaceAlex Zhuravlev2014-11-144-1/+27
| | | | | | | | | | | These symbols are needed by consumers (i.e. Lustre) who wish to integrate with the ZIL. In addition the zil_rollback_destroy() prototype was removed because the implementation of this function was removed long ago. Signed-off-by: Alex Zhuravlev <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2892
* Improve zvol symlink handling.Dan Swartzendruber2014-11-061-0/+8
| | | | | | | | | | | | | | | | Change the zvol helper program to replace any embedded spaces in the pool or dataset names with '+' to ensure we have valid symlinks. The '+' character was choosen because it is not a valid character for a dataset name but it is allowed by udev. This ensures that all dataset names with an embedded space will be translated to a unique /dev/zvol/ symlink. Signed-off-by: Dan Swartzendruber <[email protected]> Signed-off-by: Darik Horn <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2834
* Add config/compile to config/.gitignoreMarcel Wysocki2014-10-311-0/+1
| | | | | | | | | | | | This file may be added by automake and therefore should be added to config/.gitignore. For the full list of possible auxiliary programs see the full automake documentation. http://www.gnu.org/software/automake/manual/automake.html#Auxiliary-Programs Signed-off-by: Marcel Wysocki <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2848
* Fix modules installation directoryAlexander Pyhalov2014-10-281-1/+2
| | | | | | | | | When building zfs modules with kernel, compiled from deb.src, the packaging process ends up installing the modules in the wrong place. Signed-off-by: Alexander Pyhalov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2822
* Make systemd-modules-load.service file directory configurableRichard Yao2014-10-281-1/+7
| | | | | | | | | | | | | | | | | | | | | Installing outside of the prefix is not permissible under Gentoo Prefix. The package manager will cause the installation process to fail if/when it sees this. We could handle this by disabling systemd support on prefix because systemd does not check these paths, but the Gentoo Council decided that small files such as these should be installed. That means disabling systemd support on prefix is not an acceptable workaround. As a consequence, we need some way of control the directory into which these files are installed. Making this configurable increases our compliance with the freedesktop.org specification, which allows these files to be installed into /etc/modules-load.d: http://www.freedesktop.org/software/systemd/man/modules-load.d.html Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2641
* Make directory into which mount.zfs is installed configurableRichard Yao2014-10-283-1/+10
| | | | | | | | | | | Installing outside of the prefix is not permissible under Gentoo Prefix. The package manager will cause the installation process to fail if/when it sees this. I could script a workaround inside the ebuild, but it seemed to make more sense to make this more configurable. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2641
* Search /usr/local/src for SPL Object DirectoryRichard Yao2014-10-281-4/+10
| | | | | | | | | | Since we changed the default location for the kernel headers to respect --prefix in the SPL, we must search that location to prevent user builds from breaking. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2641
* Kernel header installation should respect --prefixRichard Yao2014-10-287-8/+8
| | | | | | | | | | This is the upstream component of work that enables preliminary support for building Gentoo's ZFS packaging on other Linux systems via Gentoo Prefix. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2641
* Linux 3.12 compat: shrinker semanticsTim Chase2014-10-281-6/+23
| | | | | | | | | | | | | | The new shrinker API as of Linux 3.12 modifies "struct shrinker" by replacing the @shrink callback with the pair of @count_objects and @scan_objects. It also requires the return value of @count_objects to return the number of objects actually freed whereas the previous @shrink callback returned the number of remaining freeable objects. This patch adds support for the new @scan_objects return value semantics. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #2837
* Illumos 5164-5165 - space map fixesMatthew Ahrens2014-10-234-110/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 5164 space_map_max_blksz causes panic, does not work 5165 zdb fails assertion when run on pool with recently-enabled space map_histogram feature Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5164 https://www.illumos.org/issues/5165 https://github.com/illumos/illumos-gate/commit/b1be289 Porting Notes: The metaslab_fragmentation() hunk was dropped from this patch because it was already resolved by commit 8b0a084. The comment modified in metaslab.c was updated to use the correct variable name, space_map_blksz. The upstream commit incorrectly used space_map_blksize. Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2697
* Illumos 4958 zdb trips assert on pools with ashift >= 0xeAlex Reece2014-10-238-46/+131
| | | | | | | | | | | | | | | | | | | | | | 4958 zdb trips assert on pools with ashift >= 0xe Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Max Grossman <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/4958 https://github.com/illumos/illumos-gate/commit/2a104a5 Porting notes: Keep the ZIO_FLAG_FASTWRITE define. This is for a feature present in Linux but not yet in *BSD. Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2697
* Fix zdb segfaultBrian Behlendorf2014-10-231-0/+2
| | | | | | | | | | | | On 32-bit systems setting 'zfs_arc_max = 256M' in zdb results in the following segmentation fault. Rather than reverting 0ec0724 which introduced this flaw this code is only used for 64-bit builds. Segmentation fault (core dumped) ztest: '/sbin/zdb -bcc -d -U /var/tmp/zpool.cache ztest' exit code 139 child exited with code 3 Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5169-5171 - zdb fixesMatthew Ahrens2014-10-231-3/+21
| | | | | | | | | | | | | | | | | | | | | 5169 zdb should limit its ARC size 5170 zdb -c should create more scrub i/os by default 5171 zdb should print status while loading metaslabs for leak detection Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Bayard Bell <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/5169 https://www.illumos.org/issues/5170 https://www.illumos.org/issues/5171 https://github.com/illumos/illumos-gate/commit/06be980 Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2707
* Illumos 5178 - zdb -vvvvv on old-format pool fails in dump_deadlist()Matthew Ahrens2014-10-231-0/+5
| | | | | | | | | | | | | | | | | | | 5178 zdb -vvvvv on old-format pool fails in dump_deadlist() Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Lowe <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Alek Pinchuk <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/5178 https://github.com/illumos/illumos-gate/commit/90c76c6 Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2713
* Fix zpool create -t ENOENT bug.ilovezfs2014-10-231-2/+4
| | | | | | | | | | | In userland we need to switch over to the temporary name once the pool has been created, otherwise the root dataset won't mount and the error "cannot open 'the_real_name': dataset does not exist" is printed. Signed-off-by: ilovezfs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2760
* Handle block pointers with a corrupt logical sizeBrian Behlendorf2014-10-233-14/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | The general strategy used by ZFS to verify that blocks are valid is to checksum everything. This has the advantage of being extremely robust and generically applicable regardless of the contents of the block. If a blocks checksum is valid then its contents are trusted by the higher layers. This system works exceptionally well as long as bad data is never written with a valid checksum. If this does somehow occur due to a software bug or a memory bit-flip on a non-ECC system it may result in kernel panic. One such place where this could occur is if somehow the logical size stored in a block pointer exceeds the maximum block size. This will result in an attempt to allocate a buffer greater than the maximum block size causing a system panic. To prevent this from happening the arc_read() function has been updated to detect this specific case. If a block pointer with an invalid logical size is passed it will treat the block as if it contained a checksum error. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2678
* Remove checks for mandatory locksNed Bass2014-10-222-28/+0
| | | | | | | | | | | | | The Linux VFS handles mandatory locks generically so we shouldn't need to check for conflicting locks in zfs_read(), zfs_write(), or zfs_freesp(). Linux 3.18 removed the lock_may_read() and lock_may_write() interfaces which we were relying on for this purpose. Rather than emulating those interfaces we remove the redundant checks. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2804
* Illumos 5162 - zfs recv should use loaned arc buffer to avoid copyMatthew Ahrens2014-10-212-18/+38
| | | | | | | | | | | | | | | | | | | | | 5162 zfs recv should use loaned arc buffer to avoid copy Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Bayard Bell <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/5162 https://github.com/illumos/illumos-gate/commit/8a90470 Porting notes: Fix spelling error 's/arena/area/' in dmu.c. In restore_write() declare bonus and abuf at the top of the function. Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2696