summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Linux 3.19 compat: file_inode was addedJörg Thalheim2015-02-104-2/+35
| | | | | | | | | struct access f->f_dentry->d_inode was replaced by accessor function file_inode(f) Signed-off-by: Joerg Thalheim <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3084
* Use vmem_alloc() for nvlistsBrian Behlendorf2015-02-103-8/+9
| | | | | | | | | | | | | | | | | | Several of the nvlist functions may perform allocations larger than the 32k warning threshold. Convert them to use vmem_alloc() so the best allocator is used. Commit efcd79a retired KM_NODEBUG which was used to suppress large allocation warnings. Concurrently the large allocation warning threshold was increased from 8k to 32k. The goal was to identify the remaining locations, such as this one, where the allocation can be larger than 32k. This patch is expected fine tuning resulting for the kmem-rework changes, see commit 6e9710f. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3057 Closes #3079 Closes #3081
* Revert "Don't read space maps during import for readonly pools"Brian Behlendorf2015-02-092-9/+2
| | | | | | | | | | This reverts commit 7fc8c33ede10f7104ca0e91d690d3ebb5236887b which accidentally introduced a ztest failure. ztest: '/usr/sbin/zdb -bcc -d -U /var/tmp/zpool.cache ztest' exit code 2 child exited with code 3 Signed-off-by: Brian Behlendorf <[email protected]>
* Produce a full snapshot list for zfs send -pTim Chase2015-02-091-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | In order to accelerate zfs receive operations in the face of many property-containing snapshots, commit 0574855 changed the header nvlist ("fss") of a send stream to exclude snapshots which aren't part of the stream. This, however, would cause zfs receive -F to erroneously remove snapshots; it would remove any snapshot which wasn't listed in the header nvlist. This patch restores the full list of snapshots in fss[<id>[snaps]] but still suppresses the properties of non-sent snapshots and also removes a consistency check in which an error is raised if a listed snapshot does not have any properties in fss[<id>[snapprops]]. The 0574855 commit also introduced a bug in which zfs send -p of a complete stream (zfs send -p pool/fs@snap) would exclude the snapshot properties in fss[<id>[snapprops]]. This patch detects the last snapshot in a series when no "from" snapshot has been specified and includes its properties. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2907
* Don't read space maps during import for readonly poolsBrian Behlendorf2015-02-092-2/+9
| | | | | | | | | | | | | | | | | | Normally when importing a pool the space maps for all top level vdevs are read from disk. The space maps will be required latter when an allocation is performed and free blocks need to be located. However, if the pool is imported readonly then we are guaranteed that no allocations can occur. In this case the space maps need not be loaded.. A similar argument can be made for the DTLs (dirty time logs). Because a pool import will fail if the space maps cannot be read. The ability to safely ignore them makes it more likely that a damaged pool can be imported readonly to recover its contents. Signed-off-by: Brian Behlendorf <[email protected]> Issue #2831
* Fix Dracut scripts to allow for blanks in pool and dataset namesLukas Wunner2015-02-092-4/+16
| | | | | | | | | The ability to use blanks is documented in zpool(8) and implemented in module/zcommon/zfs_namecheck.c:valid_char(). Signed-off-by: Lukas Wunner <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3083
* Fix loop in Dracut shutdown scriptLukas Wunner2015-02-091-1/+1
| | | | | | | | | | | The shell executes each command of a pipeline in a subshell, thus $ret always had the same value after the while loop that it had before the loop (http://mywiki.wooledge.org/BashFAQ/024), signaling success even if some of the zpools could not be exported. Signed-off-by: Lukas Wunner <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3083
* Illumos 5311 - traverse_dnode may report success when it should notJustin T. Gibbs2015-02-061-1/+1
| | | | | | | | | | | | | | | | 5311 traverse_dnode may report success when it should not Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Will Andrews <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://github.com/illumos/illumos-gate/commit/2a89c2c https://www.illumos.org/issues/5311 Ported by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2970
* Fix SA header size accountingNed Bass2015-02-061-41/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions sa_find_sizes() and sa_build_layout() fail to account for the additional 2 bytes of SA header space when calculating whether a variable size attribute might spill over. They may consequently determine that an attribute will fit in the bonus buffer along with a spill block pointer, when in reality the attribute would be partially overwritten by the spill block pointer if spill over occurs. This also causes an inconsistency between the SA header size and the number of variable size attributes in the layout, tripping an assertion when debugging is on. The following reproducer demonstrates the problem. ln -s $(perl -e 'print "z" x 20') file setfattr -h -n trusted.foo -v $(perl -e 'print "z" x 200') file Even though sa_find_sizes() computes the index of the attribute where spill-over will occur, sa_build_layouts() discards the result and recomputes it itself. As it turns out, both functions get it wrong. Since this computation is awkward and, as history has shown, easy to screw up, let's just do it in one place. This patch fixes the bug in sa_find_sizes() and updates sa_build_layout() to use the result computed there. Also improve the comments in sa_find_sizes(). Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #3070
* Skip evicting dbufs when walking the dbuf hashBrian Behlendorf2015-02-061-3/+5
| | | | | | | | | | When a dbuf is in the DB_EVICTING state it may no longer be on the dn_dbufs list. In which case it's unsafe to call DB_DNODE_ENTER. Therefore, any dbuf which is found in this safe must be skipped. Signed-off-by: Brian Behlendorf <[email protected]> Closes #2553 Closes #2495
* Fix build error when make debChunwei Chen2015-02-061-0/+1
| | | | | | | | | | | | | After 53698a4, the following error occurs when make deb. CCLD zed ../../lib/libzfs/.libs/libzfs.so: undefined reference to `get_system_hostid' Add libzpool.la to zed/Makefile.am to fix this Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3080
* Read spl_hostid module parameter before gethostid()Chunwei Chen2015-02-044-3/+28
| | | | | | | | | | If spl_hostid is set via module parameter, it's likely different from gethostid(). Therefore, the userspace tool should read it first before falling back to gethostid(). Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3034
* Spurious ENOMEM returns when reading dbufs kstatTim Chase2015-02-041-2/+6
| | | | | | | | | | | | | Commit 7b2d78a046aa4695d434478a439a9438521d73af fixed some improper uses of snprintf(), however, in __dbuf_stats_hash_table_data() the return value of snprintf is propagated to the caller. This caused spurious ENOMEM errors when reading the dbufs kstat. This commit causes the actual number of characters written to be returned. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3072
* fix l2arc compression buffers leakavg2015-02-031-10/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit log from FreeBSD: We have observed that arc_release() can be called concurrently with a l2arc in-flight write. Also, we have observed that arc_hdr_destroy() can be called from arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar circumstances. Previously the l2arc headers would be freed while leaking their associated compression buffers. Now the buffers are placed on l2arc_free_on_write list for delayed freeing. This is similar to what was already done to arc buffers that were supposed to be freed concurrently with in-flight writes of those buffers. In addition to fixing the discovered leaks this change also adds some protective code to assert that a compression buffer associated with a l2arc header is never leaked. A new kstat l2_cdata_free_on_write is added. It keeps a count of delayed compression buffer frees which previously would have been leaks. Tested by: Vitalij Satanivskij <[email protected]> et al Requested by: many MFC after: 2 weeks Sponsored by: HybridCluster / ClusterHQ References: https://illumos.org/issues/5222 https://github.com/freebsd/freebsd/commit/b98f85d http://thread.gmane.org/gmane.os.freebsd.current/155757/focus=155781 http://lists.open-zfs.org/pipermail/developer/2014-January/000455.html http://lists.open-zfs.org/pipermail/developer/2014-February/000523.html Ported-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3029
* Use zio buffers in zil_itx_create()Brian Behlendorf2015-02-021-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zil_itx_create() function uses the vmem_alloc() allocator for its buffers because when logging a write that buffer may be as large as 64K. This is non-optimal because we may need to allocate many of of these buffers and this interface has the potential to be slow. Instead, use zio_data_buf_alloc() which is specifically designed to be able to efficiently allocate a wide range of buffer sizes. In addition, do some cleanup and use the zil_itx_destroy() function to always free an itx structure. This way we're always sure the right allocation functions are used. Notice that in the current code kmem_free() and vmem_free() were both used. This happened to work because these wrappers map to the same internal SPL function. This was identified as a potential problem when a low-end memory constrained system began logging the following warnings. There was no deadlock here just repeated allocation failures resulting in increased latency. Possible memory allocation deadlock: size=65792 lflags=0x42d0 Pid: 20118, comm: kvm Tainted: P O 3.2.0-0.bpo.4-amd64 Call Trace: [<ffffffffa040b834>] ? spl_kmem_alloc_impl+0x115/0x127 [spl] [<ffffffffa040b84f>] ? spl_kmem_alloc_debug+0x9/0x36 [spl] [<ffffffffa05d8a0b>] ? zil_itx_create+0x2d/0x59 [zfs] [<ffffffffa05c71e6>] ? zfs_log_write+0x13a/0x2f0 [zfs] [<ffffffffa05d41bc>] ? zfs_write+0x85b/0x9bb [zfs] [<ffffffffa05e37ec>] ? zpl_aio_write+0xca/0x110 [zfs] [<ffffffff811088e5>] ? do_sync_readv_writev+0xa3/0xde [<ffffffff81108f41>] ? do_readv_writev+0xaf/0x125 [<ffffffff81109055>] ? sys_pwritev+0x55/0x9a [<ffffffff813721d2>] ? system_call_fastpath+0x16/0x1b Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #3059
* Cleanup _zed_event_add_nvpair()Chris Dunlap2015-01-301-232/+263
| | | | | | | | | | | | | | | | | | When _zed_event_add_var() was updated to be the common routine for adding zedlet environment variables, an additional snprintf() was added to the processing of each nvpair. This commit changes _zed_event_add_nvpair() to directly call _zed_event_add_var() for nvpair non-array types, thereby removing a superfluous call to snprintf(). For consistency, the helper functions for converting nvpair array types are similarly adjusted to add variables. The _zed_event_value_is_hex() and _zed_event_add_var() functions have been moved up in the file since forward declarations are not used, but no changes have been made to these functions. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3042
* Protect against adding duplicate strings in ZEDChris Dunlap2015-01-304-151/+243
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zed_strings container stores strings in an AVL, but does not check for duplicate strings being added. Within the AVL, strings are indexed by the string value itself. avl_add() requires the node being added must not already exist in the tree, and will assert() if this is not the case. This should not cause problems in practice. ZED uses this container in two places. In zed_conf.c, it is used to store the names of enabled zedlets as zed scans the zedlet directory listing; duplicate entries cannot occur here since duplicate names cannot occur within a directory. In zed_event.c, it is used to store the environment variables (as "NAME=VALUE" strings) that will be passed to zedlets; duplicate strings here should never happen unless there is a bug resulting in a duplicate nvpair or environment variable. This commit protects against adding a duplicate to a zed_strings container by first checking for the string being added, and removing the previous entry should one exist. This implements a "last one wins" policy. This commit also changes the prototype for zed_strings_add() to allow the string key (by which it is indexed in the AVL) to differ from the string value. By adding zedlet environment variables using the variable name as the key, multiple adds for the same variable name will result in only the last value being stored. Finally, this commit routes all additions of zedlet environment variables through the updated _zed_event_add_var(). This ensures all zedlet environment variable names are properly converted. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3042
* Handle closing an unopened ZVOLBrian Behlendorf2015-01-301-5/+5
| | | | | | | | | | | | | | | Thank to commit a4430fce691d492aec382de0dfa937c05ee16500 we're now correctly returning EROFS when opening a zvol on a read-only pool. Unfortunately, it looks like this causes us to trigger some unexpected behavior by __blkdev_get(). In the failure case it's possible __blkdev_get() will call __blkdev_put() for a bdev which was never successfully opened. This results in us trying to close the device again and hitting the NULL dereference. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1343
* Add zvol_open() error handling for readonly propertyBrian Behlendorf2015-01-301-1/+4
| | | | | | | | Rather than ASSERT when for some reason the readonly property of a zvol can't be read cleanly handle the failure. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1343
* Use (void) memcpy(), not (void *) memcpy()Richard Yao2015-01-301-3/+3
| | | | | | | | | This was caught by Clang. Clearly the intent of this code was to explicitly ignore the return value. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3054
* Make `zpool import -d|-c` behave consistentlyBrian Behlendorf2015-01-281-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When importing pools with zpool import -aN there is inconsistent behavior between '-d /dev/disk/by-id' (or another path) and '-c /etc/zfs/zpool.cache'. The difference in behavior is caused by zpool_find_import_cached() returning an empty nvlist_t when there are no pools to import but zpool_find_import_impl() returns NULL for the same situation. The behavior of zpool_find_import_cached() is arguably more correct because it allows returning NULL to be used for an error case and not an empty set. This change resolves the issue by updating get_configs() such that it returns an empty set instead of NULL when no config is found. The updated behavior will now always return 0 for this case. $ zpool import -aN; echo $? no pools available to import 0 $ zpool import -aN -d /var/tmp/; echo $? no pools available to import 0 $ zpool import -aN -c /etc/zfs/zpool.cache; echo $? no pools available to import 0 Signed-off-by: Brian Behlendorf <[email protected]> Closes #2080
* Merge branch 'arc_summary_draft_v2'Brian Behlendorf2015-01-282-0/+1154
|\ | | | | | | | | | | | | | | | | | | | | Add a port of arc_summary.py to ZFS on Linux, arc_summary.py is a standard tool in FreeBSD and Illumos. The version of the script used for the port originally came from FreeNAS. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #2920
| * Replace sysctl summary with tunables summary.Kyle Blatter2015-01-281-66/+57
| | | | | | | | | | | | | | | | | | | | | | The original script displayed tunable parameters using sysctl calls. This patch modifies this by displaying tunable parameters found in /sys/modules/zfs/parameters/. modinfo calls are used to capture descriptions. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]>
| * Force all lines to be 80 columnsKyle Blatter2015-01-281-33/+66
| | | | | | | | | | | | | | | | | | Ensure this script conforms to the projects style guidelines by limiting line length to 80 columns. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]>
| * Add a help option with usage informationKyle Blatter2015-01-281-4/+25
| | | | | | | | | | | | | | | | | | | | Add a basic help option and usage description which is consistent with arcstat.py and dbufstat.py. This also adds support for long opts. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]>
| * Refactor arc_summary to simplify -p processingKyle Blatter2015-01-281-23/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -p option is used to specify a specific page of output to be displayed. If omitted, all output pages would be displayed. arc_summary, as it stood, had really kludgy processing code for processing the -p option. It relied on a try-except block which was treated as an if statement and in normal operation would fail any time a user didn't specify the -p option on the command line. When the exception was thrown, the script would then display all output pages. This happened whether the -p option was omitted or malformed. Thus, in the principle use case, an exception would be raised in order to run the script as desired. The same except code would be called regardless of the exception, however, and malformed -p arguments would also cause the script to execute. Additionally, this required the function which handles the case where all output pages were to be displayed, _call_all, to be potentially called from several locations within main. This commit refactors the option processing code to simplify it and make it easier to catch runtime errors in the script. This is done by specializing the try-except block to only have an exception when the -p argument is malformed. When the -p option is correctly selected by the user, it calls a function in the unSub array directly, which will only display one page of output. Finally in the context of this refactoring the page breaks have been removed. Pages seem to have been added into the output in the FreeNAS version of the script. This patch removes pages from the output to more closely resemble the freebsd version of the script. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]>
| * Modified arc_summary.py to run on linuxcburroughs2015-01-281-287/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | 1) Comment out stat sections whose kstats are not currently available 2) Port most of arc_summary to use spl kstats 3) Enable l2arc stats 5) Include compressed l2size 4) Minor style fixes / cleanup Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: cburroughs <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]>
| * Add arc_summary.py from FreeNAScburroughs2015-01-282-0/+1378
|/ | | | | | | | | | | | | | | | | | | | | | | | | | The arc_summary script is a useful utility for administrators on other ZFS platforms. It provides a quick and easy way to get a high level view of the current ARC state. Historically this was a perl script but it was rewritten in python for FreeNAS. We've decided to adopt the python version instead of the perl version for a few reasons. 1) ZoL has no existing perl dependencies, but it does have a python dependency for scripts such as arcstat.py and dbufstat.py. Using python for arc_summary.py helps us minimize dependencies. 2) Most major Linux distributions already depend heavily on python for their core infrastructure. This means it's very likely to be available even very early in the boot process. Original source: https://github.com/freenas/freenas/blob/master/gui/tools/arc_summary.py Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: cburroughs <[email protected]> Signed-off-by: Kyle Blatter <[email protected]> Signed-off-by: Ned Bass <[email protected]>
* Fix removal of SA in sa_modify_attrs()Tim Chase2015-01-211-3/+1
| | | | | | | | | | | | | | The sa_modify_attrs() function can add, remove or replace an SA. The main loop in the function uses the index "i" to iterate over the existing SAs and uses the index "j" for writing them into a new buffer via SA_ADD_BULK_ATTR(). The write index, "j" is incremented on remove (SA_REMOVE) operations which leads to a corruption in the new SA buffer. This patch remove the increment for SA_REMOVE operations. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #3028
* Use kmem_vasprintf() in log_internal()Richard Yao2015-01-211-10/+2
| | | | | | | | | | | | | | | | | An attempt to debug zfsonlinux/zfs#2781 revealed that this code could be simplified by using kmem_asprintf(). It is not clear that switching to kmem_asprintf() addresses zfsonlinux/zfs#2781. However, switching to kmem_asprintf() is cleanup that simplifies debugging such that it would become clear that this is a bug in glibc should the issue persist. It also brings this function almost back in sync with Illumos. This was possible due to the recently reworked kmem code which allows us to use KM_SLEEP in the same fashion as Illumos. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2791 Issue #2781
* Linux 3.12 compat: split shrinker has s_shrinkTim Chase2015-01-202-6/+10
| | | | | | | | | | | | The split count/scan shrinker callbacks introduced in 3.12 broke the test for HAVE_SHRINK, effectively disabling the per-superblock shrinkers. This patch re-enables the per-superblock shrinkers when the split shrinker callbacks have been detected. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2975
* Merge branch 'kmem-rework'Brian Behlendorf2015-01-1674-447/+369
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The core motivation behind these changes is to minimize the memory management differences between ZFS on Linux and other platforms. This simplifies the process of porting changes to Linux from other platforms. This is good for code quality and is expected to reduce the number of defects accidentally introduced due to porting. The following key Linux specific changes have been reverted. * KM_PUSHPAGE changed back to KM_SLEEP. All contexts where it is unsafe to perform IO have been marked with PF_FSTRANS. This context specific mechanism is now used exclusively and the KM_PUSHPAGE mechanism has been retired. * The KM_NODEBUG flag has been retired. Allocations larger than 32K should use vmem_alloc()/vmem_free(). Depending on the size of the allocation either kmalloc() or vmalloc() will be used internally, but no warning will be printed. * Pre-allocated vdev IO buffers and the dedicated SA spill block cache have been retired. It is now safe and reliable to allocate buffers of the needed size without fear of deadlocking. This reduces our memory footprint and paves the way for larger block sizes. Depends on zfsonlinux/spl#414. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #2918
| * Revert "SA spill block cache"Brian Behlendorf2015-01-164-29/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SA spill_cache was originally introduced to avoid the need to perform large kmem or vmem allocations. Instead a small dedicated cache of preallocated SA buffers was kept. This solution was viable while the maximum block size was limited to 128K. But with the planned increase of the maximum block size to 16M callers need to migrate to the zio_buf_alloc(). However, they should be aware this interface is expected to change again once the zio buffers are fully backed by scatter-gather lists. Alternately, if the callers know these buffers will never be large or be infrequently accessed they may kmem_alloc() or vmem_alloc() the needed temporary space. This change has the additional benegit of bringing the code back inline with the upstream Illumos source. Signed-off-by: Brian Behlendorf <[email protected]>
| * Revert "Pre-allocate vdev I/O buffers"Brian Behlendorf2015-01-164-72/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 86dd0fd added preallocated I/O buffers. This is no longer required after the recent kmem changes designed to make our memory allocation interfaces behave more like those found on Illumos. A deadlock in this situation is no longer possible. However, these allocations still have the potential to be expensive. So a potential future optimization might be to perform then KM_NOSLEEP so that they either succeed of fail quicky. Either case is acceptable here because we can safely abort the aggregation. Signed-off-by: Brian Behlendorf <[email protected]>
| * Add kmem_cache.h include to default contextBrian Behlendorf2015-01-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of the spl kmem/vmem refactoring the kmem_cache_* functions were split in to their own kmem_cache.h header. This was done in part so that kmem_* consumers would not be forced to include the kmem_cache_* functions which mask several Linux SLAB/SLAB functions. Because of this we now much explicitly include kmem_cache.h in the zfs_context.h. However, consumers such as Lustre which need access to the KM_FLAGS but not the kmem_cache_* functions can now safely just include kmem.h. Signed-off-by: Brian Behlendorf <[email protected]>
| * Change KM_PUSHPAGE -> KM_SLEEPBrian Behlendorf2015-01-1665-281/+269
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By marking DMU transaction processing contexts with PF_FSTRANS we can revert the KM_PUSHPAGE -> KM_SLEEP changes. This brings us back in line with upstream. In some cases this means simply swapping the flags back. For others fnvlist_alloc() was replaced by nvlist_alloc(..., KM_PUSHPAGE) and must be reverted back to fnvlist_alloc() which assumes KM_SLEEP. The one place KM_PUSHPAGE is kept is when allocating ARC buffers which allows us to dip in to reserved memory. This is again the same as upstream. Signed-off-by: Brian Behlendorf <[email protected]>
| * Retire KM_NODEBUGBrian Behlendorf2015-01-1617-30/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Callers of kmem_alloc() which passed the KM_NODEBUG flag to suppress the large allocation warning have been replaced by vmem_alloc() as appropriate. The updated vmem_alloc() call will not print a warning regardless of the size of the allocation. A careful reader will notice that not all callers have been changed to vmem_alloc(). Some have only had the KM_NODEBUG flag removed. This was possible because the default warning threshold has been increased to 32k. This is desirable because it minimizes the need for Linux specific code changes. Signed-off-by: Brian Behlendorf <[email protected]>
| * Use is_vmalloc_addr() in vdev_disk.cRichard Yao2015-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The initial port of ZFS to Linux required a way to identify virtual memory to make IO to virtual memory backed slabs work, so kmem_virt() was created. Linux 2.6.25 introduced is_vmalloc_addr(), which is logically equivalent to kmem_virt(). Support for kernels before 2.6.26 was later dropped and more recently, support for kernels before Linux 2.6.32 has been dropped. We retire kmem_virt() in favor of is_vmalloc_addr() to cleanup the code. Signed-off-by: Brian Behlendorf <[email protected]>
| * Mark IO pipeline with PF_FSTRANSBrian Behlendorf2015-01-167-45/+69
|/ | | | | | | | | | In order to avoid deadlocking in the IO pipeline it is critical that pageout be avoided during direct memory reclaim. This ensures that the pipeline threads can always make forward progress and never end up blocking on a DMU transaction. For this very reason Linux now provides the PF_FSTRANS flag which may be set in the process context. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix zfs_putpage() lock inversion (again)Brian Behlendorf2015-01-081-2/+15
| | | | | | | | | | | | | | | This is a follow up commit to 74328ee which correctly resolved a lock inversion between zfs_putpage() and zfs_free_range(). Unfortunately, in the process it accidentally introduced another inversion between zfs_putpage() and zfs_read(). The page must be unlocked before taking the range lock. This patch corrects that issue. In addition, because the locking rules here are subtle a block comment has been added clearly explaining why the ordering here is critical. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Issue #2976
* Document zfs_flags module parameterNed Bass2015-01-072-3/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a table describing the debugging flags that can be set in the zfs_flags module parameter. Also change the module_param type to 'uint' so users aren't shown a negative value. The updated man page text is reproduced below for convenience. zfs_flags (int) Set additional debugging flags. The following flags may be bitwise-or'd together. +-------------------------------------------------------+ |Value Symbolic Name | | Description | +-------------------------------------------------------+ | 1 ZFS_DEBUG_DPRINTF | | Enable dprintf entries in the debug log. | +-------------------------------------------------------+ | 2 ZFS_DEBUG_DBUF_VERIFY * | | Enable extra dbuf verifications. | +-------------------------------------------------------+ | 4 ZFS_DEBUG_DNODE_VERIFY * | | Enable extra dnode verifications. | +-------------------------------------------------------+ | 8 ZFS_DEBUG_SNAPNAMES | | Enable snapshot name verification. | +-------------------------------------------------------+ | 16 ZFS_DEBUG_MODIFY | | Check for illegally modified ARC buffers. | +-------------------------------------------------------+ | 32 ZFS_DEBUG_SPA | | Enable spa_dbgmsg entries in the debug log. | +-------------------------------------------------------+ | 64 ZFS_DEBUG_ZIO_FREE | | Enable verification of block frees. | +-------------------------------------------------------+ | 128 ZFS_DEBUG_HISTOGRAM_VERIFY | | Enable extra spacemap histogram verifications. | +-------------------------------------------------------+ * Requires debug build. Default value: 0. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2988
* Don't use AC_LANG_SOURCE for conftest.h sourceNed Bass2015-01-061-1/+1
| | | | | | | | | | | | | | | Using AC_LANG_SOURCE with some versions of autoconf is problematic if the given source is to be written to a header file. Such versions assume the contents are to be written to conftest.c and generate shell code to that effect. The contents of the test program to detect support for Linux tracepoints were consequently malformed (containing the source for conftest.h) so the build system incorrectly disabled tracepoints support. Fix this in ZFS_LINUX_TRY_COMPILE_HEADER by passing the header source directly to ZFS_LINUX_COMPILE_IFELSE. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2953
* Remove duplicate typedefs from trace.hNed Bass2015-01-0622-984/+1383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older versions of GCC (e.g. GCC 4.4.7 on RHEL6) do not allow duplicate typedef declarations with the same type. The trace.h header contains some typedefs to avoid 'unknown type' errors for C files that haven't declared the type in question. But this causes build failures for C files that have already declared the type. Newer versions of GCC (e.g. v4.6) allow duplicate typedefs with the same type unless pedantic error checking is in force. To support the older versions we need to remove the duplicate typedefs. Removal of the typedefs means we can't built tracepoints code using those types unless the required headers have been included. To facilitate this, all tracepoint event declarations have been moved out of trace.h into separate headers. Each new header is explicitly included from the C file that uses the events defined therein. The trace.h header is still indirectly included form zfs_context.h and provides the implementation of the dprintf(), dbgmsg(), and SET_ERROR() interfaces. This makes those interfaces readily available throughout the code base. The macros that redefine DTRACE_PROBE* to use Linux tracepoints are also still provided by trace.h, so it is a prerequisite for the other trace_*.h headers. These new Linux implementation-specific headers do introduce a small divergence from upstream ZFS in several core C files, but this should not present a significant maintenance burden. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2953
* Fix zfs_putpage() lock inversionBrian Behlendorf2014-12-221-3/+2
| | | | | | | | | | | | | | | | | | | | | There exists a lock inversions involving the zfs range lock and the individual page writeback bits which can result in a deadlock. To prevent this we must always manipulate the writeback bit while holding the range lock. The exact deadlock is as follows: ------ Process A ------ ------ Process B ------ zpl_writepages zpl_fallocate write_cache_pages zpl_fallocate_common zpl_putpage zfs_space zfs_putpage (set bit) zfs_freesp zfs_range_lock (wait on lock) zfs_free_range (take lock) [has not yet initiated I/O, truncate_inode_pages_range the bit will not be cleared] wait_on_page_writeback (wait on bit) Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #2976
* vdev_id: use mawk-compatible regular expressionNed Bass2014-12-191-1/+1
| | | | | | | | | | | | | Slot mapping in vdev_id doesn't work on systems using mawk as the 'awk' alternative. A regular expression in map_slot() contains an unquoted empty string following the alternation (|) operator, which results in an "missing operand" error with mawk. The solution is to rearrange the expression so the alternation has two operands. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/pkg-zfs#136 Closes zfsonlinux/zfs#2965
* Fix cstyle issue from c66989bBrian Behlendorf2014-12-191-2/+2
| | | | | | | Commit c66989b accidentally introduced a cstyle issue which went unnoticed. This tiny patch corrects that oversight. Signed-off-by: Brian Behlendorf <[email protected]>
* Correct error returns to unify cross-pool operation error handlingBoris Protopopov2014-12-191-2/+2
| | | | | | Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2911
* Fix typo in %post scriptlet linesAndy Bakun2014-12-181-4/+4
| | | | | | | | | Missing space made the %post directive be part of the package %description and not have a %post scriptlet defined. Signed-off-by: Andy Bakun <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2961
* zpool upgrade return errors to stderr instead of stdoutJacek Fefliński2014-12-181-2/+2
| | | | | | Signed-off-by: Jacek Feflinski <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2955
* Improve systemd script to not leave stale sharetabDan Swartzendruber2014-12-181-0/+1
| | | | | | | | | | The systemd script zfs-share.service does 'zfs share -a' to share any required datasets. Unfortunately, /etc/dfs/sharetab is stale from the previous boot. Delete it before we share. Signed-off-by: Dan Swartzendruber <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2883