aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Illumos 5134 - if ZFS_DEBUG or debug= is set, libzpool should enable debug ↵Matthew Ahrens2015-04-283-0/+7
| | | | | | | | | | | | | | | | | | | | | | prints 5134 if ZFS_DEBUG or debug= is set, libzpool should enable debug prints Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/projects/illumos-gate/issues/5134 https://github.com/illumos/illumos-gate/commit/7fa49ea Porting notes: Added dprintf_setup() to main in zfs_main.c and zpool_main.c. Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2669
* Fix misuse of input argument in traverse_visitbptuxoko2015-04-281-15/+17
| | | | | | | | | | | | | | | | | | In traverse_visitbp(), the input argument dnp is modified in the middle to point to a temporary buffer. Originally this doesn't matter, because no user of TRAVERSE_POST dereferences it. However, in fbeddd6 a piece of code is added dereferencing dnp after the modification, creating a possible bug. We fix this by creating a new local variable cdnp for the DMU_OT_DNODE case, so we don't modify the input argument. Also we introduce different local variables in the DMU_OT_OBJSET case to prevent confusion between the input argument. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2060
* Merge branch 'zed-pushbullet'Brian Behlendorf2015-04-2719-441/+911
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch stack begins with cleaning up the existing ZEDLETs, refactoring common code blocks into zed-functions.sh, adopting a more consistent coding style, updating exit codes, etc. All scripts now run cleanly through ShellCheck. The old "email" ZEDLETs are replaced with new "notify" ZEDLETs. A notification can now be sent via email and/or Pushbullet. Additional notification methods will likely be added in the future. Pushbullet notifications are enabled by setting the ZED_PUSHBULLET_ACCESS_TOKEN and (optionally) ZED_PUSHBULLET_CHANNEL_TAG in zed.rc. The Pushbullet implementation requires awk, curl, and sed executables to be installed in the standard PATH. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3327
| * Combine data-notify.sh with io-notify.shChris Dunlap2015-04-272-51/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data-notify.sh ZEDLET serves a very similar purpose to io-notify.sh, namely, to generate a notification in response to a particular error event. Initially, data-notify.sh was separated from io-notify.sh since the "data" zevent does not (as I understand it) pertain to a specific vdev device. This stands in contrast to the "checksum" and "io" zevents (both handled by io-notify.sh) that can be attributed to a specific vdev. At the time, it seemed simpler to handle these two cases in separate scripts. This commit adds support for the "data" zevent to io-notify.sh, and symlinks io-notify.sh to data-notify.sh. It also adds the counts for vdev_read_errors, vdev_write_errors, and vdev_cksum_errors to the notification message. Signed-off-by: Chris Dunlap <[email protected]>
| * Add notification to io-spare.shChris Dunlap2015-04-271-4/+52
| | | | | | | | | | | | | | | | | | | | | | The io-spare.sh ZEDLET does not generate a notification when a failing device is replaced with a hot spare. Maybe it should tell someone. This commit adds a notification message to the io-spare.sh ZEDLET. This notification is triggered when a failing device is successfully replaced with a hot spare after encountering a checksum or io error. Signed-off-by: Chris Dunlap <[email protected]>
| * Add support for Pushbullet notificationsChris Dunlap2015-04-273-1/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds the zed_notify_pushbullet() function and hooks it into zed_notify(), thereby integrating it with the existing "notify" ZEDLETs. This enables ZED to push notifications to your desktop computer and/or mobile device(s). It is configured with the ZED_PUSHBULLET_ACCESS_TOKEN and ZED_PUSHBULLET_CHANNEL_TAG variables in zed.rc. https://www.pushbullet.com/ The Makefile install-data-local target has been replaced with install-data-hook. With the "-local" target, there is no particular guarantee of execution order. But with the zed.rc now potentially containing sensitive information (i.e., the Pushbullet access token), the recommended permissions have changed to 0600. The "-hook" target is always executed after the main rule's work is done; thus, the chmod will always take place after the zed.rc file has been installed. https://www.gnu.org/software/automake/manual/automake.html#Extending Signed-off-by: Chris Dunlap <[email protected]>
| * Replace "email" ZEDLETs with "notify" ZEDLETsChris Dunlap2015-04-2715-258/+311
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several ZEDLETs already exist for sending email in reponse to a particular zevent. While email is ubiquitous, alternative methods may be better suited for some configurations. Instead of duplicating the "email" ZEDLETs for every future notification method, it is preferable to abstract the notification method into a function. This has the added benefit of reducing the amount of code duplicated between ZEDLETs, and allowing related bugs to be fixed in a single location. This commit replaces the existing "email" ZEDLETs with corresponding "notify" ZEDLETs. In addition, the ZEDLET code for sending an email message has been moved into the zed_notify_email() function. And this zed_notify_email() has been added to a generic zed_notify() function for sending notifications via all available methods that have been configured. This commit also changes a couple of related zed.rc variables. ZED_EMAIL_INTERVAL_SECS is changed to ZED_NOTIFY_INTERVAL_SECS, and ZED_EMAIL_VERBOSE is changed to ZED_NOTIFY_VERBOSE. Note that ZED_EMAIL remains unchanged as its use is solely for the email notification method. Signed-off-by: Chris Dunlap <[email protected]>
| * Cleanup ZEDLETsChris Dunlap2015-04-2711-337/+630
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit factors out several common ZEDLET code blocks into zed-functions.sh. This shortens the length of the scripts, thereby (hopefully) making them easier to understand and maintain. In addition, this commit revamps the coding style used by the scripts to be more consistent and (again, hopefully) maintainable. It now mostly follows the Google Shell Style Guide. I've tried to assimilate the following resources: Google Shell Style Guide https://google-styleguide.googlecode.com/svn/trunk/shell.xml Dash as /bin/sh https://wiki.ubuntu.com/DashAsBinSh Filenames and Pathnames in Shell: How to do it Correctly http://www.dwheeler.com/essays/filenames-in-shell.html Common shell script mistakes http://www.pixelbeat.org/programming/shell_script_mistakes.html Finally, this commit updates the exit codes used by the ZEDLETs to be more consistent with one another. All scripts run cleanly through ShellCheck <http://www.shellcheck.net/>. All scripts have been tested on bash and dash. Signed-off-by: Chris Dunlap <[email protected]>
* Remove useless variable spa_active_countIsaac Huang2015-04-272-8/+2
| | | | | | | | | | This isn't required for the Linux port because the kernel tracks if a module is busy. The prototype for spa_busy() is also removed since its definition was already removed. Signed-off-by: Isaac Huang <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3262
* Memory leak in make_root_vdev()Isaac Huang2015-04-271-1/+3
| | | | | | | | | The newroot nvlist should be freed before returning. Signed-off-by: Isaac Huang <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3264
* 5313 Allow I/Os to be aggregated across ZIO priority classesJustin T. Gibbs2015-04-243-31/+51
| | | | | | | | | | | | | | | | Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Will Andrews <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/5313 https://github.com/illumos/illumos-gate/commit/fe319232 Ported-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3280
* 5410 Document -S option to zfs inheritPaul B. Henson2015-04-242-6/+31
| | | | | | | | | | | | | | | 5410 Document -S option to zfs inherit 5412 Mention -S option when zfs inherit fails on quota Reviewed by: Matthew Ahrens <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/5410 https://github.com/illumos/illumos-gate/commit/5ff8cfa9 Ported-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3279
* Serialize access to spa->spa_feat_stats nvlistNed Bass2015-04-243-1/+9
| | | | | | | | | | The function spa_add_feature_stats() manipulates the shared nvlist spa->spa_feat_stats in an unsafe concurrent manner. Add a mutex to protect the list. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3335
* align zfs_autoimport_disable manpage with realitycburroughs2015-04-241-1/+1
| | | | | | | | The default was changed in #2820. Signed-off-by: cburroughs <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3341
* Fix kernel panic due to tsd_exit in ZFS_EXIT(zsb)Chunwei Chen2015-04-242-1/+2
| | | | | | | | | | | | | | | | | | | | | | The following panic would occur under certain heavy load: [ 4692.202686] Kernel panic - not syncing: thread ffff8800c4f5dd60 terminating with rrw lock ffff8800da1b9c40 held [ 4692.228053] CPU: 1 PID: 6250 Comm: mmap_deadlock Tainted: P OE 3.18.10 #7 The culprit is that ZFS_EXIT(zsb) would call tsd_exit() every time, which would purge all tsd data for the thread. However, ZFS_ENTER is designed to be reentrant, so we cannot allow ZFS_EXIT to blindly purge tsd data. Instead, we rely on the new behavior of tsd_set. When NULL is passed as the new value to tsd_set, it will automatically remove the tsd entry specified the the key for the current thread. rrw_tsd_key and zfs_allow_log_key already calls tsd_set(key, NULL) when they're done. The zfs_fsyncer_key relied on ZFS_EXIT(zsb) to call tsd_exit() to do clean up. Now we explicitly call tsd_set(key, NULL) on them. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3247
* Support the "version" property on volumes via the zfs_prop_get_int() APITim Chase2015-04-241-0/+3
| | | | | | | | | | | | | | | | | As of this commit, volumes do not possess the version property in any existing OpenZFS implementation. The zpool upgrade code, however, uses zfs_prop_get_int() to fetch the version property of all children in a pool. The semantics of the function, however, demand that it only be used for known valid properties so it returns a garbage value for volumes. This patch causes the version of a volume to appear to callers using the zfs_prop_get_int() API to be that of the default ZPL version for the implementation. In the future, should volumes gain the property, its actual value will be used. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Fixes #3313
* Extend PF_FSTRANS critical regionsBrian Behlendorf2015-04-242-6/+14
| | | | | | | | | | | | Additional testing has shown that the region covered by PF_FSTRANS needs to be extended to cover the zpl_xattr_security_init() and init_acl() functions. The zpl_mark_dirty() function can also recurse and therefore must always be protected. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #3331
* Fix formatting error in zfs(8)DHE2015-04-231-0/+2
| | | | | | | | | | Commit b1a3e93217e6e474e86345010469994c066cf875 accidentally introduced an intentation error between the 'zfs receive' and 'zfs allow' detailed documentation sections. Signed-off-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3312
* Fix io-spare.sh to work with disk vdevsChris Dunlap2015-04-171-1/+1
| | | | | | | | | | | | | | | | | | The "zpool status" output shows the full pathname for file-type vdevs, but only the basename component for disk-type vdevs. In commit bee6665, the "basename" command was dropped from altering the vdev name used when searching the "zpool status" output. Consequently, hot-disk sparing for disk vdevs broke since "zpool status" output was now being searched for the full pathname to the disk vdev. Parsing the "zpool status" output in this manner is rather brittle. It would be preferable to search for the vdev based on its guid. But until that happens, this commit adds back the "basename" command to fix the vdev name breakage. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3310
* Mark additional functions as PF_FSTRANSBrian Behlendorf2015-04-174-4/+53
| | | | | | | | | Prevent deadlocks by disabling direct reclaim during all NFS, xattr, ctldir, and super function calls. This is related to 40d06e3. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Issue #3225
* Document bookmarks a little better in zfs(8)Turbo Fredriksson2015-04-141-0/+18
| | | | | | | | Add a basic summary to zfs(8) describing bookmarks. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3268
* Rebuild init scripts on source file updatesDHE2015-04-141-1/+1
| | | | | | | | | | The resulting script is not removed by 'make clean' or rebuilt when the source files are changed. Users with long standing git trees may find their init script is out of date. Signed-off-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3273
* Allocate zfs_znode_cache on the Linux slabTim Chase2015-04-141-2/+4
| | | | | | | | | | | | | | | | | | | The Linux slab, in general, performs better than the SPl slab in cases where a lot of objects are allocated and fragmentation is likely present. This patch fixes pathologically bad behavior in cases where the ARC is filled with mostly metadata and a user program needs to allocate and dirty enough memory which would require an insignificant amount of the ARC to be reclaimed. If zfs_znode_cache is on the SPL slab, the system may spin for a very long time trying to reclaim sufficient memory. If it is on the Linux slab, the behavior has been observed to be much more predictible; the memory is reclaimed more efficiently. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3283
* Tag zfs-0.6.4zfs-0.6.4Brian Behlendorf2015-04-083-1/+5
| | | | | | META file and release log updated. Signed-off-by: Brian Behlendorf <[email protected]>
* Use vmem_alloc() in spa_config_write()Brian Behlendorf2015-04-071-2/+2
| | | | | | | | | The packed nvlist allocated in spa_config_write() may exceed the warning threshold for large configurations. Use the vmem interfaces for this short lived allocation. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3251
* Mark all ZPL and ioctl functions as PF_FSTRANSTim Chase2015-04-037-19/+96
| | | | | | | | | | | | Prevent deadlocks by disabling direct reclaim during all ZPL and ioctl calls as well as the l2arc and adapt ARC threads. This obviates the need for MUTEX_FSTRANS so its previous uses and definition have been eliminated. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3225
* Update zfs_pd_bytes_max default in zfs(8)Brian Behlendorf2015-03-311-1/+1
| | | | | | | Commit b738bc5 should have updated the default value of zfs_pd_bytes_max in the zfs(8) man page. The correct default value is 50*1024*1024. Signed-off-by: Brian Behlendorf <[email protected]>
* Illumus 5693 - ztest fails in dbuf_verify: buf[i] == 0, due to dedup and ↵Matthew Ahrens2015-03-271-2/+0
| | | | | | | | | | | | | | | | | | bp_override 5693 ztest fails in dbuf_verify: buf[i] == 0, due to dedup and bp_override Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Bayard Bell <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5693 https://github.com/illumos/illumos-gate/commit/7f7ace3 Ported-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3231
* Illumos 5694 - traverse_prefetcher does not prefetch enoughGeorge Wilson2015-03-272-14/+13
| | | | | | | | | | | | | | | | | | 5694 traverse_prefetcher does not prefetch enough Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Josef 'Jeff' Sipek <[email protected]> Reviewed by: Bayard Bell <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/5694 https://github.com/illumos/illumos-gate/commit/34d7ce05 Ported-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3230
* Align code with IllumosChris Dunlop2015-03-271-10/+9
| | | | | | | | | | | | | Align code in traverse_visitbp() with that in Illumos in preparation for applying Illumos-5694. No functional change: use a temporary variable pd to replace multiple occurrences of td->td_pfd. This increases our stack use slightly more then normal because the function is called recursively. Signed-off-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3230
* Illumos 5695 - dmu_sync'ed holes do not retain birth timePrakash Surya2015-03-273-8/+23
| | | | | | | | | | | | | | | | | 5695 dmu_sync'ed holes do not retain birth time Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Bayard Bell <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5695 https://github.com/illumos/illumos-gate/commit/70163ac Ported-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3229
* zpool import should honor overlay propertyNed Bass2015-03-272-14/+14
| | | | | | | | | | | | Make the 'zpool import' command honor the overlay property to allow filesystems to be mounted on a non-empty directory. As it stands now this property is only checked by the 'zfs mount' command. Move the check into 'zfs_mount()` in libzpool so the property is honored for all callers. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3227
* Add NULL guard in zfs_zrlock_class event classNed Bass2015-03-271-9/+9
| | | | | | | | | The owner field could be NULL in some cases, so add a guard. Shorten __entry field names to fit assignment statements in 80 columns. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Fixes #3220
* Add RHEL style kmod packagesBrian Behlendorf2015-03-278-2/+98
| | | | | | | | | | | | | | | Provide a Redhat specific zfs-kmod.spec file which uses the old style kmods (not kmods2) packaging. By using the provided kmodtool script packages can be built which support weak modules. This allows for the kernel to be updated without having to rebuild the ZFS kernel modules. Packages for RHEL/Centos/SL/TOSS which use this spec file can by built as follows: $ ./configure --with-spec=redhat $ make rpms Signed-off-by: Brian Behlendorf <[email protected]>
* Remove rpm/fedora directoryBrian Behlendorf2015-03-278-13/+2
| | | | | | | | | Originally it was thought that custom spec files might be required for Fedora. Happily that has turns out not to be the case. Since this directory just contains symlinks to the generic spec files it can be removed. Signed-off-by: Brian Behlendorf <[email protected]>
* Check all vdev labels in 'zpool import'Brian Behlendorf2015-03-254-19/+61
| | | | | | | | | | | | | | | | | | When using 'zpool import' to scan for available pools prefer vdev names which reference vdevs with more valid labels. There should be two labels at the start of the device and two labels at the end of the device. If labels are missing then the device has been damaged or is in some other way incomplete. Preferring names with fully intact labels helps weed out bad paths and improves the likelihood of being able to import the pool. This behavior only applies when scanning /dev/ for valid pools. If a cache file exists the pools described by the cache file will be used. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Closes #3145 Closes #2844 Closes #3107
* dbuf_free_range() overzealously frees dbufsNed Bass2015-03-251-1/+6
| | | | | | | | | | | | | | | | | When called to free a spill block from a dnode, dbuf_free_range() has a bug that results in all dbufs for the dnode getting freed. A variety of problems may result from this bug, but a common one was a zap lookup tripping an ASSERT because the zap buffers had been zeroed out. This could happen on a dataset with xattr=sa set when extended attributes are written and removed on a directory concurrently with I/O to files in that directory. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Fixes #3195 Fixes #3204 Fixes #3222
* Set the maximum ZVOL transfer size correctlyTim Chase2015-03-252-2/+2
| | | | | | | | | | | | ZoL had been setting max_sectors to UINT_MAX, but until Linux 3.19, it the kernel artifically capped it at 1024 (BLK_DEF_MAX_SECTORS). This cap was removed in torvalds/linux@34b48db. This patch changes it to DMU_MAX_ACCESS (in sectors) and also changes the ASSERT in dmu_tx_hold_write() to allow the maximum transfer size. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3212
* Execute udevadm settle before trying to import poolsGordan Bobic2015-03-241-0/+3
| | | | | | | | | | | Execute udevadm settle before trying to import pools. Otherwise the disk device nodes may not be ready before import time. This is analogous to the behavior of the init scripts and systemd units. Signed-off-by: Gordan Bobic <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3213
* zio_injection_enabled should not be a module optionIsaac Huang2015-03-243-21/+7
| | | | | | | | | | | | | The zio_inject.c keeps zio_injection_enabled as a counter of fault handlers, so it should not be exported to user space as a module option. Several EXPORT_SYMBOLs are moved from zio.c to zio_inject.c, where the symbols are defined. Signed-off-by: Isaac Huang <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3199
* Reduce size of zfs_sb_t: allocate z_hold_mtx separatelyChris Dunlop2015-03-243-1/+10
| | | | | | | | | | | | | | | | zfs_sb_t has grown to the point where using kmem_zalloc() for allocations is triggering the 32k warning threshold. We can't safely convert this entire allocation to use vmem_alloc() instead of kmem_alloc() because the backing_dev_info structure is embedded here. It depends on the bit_waitqueue() function which won't behave properly when given a virtual address. Instead, use vmem_alloc() to allocate the z_hold_mtx array separately. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #3178
* Fix arc_adjust_meta() behaviorBrian Behlendorf2015-03-202-23/+80
| | | | | | | | | | | | | | | | | | | | | | | | The goal of this function is to evict enough meta data buffers from the ARC in order to enforce the arc_meta_limit. Achieving this is slightly more complicated than it appears because it is common for data buffers to have holds on meta data buffers. In addition, dnode meta data buffers will be held by the dnodes in the block preventing them from being freed. This means we can't simply traverse the ARC and expect to always find enough unheld meta data buffer to release. Therefore, this function has been updated to make alternating passes over the ARC releasing data buffers and then newly unheld meta data buffers. This ensures forward progress is maintained and arc_meta_used will decrease. Normally this is sufficient, but if required the ARC will call the registered prune callbacks causing dentry and inodes to be dropped from the VFS cache. This will make dnode meta data buffers available for reclaim. The number of total restarts in limited by zfs_arc_meta_adjust_restarts to prevent spinning in the rare case where all meta data is pinned. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Issue #3160
* Restructure per-filesystem reclaimBrian Behlendorf2015-03-206-43/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally when the ARC prune callback was introduced the idea was to register a single callback for the ZPL. The ARC could invoke this call back if it needed the ZPL to drop dentries, inodes, or other cache objects which might be pinning buffers in the ARC. The ZPL would iterate over all ZFS super blocks and perform the reclaim. For the most part this design has worked well but due to limitations in 2.6.35 and earlier kernels there were some problems. This patch is designed to address those issues. 1) iterate_supers_type() is not provided by all kernels which makes it impossible to safely iterate over all zpl_fs_type filesystems in a single callback. The most straight forward and portable way to resolve this is to register a callback per-filesystem during mount. The arc_*_prune_callback() functions have always supported multiple callbacks so this is functionally a very small change. 2) Commit 050d22b removed the non-portable shrink_dcache_memory() and shrink_icache_memory() functions and didn't replace them with equivalent functionality. This meant that for Linux 3.1 and older kernels the ARC had no mechanism to drop dentries and inodes from the caches if needed. This patch adds that missing functionality by calling shrink_dcache_parent() to release dentries which may be pinning inodes. This will result in all unused cache entries being dropped which is a bit heavy handed but it's the only interface available for old kernels. 3) A zpl_drop_inode() callback is registered for kernels older than 2.6.35 which do not support the .evict_inode callback. This ensures that when the last reference on an inode is dropped it is immediately removed from the cache. If this isn't done than inode can end up on the global unused LRU with no mechanism available to ZFS to drop them. Since the ARC buffers are not dropped the hottest inodes can still be recreated without performing disk IO. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Issue #3160
* Fix arc_meta_max accountingBrian Behlendorf2015-03-201-3/+4
| | | | | | | | | The arc_meta_max value should be increased when space it consumed not when it is returned. This ensure's that arc_meta_max is always up to date. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Pavel Snajdr <[email protected]> Issue #3160
* Increase Linux pipe buffer size on 'zfs receive'Richard Yao2015-03-201-0/+43
| | | | | | | | | | | | | | | | | | | | | I noticed when reviewing documentation that it is possible for user space to use fctnl(fd, F_SETPIPE_SZ, (unsigned long) size) to change the kernel pipe buffer size on Linux to increase the pipe size up to the value specified in /proc/sys/fs/pipe-max-size. There are users using mbuffer to improve zfs recv performance when piping over the network, so it seems advantageous to integrate such functionality directly into the zfs recv tool. This avoids the addition of two buffers and two copies (one for the buffer mbuffer adds and another for the additional pipe), so it should be more efficient. This could have been made configurable and/or this could have changed the value back to the original after we were done with the file descriptor, but I do not see a strong case for doing either, so I went with a simple implementation. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #1161
* Move duplicate information about the 'zfs send -e' option.Turbo Fredriksson2015-03-191-18/+18
| | | | | | | | | The extra one was under the 'zfs receive' command (which isn't relevant). Instead, it should have been further up (still in the 'zfs send' option). Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3194
* Use MUTEX_FSTRANS on l2arc_buflist_mtxChunwei Chen2015-03-181-1/+1
| | | | | | | | | | | | | Use MUTEX_FSTRANS on l2arc_buflist_mtx to prevent the following deadlock scenario: 1. arc_release() -> hash_lock -> l2arc_buflist_mtx 2. l2arc_write_buffers() -> l2arc_buflist_mtx -> (direct reclaim) -> arc_buf_remove_ref() -> hash_lock Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Signed-off-by: Tim Chase <[email protected]> Issue #3160
* Fix warning about AM_INIT_AUTOMAKE argumentsHajo Möller2015-03-181-2/+3
| | | | | | | | | | | | | | | | | | | As of automake 1.14.2, currently shipped with Ubuntu 14.04, automake warns about AM_INIT_AUTOMAKE having more than one argument: configure.ac:41: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated. For more info, see: configure.ac:41: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation This commit fixes the warnings by following above link's advice, so AM_INIT gets called with the package's name and version. As both are defined in the META file we're parsing it with `grep`, `cut` and `tr`. NOTE: autoconf < 1.14 not supporting m4_esyscmd_s so m4_esyscmd was used and modified `tr` to truncate newlines, too. Signed-off-by: Hajo M<C3><B6>ller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3174
* Linux 4.0 compat: bdi_setup_and_register() __must_checkBill McGonigle2015-03-161-2/+4
| | | | | | | | | | | Explicitly disable the unused by variable warnings by setting __attribute__((unused)) for bdi_setup_and_register(). This is required because the function is defined with the __must_check attribute. Signed-off-by: Bill McGonigle <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3141
* Illumos 5630 - stale bonus buffer in recycled dnode_t leads to data corruptionJustin T. Gibbs2015-03-124-9/+62
| | | | | | | | | | | | | | | | | | 5630 stale bonus buffer in recycled dnode_t leads to data corruption Author: Justin T. Gibbs <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Will Andrews <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/5630 https://github.com/illumos/illumos-gate/commit/cd485b4 Ported-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Issue #3172