summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Illumos 5175 - implement dmu_read_uio_dbuf() to improve cached read performanceMatthew Ahrens2015-06-293-11/+78
| | | | | | | | | | | | | | | | | | | | | | | | 5175 implement dmu_read_uio_dbuf() to improve cached read performance Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/5175 https://github.com/illumos/illumos-gate/commit/f8554bb Porting notes: This patch doesn't include the changes for the COMSTAR (Common Multiprotocol SCSI Target) - since it's not available for ZoL. http://thegreyblog.blogspot.co.at/2010/02/setting-up-solaris-comstar-and.html Ported by: kernelOfTruth <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3392
* Add /dev/mapper to the list of possible sources for pool devices.Turbo Fredriksson2015-06-291-0/+9
| | | | | | | | This is especially needed when using LUKS backed pools. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3536
* Remove l2ad_evict from zfs_l2arc_evict_classTim Chase2015-06-291-5/+2
| | | | | | | | | | Illumos 5701 (zpool list reports incorrect "alloc" value for cache devices) removed l2ad_evict from l2arc_dev_t. It should also be removed from the zfs_l2arc_evict_class event class. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3534
* Add ziltest.shBrian Behlendorf2015-06-262-0/+302
| | | | | | | | | | | | | | | The ziltest.sh script is a test case designed to verify the correct functioning of the ZIL. It's being added to the scripts directory so it can be easily added to the automated regression testing. The general idea is to build up an intent log from a bunch of diverse user commands without actually committing them to the file system. Then copy the file system, replay the intent log and compare the file system and the copy. Ported-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3531
* Fix for recent zdb -h | -i crashes (seg fault)Don Brady2015-06-262-7/+20
| | | | | | | | | | Allocating SPA_MAXBLOCKSIZE on the stack is a bad idea (even with the old 128K size). Use malloc instead when allocating temporary block buffer memory. Signed-off-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3522
* zdb -d has false positive warning when feature@large_blocks=disabledDon Brady2015-06-261-11/+16
| | | | | | | | Skip large blocks feature refcount checking if feature is disabled. Signed-off-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3468
* Additional SYSV init script fixes (3).Turbo Fredriksson2015-06-252-54/+22
| | | | | | | | | | | | | | | | | | * In read_mtab(), fix problems (!?) in the mounts file. It will record 'rpool 1' as 'rpool\0401' instead of 'rpool\00401' which seems to be the correct (at least as far as 'printf' is concerned). Use this using the external 'echo' command (and not the one built in to the shell) because the internal one would interpret the backslash code (incorrectly), giving us a  instead. * Remove reregister_mounts() - no longer needed. * For Gentoo, the zfs_log_failure_msg() should use eend(), not eerror() (which requires an error message, which we don't have). Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3488 Closes #3509 Closes #3514
* Revert "Additional SYSV init script fixes."Turbo Fredriksson2015-06-251-10/+7
| | | | | | | | | | | This reverts commit 036391c980c1e6504352b770eb385806a951b1cb. Because #3509 came just after this commit was accepted and is related to the original problem the commit was supposed to fix, we need to solve the problem in another way. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5368 - ARC should cache more metadataMatthew Ahrens2015-06-251-0/+6
| | | | | | | | | | | | | | | | | | | | 5368 ARC should cache more metadata Reviewed by: Alex Reece <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5368 https://github.com/illumos/illumos-gate/commit/3a5286a Porting Notes: The vast majority of this patch was already merged in the context of the 06358ea changes. This is just a small hunk which was missed. Ported-by: Brian Behlendorf <[email protected]>
* Illumos 5163 - arc should reap range_seg_cacheGeorge Wilson2015-06-254-2/+9
| | | | | | | | | | | | | | | | | | | | 5163 arc should reap range_seg_cache Reviewed by: Christopher Siden <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5163 https://github.com/illumos/illumos-gate/commit/83803b5 Porting Notes: Added umem_cache_reap_now() wrapped to suppress unused variable warning for user space build in arc_kmem_reap_now(). Ported-by: Brian Behlendorf <[email protected]>
* Update all default taskq settingsBrian Behlendorf2015-06-2510-37/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Over the years the default values for the taskqs used on Linux have differed slightly from illumos. In the vast majority of cases this was done to avoid creating an obnoxious number of idle threads which would pollute the process listing. With the addition of support for dynamic taskqs all multi-threaded queues should be created as dynamic taskqs. This allows us to get the best of both worlds. * The illumos default values for the I/O pipeline can be restored. These values are known to work well for most workloads. The only exception is the zio write interrupt taskq which is changed to ZTI_P(12, 8). At least under Linux more threads has been shown to improve performance, see commit 7e55f4e. * Reduces the number of idle threads on the system when it's not under heavy load. The maximum number of threads will only be created when they are required. * Remove the vdev_file_taskq and rely on the system_taskq instead which is now dynamic and may have up to 64-threads. Again this brings us back inline with upstream. * Tasks dispatched with taskq_dispatch_ent() are allowed to use dynamic taskqs. The Linux taskq implementation supports this. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #3507
* Account for ashift when gathering buffers to be written to l2arc deviceAndriy Gapon2015-06-251-13/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we don't account for that, then we might end up overwriting disk area of buffers that have not been evicted yet, because l2arc_evict operates in terms of disk addresses. The discrepancy between the write size calculation and the actual increment to l2ad_hand was introduced in commit 3a17a7a9. The change that introduced l2ad_hand alignment was almost correct as the write size was accumulated as a sum of rounded buffer sizes. See commit illumos/illumos-gate@e14bb32. Also, we now consistently use asize / a_sz for the allocated size and psize / p_sz for the physical size. The latter accounts for a possible size reduction because of the compression, whereas the former accounts for a possible subsequent size expansion because of the alignment requirements. The code still assumes that either underlying storage subsystems or hardware is able to do read-modify-write when an L2ARC buffer size is not a multiple of a disk's block size. This is true for 4KB sector disks that provide 512B sector emulation, but may not be true in general. In other words, we currently do not have any code to make sure that an L2ARC buffer, whether compressed or not, which is used for physical I/O has a suitable size. Note that currently the cache device utilization is calculated based on the physical size, not the allocated size. The same applies to l2_asize kstat. That is wrong, but this commit does not fix that. The accounting problem was introduced partially in commit 3a17a7a9 and partially in 3038a2b (accounting became consistent but in favour of the wrong size). Porting Notes: Reworked to be C90 compatible and the 'write_psize' variable was removed because it is now unused. References: https://reviews.csiden.org/r/229/ https://reviews.freebsd.org/D2764 Ported-by: kernelOfTruth <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3400 Closes #3433 Closes #3451
* Illumos 5701 - zpool list reports incorrect "alloc" value for cache devicesPrakash Surya2015-06-252-43/+140
| | | | | | | | | | | | | | | | | | | | 5701 zpool list reports incorrect "alloc" value for cache devices Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Alek Pinchuk <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5701 https://github.com/illumos/illumos-gate/commit/a52fc31 Porting Notes: arc_space_return(HDR_L2ONLY_SIZE, ARC_SPACE_L2HDRS); correctly placed at arc_hdr_l2hdr_destroy(arc_buf_hdr_t *hdr). Ported by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Add IMPLY() and EQUIV() macrosBrian Behlendorf2015-06-241-1/+17
| | | | | | | | | Added for upstream compatibility, they are of the form: * IMPLY(a, b) - if (a) then (b) * EQUIV(a, b) - if (a) then (b) *AND* if (b) then (a) Signed-off-by: Brian Behlendorf <[email protected]>
* zfsdev_getminor() should check for invalid file handlesRichard Yao2015-06-224-9/+39
| | | | | | | | | | | | | | | | | | | | | | Unit testing at ClusterHQ found that passing an invalid file handle to zfs_ioc_hold results in a NULL pointer dereference on a system without assertions: IP: [<ffffffffa0218aa0>] zfsdev_getminor+0x10/0x20 [zfs] Call Trace: [<ffffffffa021b4b0>] zfs_onexit_fd_hold+0x20/0x40 [zfs] [<ffffffffa0214043>] zfs_ioc_hold+0x93/0xd0 [zfs] [<ffffffffa0215890>] zfsdev_ioctl+0x200/0x500 [zfs] An assertion would have caught this had they been enabled, but this is something that the kernel module should handle without failing. We resolve this by searching the linked list to ensure that the file handle's private_data points to a valid zfsdev_state_t. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3506
* Make metaslab_aliquot a module parameter.Etienne Dechamps2015-06-222-1/+18
| | | | | | | | | | This seems generally useful. metaslab_aliquot is the ZFS allocation granularity, which is roughly equivalent to what is called the stripe size in traditional RAID arrays. It seems relevant to performance tuning. Signed-off-by: Etienne Dechamps <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Document metaslab_aliquot.Etienne Dechamps2015-06-221-0/+7
| | | | | Signed-off-by: Etienne Dechamps <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Allocate disk space fairly in the presence of vdevs of unequal size.Etienne Dechamps2015-06-221-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The metaslab allocator device selection algorithm contains a bias mechanism whose goal is to achieve roughly equal disk space usage across all top-level vdevs. It seems that the initial rationale for this code was to allow newly added (empty) vdevs to "come up to speed" faster in an attempt to make the pool quickly converge to a steady state where all vdevs are equally utilized. While the code seems to work reasonably well for this use case, there is another scenario in which this algorithm fails miserably: the case where top-level vdevs don't have the same sizes (capacities). ZFS allows this, and it is a good feature to have, so that users who simply want to build a pool with the disks they happen to have lying around can do so even if the disks have heteregenous sizes. Here's a script that simulates a pool with two vdevs, with one 4X larger than the other: dd if=/dev/zero of=/tmp/d1 bs=1 count=1 seek=134217728 dd if=/dev/zero of=/tmp/d2 bs=1 count=1 seek=536870912 zpool create testspace /tmp/d1 /tmp/d2 dd if=/dev/zero of=/testspace/foobar bs=1M count=256 zpool iostat -v testspace Before this commit, the script would output the following: capacity pool alloc free ---------- ----- ----- testspace 252M 375M /tmp/d1 104M 18.5M /tmp/d2 148M 356M ---------- ----- ----- This demonstrates that the current code handles this situation very poorly: d1 shows 85% usage despite the pool itself being only 40% full. d1 is quite saturated at this point, and is slowing down the entire pool due to saturation, fragmentation and the like. In contrast, here's the result with the code in this commit: capacity pool alloc free ---------- ----- ----- testspace 252M 375M /tmp/d1 56.7M 66.3M /tmp/d2 195M 309M ---------- ----- ------ This looks much better. d1 is 46% used, which is close to the overall pool utilization (40%). The code still doesn't result in perfectly balanced allocation, probably because of the way mg_bias is applied which does not guarantee perfect accuracy, but this is still much better than before. Signed-off-by: Etienne Dechamps <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3389
* Add zfs_sb_prune_aliases() functionBrian Behlendorf2015-06-223-11/+84
| | | | | | | | | | | | | | | | | | | | | | | For kernels which do not implement a per-suberblock shrinker, those older than Linux 3.1, the shrink_dcache_parent() function was used to attempt to reclaim dentries. This was found not be entirely reliable and could lead to performance issues on older kernels running meta-data heavy workloads. To address this issue a zfs_sb_prune_aliases() function has been added to implement this functionality. It relies on traversing the list of znodes for a filesystem and adding them to a private list with a reference held. The private list can then be safely walked outside the z_znodes_lock to prune dentires and drop the last reference so the inode can be freed. This provides the same synchronous behavior as the per-filesystem shrinker and has the advantage of depending on only long standing interfaces. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #3501
* Increase the number of iput taskq threadsBrian Behlendorf2015-06-221-2/+2
| | | | | | | | | | | | | The number of threads in the iput taskq has been increased to speed up the number of iputs which can be handled. This has been observed to improve the meta data reclaim regardless of zfs_sb_prune() implementation in use. The taskq has also been renamed z_iput to for consistency with the rest of the I/O pipeline taskqs which are all named z_*. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]>
* Linux 4.1 compat: use read_iter() / write_iter()Matus Kral2015-06-183-8/+70
| | | | | | | | | | | | | Linux 3.15 commit torvalds/linux@293bc98 introduced two new methods. The ->read_iter() and ->write_iter() methods were designed to replace the ->aio_read() and ->aio_write() interfaces. Both interfaces were preserved for several kernel releases in order to migrate all existing consumers to the new interfaces. But as of Linux 4.1 the legacy interface has been retired and the ZFS code must be updated to use the new interfaces. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3352
* Update dracut READMESören Tempel2015-06-181-5/+19
| | | | | | | | | Include information about zfs-lib.sh.in and mention that it is possible to set the bootfs attribute for an entire pool. Signed-off-by: Sören Tempel <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3109
* Only source '/lib/dracut-lib.sh' if it wasn't so farSören Tempel2015-06-172-2/+1
| | | | | | Signed-off-by: Sören Tempel <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3109
* Force export if it failed the first timeSören Tempel2015-06-171-1/+1
| | | | | | Signed-off-by: Sören Tempel <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3109
* Refactor dracut moduleSören Tempel2015-06-178-100/+150
| | | | | | | | Provide '/lib/dracut-zfs-lib.sh' with utility functions. Signed-off-by: Sören Tempel <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3109
* Additional SYSV init script fixes.Turbo Fredriksson2015-06-171-7/+10
| | | | | | | | | | | | | | | | | | | | Use the 'mount' command instead of /proc/mounts to get a list of matching filesystems. This because /proc/mounts reports a pool with a space 'rpool 1' as 'rpool\0401'. The space is encoded as 3-digit octal which is legal. However 'printf "%b"', which we use to filter out other illegal characters (such as slash, space etc) can't properly interpret this because it expects 4-digit octal. We get a  instead of the space we expected. The correct value should have been 'rpool\00401' (note the additional leading zero). So use 'mount', which interprets all backslash-escapes correctly, instead. Signed-off-by: Turbo Fredriksson [email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #3488
* Unify mount and share for 'zfs create/clone'Brian Behlendorf2015-06-171-52/+47
| | | | | | | | | Both the 'zfs create' and 'zfs clone' commands are expected to automatically mount and share new filesystems. Since this is common functionality it has been moved in to a shared helper function. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3459
* 3.12 compat, NUMA-aware per-superblock shrinkerTim Chase2015-06-173-2/+34
| | | | | | | | | | Kernels >= 3.12 have a NUMA-aware superblock shrinker which is used in ZoL by zfs_sb_prune(). This patch calls the shrinker for each on-line NUMA node in order that memory be freed for each one. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3495
* Add -y option to `zpool iostat`Hajo Möller2015-06-172-10/+36
| | | | | | | | | | sysstat's iostat omits the first report when the -y option is used. This patch adds that functionality and omits the first report with statistics since system boot. Signed-off-by: Hajo Möller <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3439
* Wait interruptibly in prefetch threadBrian Behlendorf2015-06-161-3/+3
| | | | | | | | | | | | | | The Linux kernel watchdog will automatically dump a backtrace for any process while sleeps for over 120s in an uninterruptible state. The solution is for the prefetch thread to sleep in an interruptible state. The way the existing code was written this is safe because when woken it will always reevaluate its conditional. As a general rule it is preferable to sleep in an interruptible when possible. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3450 Closes #3402
* Rename cv_wait_interruptible() to cv_wait_sig()Brian Behlendorf2015-06-114-8/+8
| | | | | | | | | | | This is the counterpart to zfsonlinux/spl@2345368 which replaces the cv_wait_interruptible() function with cv_wait_sig(). There is no functional change to patch merely brings the function names in to sync to maximize portability. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3450 Issue #3402
* Merge branch 'lock-contention-on-arcs_mtx-final'Brian Behlendorf2015-06-1133-1477/+3011
|\ | | | | | | | | | | | | Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf Closes #3115 Closes #3481
| * Increase arc_c_min to allow safe operation of arc_adapt()Tim Chase2015-06-111-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | ZoL had lowered the minimum ARC size to 4MiB to better accommodate tiny systems such as the raspberry pi, however, as of addition of large block support, the arc_adapt() function depends on arc_c being >= 32MiB (2 * SPA_MAXBLOCKSIZE). This patch raises the minimum ARC size to 32MiB and adds a VERIFY test to arc_adapt() for future-proofing. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Make arc_prune() asynchronousBrian Behlendorf2015-06-112-39/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As described in the comment above arc_adapt_thread() it is critical that the arc_adapt_thread() function never sleep while holding a hash lock. This behavior was possible in the Linux implementation because the arc_prune() logic was implemented to be synchronous. Under illumos the analogous dnlc_reduce_cache() function is asynchronous. To address this the arc_do_user_prune() function is has been reworked in to two new functions as follows: * arc_prune_async() is an asynchronous implementation which dispatches the prune callback to be run by the system taskq. This makes it suitable to use in the context of the arc_adapt_thread(). * arc_prune() is a synchronous implementation which depends on the arc_prune_async() implementation but blocks until the outstanding callbacks complete. This is used in arc_kmem_reap_now() where it is safe, and expected, that memory will be freed. This patch additionally adds the zfs_arc_meta_strategy module option while allows the meta reclaim strategy to be configured. It defaults to a balanced strategy which has been proved to work well under Linux but the illumos meta-only strategy can be enabled. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Use taskq_wait_outstanding() functionBrian Behlendorf2015-06-113-6/+6
| | | | | | | | | | | | | | | | | | | | | | Replace taskq_wait() with taskq_wait_oustanding(). This way callers will only block until previously submitted tasks have been completed. This was the previous behavior of task_wait() prior to the introduction of taskq_wait_outstanding() so this isn't really a functionalty change for these callers. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Add taskq_wait_outstanding() functionBrian Behlendorf2015-06-112-0/+7
| | | | | | | | | | | | | | | | | | SPL commit behlendorf/spl@9cef1b5 adds the taskq_wait_outstanding() interface. See the commit log for the full justification for this addition. This patch adds the required user space counterpart. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]>
| * Illumos 5497 - lock contention on arcs_mtxPrakash Surya2015-06-1117-680/+1862
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> Porting notes and other significant code changes: The illumos 5368 patch (ARC should cache more metadata), which was never picked up by ZoL, is mostly reverted by this patch. Since ZoL relies on the kernel asynchronously calling the shrinker to actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every time it runs. The arc_adapt_thread() function no longer calls arc_do_user_evicts() since the newly-added arc_user_evicts_thread() calls it periodically. Notable conflicting ZoL commits which conflicted with this patch or whose effects are either duplicated or un-done by this patch: 302f753 - Integrate ARC more tightly with Linux 39e055c - Adjust arc_p based on "bytes" in arc_shrink f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c" 77765b5 - Remove "arc_meta_used" from arc_adjust calculation 94520ca - Prune metadata from ghost lists in arc_adjust_meta Trace support for multilist_insert() and multilist_remove() has been added and produces the following output: fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 } fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 } The following arcstats have been removed: recycle_miss - Used by arcstat.py and arc_summary.py, both of which have been updated appropriately. l2_writes_hdr_miss The following arcstats have been added: evict_not_enough - Number of times arc_evict_state() was unable to evict enough buffers to reach its target amount. evict_l2_skip - Number of times arc_evict_hdr() skipped eviction because it was being written to the l2arc. l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times l2arc_write_done() failed to acquire hash_lock (and re-tries). arc_meta_min - Shows the value of the zfs_arc_meta_min module parameter (see below). The "index" column of the "dbuf" kstat has been removed since it doesn't have a direct analog in the new multilist scheme. Additional multilist- related stats could be added in the future but would likely require extensions to the mulilist API. The following module parameters have been added: zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list before moving on to the next sub-list. zfs_arc_meta_min - Enforce a floor on the amount of metadata in the ARC. zfs_arc_num_sublists_per_state - Number of multilist sub-lists per ARC state. zfs_arc_overflow_shift - Controls amount by which the ARC must exceed the target size to be considered "overflowing". Ported-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]
| * Illumos 5408 - managing ZFS cache devices requires lots of RAMChris Williamson2015-06-115-632/+948
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5408 managing ZFS cache devices requires lots of RAM Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Don Brady <[email protected]> Reviewed by: Josef 'Jeff' Sipek <[email protected]> Approved by: Garrett D'Amore <[email protected]> Porting notes: Due to the restructuring of the ARC-related structures, this patch conflicts with at least the following existing ZoL commits: 6e1d7276c94cbd7c2e19f9232f6ba4bafa62dbe0 Fix inaccurate arcstat_l2_hdr_size calculations The ARC_SPACE_HDRS constant no longer exists and has been somewhat equivalently replaced by HDR_L2ONLY_SIZE. e0b0ca983d6897bcddf05af2c0e5d01ff66f90db Add visibility in to cached dbufs The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr struct requires additional structure member names to be used when referencing the inner items. Also, the presence of L1 or L2 inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR macros. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Illumos 5369 - arc flags should be an enumGeorge Wilson2015-06-1113-389/+398
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Richard Lowe <[email protected]> Porting notes: ZoL has moved some ARC definitions into arc_impl.h. Signed-off-by: Brian Behlendorf <[email protected]> Ported by: Tim Chase <[email protected]>
| * Partially revert "Add ddt, ddt_entry, and l2arc_hdr caches"Tim Chase2015-06-111-10/+6
| | | | | | | | | | | | | | | | | | | | | | This reverts only the l2arc_hdr part of commit ecf3d9b8e63e5659269e15db527380c65780f71a in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which does the same thing but uses the newer two-level ARC structure following the Illumos 5408 "managing ZFS cache devices requires lots of RAM" patch. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Revert "Allow arc_evict_ghost() to only evict meta data"Tim Chase2015-06-111-15/+11
| | | | | | | | | | | | | | | | Illumos 5497 "lock contention on arcs_mtx" reworks eviction and obviates the need for this. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Revert "fix l2arc compression buffers leak"Tim Chase2015-06-111-55/+10
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 037763e44e0f6d7284e9328db988a89fdc975a4e in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which includes a fix for this very problem. ZoL had picked up a subset of the illumos 5497 patch to deal with the l2arc compression buffer leak. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
| * Revert "arc_evict, arc_evict_ghost: reduce stack usage using kmem_zalloc"Tim Chase2015-06-111-20/+12
|/ | | | | | | | | This reverts commit 16fcdea36340c658b4557fd34a74915fd618f7a6 in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which eliminates "marker" within the ARC code. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Remove unused variable in vdev_add_child()Brian Behlendorf2015-06-111-2/+1
| | | | | | | | | | | | Commit c3520e7 restructured vdev_add_child() in such a way that the spa variable was unused during non-debug builds. This is consistent with the upstream illumos code but because ZoL, unlike illumos, is built with all compiler warnings enabled this causes a legitimate warning. Revert this hunk of the patch to keep the build clean. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3432
* Illumos 5818 - zfs {ref}compressratio is incorrect with 4k sector sizeMatthew Ahrens2015-06-107-33/+45
| | | | | | | | | | | | | | | | | 5818 zfs {ref}compressratio is incorrect with 4k sector size Reviewed by: Alex Reece <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Steven Hartland <[email protected]> Approved by: Albert Lee <[email protected]> References: https://www.illumos.org/issues/5818 https://github.com/illumos/illumos-gate/commit/81cd5c5 Ported-by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3432
* Illumos 5269 - zpool import slowArne Jansen2015-06-0910-60/+240
| | | | | | | | | | | | | | | | 5269 zpool import slow Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5269 https://github.com/illumos/illumos-gate/commit/12380e1e Ported-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3396
* Improve on the ZFS events documentationTurbo Fredriksson2015-06-093-12/+212
| | | | | | | | | | | * Add information about the 'zpool events' command in zpool(8). * More events and payloads defined in zfs-events(5). * I/O Stages and I/O Flags sections added. * Remove unused legacy "zio_deadline" payload define. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3467
* dmu_objset_userquota_get_ids uses dn_bonus unsafelyNed Bass2015-06-051-3/+17
| | | | | | | | | | | | | | The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that when we are using the bonus dbuf we are either holding a reference on it or have taken dn_struct_rwlock. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3443
* dbuf_try_add_ref minor bug fixesNed Bass2015-06-051-2/+2
| | | | | | | | | | | | | - Don't check db->bb_blkid, but use the blkid argument instead. Checking db->db_blkid may be unsafe since we doesn't yet have a hold on the dbuf so its validity is unknown. - Call mutex_exit() on found_db, not db, since it's not certain that they point to the same dbuf, and the mutex was taken on found_db. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3443
* SYSV init script fixes.Turbo Fredriksson2015-06-054-17/+35
| | | | | | | | | | | | | | | | | | | | | * Change the order of the function library check/load. Redhat based system _can_ have a /lib/lsb/init-functions file (from the redhat-lsb-core package), but it's only partially what we can use. Instead, look for that file last, giving the script a chance to catch the 'real' distribution file. * Filter out dashes and dots in dataset name in read_mtab(). * Get rid of 'awk' entirely. This is usually in /usr, which might not be availible. * Get rid of the 'find /dev/disk/by-*' (find is on /usr, which might not be availible). Instead use echo in a for loop. * Rebuild scripts if any of the *.in files changed. * Move the sed part that filters out duplicates inside the check fo valid variable. Signed-off-by: Turbo Fredriksson [email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #3463 Closes #3457