| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4745 fix AVL code misspellings
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Richard Lowe <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
References:
https://github.com/illumos/illumos-gate/commit/6907ca4
https://www.illumos.org/issues/4745
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3565
|
|
|
|
|
|
|
|
|
| |
Reclaim during metaslab preloading can cause deadlocks involving znode
z_lock and ARC buffer header ht_lock.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3532.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5008 lock contention (rrw_exit) while running a read only load
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Richard Yao <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
Porting notes:
This patch ported perfectly cleanly to ZoL. During testing 100% cached
small-block reads, extreme contention was noticed on rrl->rr_lock from
rrw_exit() due to the frequent entering and leaving ZPL. Illumos picked
up this patch from FreeBSD and it also helps under Linux.
On a 1-minute 4K cached read test with 10 fio processes pinned to a single
socket on a 4-socket (10 thread per socket) NUMA system, contentions on
rrl->rr_lock were reduced from 508799 to 43085.
Ported-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3555
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5911 ZFS "hangs" while deleting file
Reviewed by: Bayard Bell <[email protected]>
Reviewed by: Alek Pinchuk <[email protected]>
Reviewed by: Simon Klinkert <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Richard Lowe <[email protected]>
References:
https://www.illumos.org/issues/5911
https://github.com/illumos/illumos-gate/commit/46e1baa
Porting notes:
Resolved ISO C90 forbids mixed declarations and code wanting in
the dnode_free_range() function.
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3554
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5981 Deadlock in dmu_objset_find_dp
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
References:
https://www.illumos.org/issues/5981
https://github.com/illumos/illumos-gate/commit/1d3f896
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3553
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots
5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Gordon Ross <[email protected]>
References:
https://www.illumos.org/issues/5946
https://www.illumos.org/issues/5945
https://github.com/illumos/illumos-gate/commit/24218be
Ported-by: Andriy Gapon <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3552
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in drc_force branch
5870 dmu_recv_end_check() leaks origin_head hold if error happens in drc_force branch
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Andrew Stormont <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5870
https://github.com/illumos/illumos-gate/commit/beddaa9
Ported-by: Andriy Gapon <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3551
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
promotion
5909 ensure that shared snap names don't become too long after promotion
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5909
https://github.com/illumos/illumos-gate/commit/cb5842f
Ported-by: Andriy Gapon <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3550
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
has a snapshot
5912 full stream can not be force-received into a dataset if it has a snapshot
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5912
https://github.com/illumos/illumos-gate/commit/5bae108
Ported-by: Andriy Gapon <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3549
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
6033 arc_adjust() should search MFU lists for oldest
buffer when adjusting MFU size
Reviewed by: Saso Kiselkov <[email protected]>
Reviewed by: Xin Li <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Approved by: Matthew Ahrens <[email protected]>
References:
https://www.illumos.org/issues/6033
https://github.com/illumos/illumos-gate/commit/31c46cf
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5175 implement dmu_read_uio_dbuf() to improve cached read performance
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
References:
https://www.illumos.org/issues/5175
https://github.com/illumos/illumos-gate/commit/f8554bb
Porting notes:
This patch doesn't include the changes for the COMSTAR (Common
Multiprotocol SCSI Target) - since it's not available for ZoL.
http://thegreyblog.blogspot.co.at/2010/02/setting-up-solaris-comstar-and.html
Ported by: kernelOfTruth <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3392
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5368 ARC should cache more metadata
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5368
https://github.com/illumos/illumos-gate/commit/3a5286a
Porting Notes:
The vast majority of this patch was already merged in the context
of the 06358ea changes. This is just a small hunk which was missed.
Ported-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5163 arc should reap range_seg_cache
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5163
https://github.com/illumos/illumos-gate/commit/83803b5
Porting Notes:
Added umem_cache_reap_now() wrapped to suppress unused variable
warning for user space build in arc_kmem_reap_now().
Ported-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Over the years the default values for the taskqs used on Linux have
differed slightly from illumos. In the vast majority of cases this
was done to avoid creating an obnoxious number of idle threads which
would pollute the process listing.
With the addition of support for dynamic taskqs all multi-threaded
queues should be created as dynamic taskqs. This allows us to get
the best of both worlds.
* The illumos default values for the I/O pipeline can be restored.
These values are known to work well for most workloads. The only
exception is the zio write interrupt taskq which is changed to
ZTI_P(12, 8). At least under Linux more threads has been shown
to improve performance, see commit 7e55f4e.
* Reduces the number of idle threads on the system when it's not
under heavy load. The maximum number of threads will only be
created when they are required.
* Remove the vdev_file_taskq and rely on the system_taskq instead
which is now dynamic and may have up to 64-threads. Again this
brings us back inline with upstream.
* Tasks dispatched with taskq_dispatch_ent() are allowed to use
dynamic taskqs. The Linux taskq implementation supports this.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #3507
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.
The discrepancy between the write size calculation and the actual
increment to l2ad_hand was introduced in commit 3a17a7a9.
The change that introduced l2ad_hand alignment was almost correct
as the write size was accumulated as a sum of rounded buffer sizes.
See commit illumos/illumos-gate@e14bb32.
Also, we now consistently use asize / a_sz for the allocated size and
psize / p_sz for the physical size. The latter accounts for a
possible size reduction because of the compression, whereas the
former accounts for a possible subsequent size expansion because of
the alignment requirements.
The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size. This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical
I/O has a suitable size.
Note that currently the cache device utilization is calculated based
on the physical size, not the allocated size. The same applies to
l2_asize kstat. That is wrong, but this commit does not fix that.
The accounting problem was introduced partially in commit 3a17a7a9
and partially in 3038a2b (accounting became consistent but in favour
of the wrong size).
Porting Notes:
Reworked to be C90 compatible and the 'write_psize' variable was
removed because it is now unused.
References:
https://reviews.csiden.org/r/229/
https://reviews.freebsd.org/D2764
Ported-by: kernelOfTruth <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3400
Closes #3433
Closes #3451
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5701 zpool list reports incorrect "alloc" value for cache devices
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Alek Pinchuk <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5701
https://github.com/illumos/illumos-gate/commit/a52fc31
Porting Notes:
arc_space_return(HDR_L2ONLY_SIZE, ARC_SPACE_L2HDRS);
correctly placed at arc_hdr_l2hdr_destroy(arc_buf_hdr_t *hdr).
Ported by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unit testing at ClusterHQ found that passing an invalid file handle to
zfs_ioc_hold results in a NULL pointer dereference on a system without
assertions:
IP: [<ffffffffa0218aa0>] zfsdev_getminor+0x10/0x20 [zfs]
Call Trace:
[<ffffffffa021b4b0>] zfs_onexit_fd_hold+0x20/0x40 [zfs]
[<ffffffffa0214043>] zfs_ioc_hold+0x93/0xd0 [zfs]
[<ffffffffa0215890>] zfsdev_ioctl+0x200/0x500 [zfs]
An assertion would have caught this had they been enabled, but this is
something that the kernel module should handle without failing. We
resolve this by searching the linked list to ensure that the file
handle's private_data points to a valid zfsdev_state_t.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Andriy Gapon <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3506
|
|
|
|
|
|
|
|
|
|
| |
This seems generally useful. metaslab_aliquot is the ZFS allocation
granularity, which is roughly equivalent to what is called the stripe
size in traditional RAID arrays. It seems relevant to performance
tuning.
Signed-off-by: Etienne Dechamps <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Etienne Dechamps <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The metaslab allocator device selection algorithm contains a bias
mechanism whose goal is to achieve roughly equal disk space usage across
all top-level vdevs.
It seems that the initial rationale for this code was to allow newly
added (empty) vdevs to "come up to speed" faster in an attempt to make
the pool quickly converge to a steady state where all vdevs are equally
utilized.
While the code seems to work reasonably well for this use case, there
is another scenario in which this algorithm fails miserably: the case
where top-level vdevs don't have the same sizes (capacities). ZFS
allows this, and it is a good feature to have, so that users who simply
want to build a pool with the disks they happen to have lying around can
do so even if the disks have heteregenous sizes.
Here's a script that simulates a pool with two vdevs, with one 4X larger
than the other:
dd if=/dev/zero of=/tmp/d1 bs=1 count=1 seek=134217728
dd if=/dev/zero of=/tmp/d2 bs=1 count=1 seek=536870912
zpool create testspace /tmp/d1 /tmp/d2
dd if=/dev/zero of=/testspace/foobar bs=1M count=256
zpool iostat -v testspace
Before this commit, the script would output the following:
capacity
pool alloc free
---------- ----- -----
testspace 252M 375M
/tmp/d1 104M 18.5M
/tmp/d2 148M 356M
---------- ----- -----
This demonstrates that the current code handles this situation very
poorly: d1 shows 85% usage despite the pool itself being only 40% full.
d1 is quite saturated at this point, and is slowing down the entire pool
due to saturation, fragmentation and the like.
In contrast, here's the result with the code in this commit:
capacity
pool alloc free
---------- ----- -----
testspace 252M 375M
/tmp/d1 56.7M 66.3M
/tmp/d2 195M 309M
---------- ----- ------
This looks much better. d1 is 46% used, which is close to the overall
pool utilization (40%). The code still doesn't result in perfectly
balanced allocation, probably because of the way mg_bias is applied
which does not guarantee perfect accuracy, but this is still much better
than before.
Signed-off-by: Etienne Dechamps <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3389
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For kernels which do not implement a per-suberblock shrinker,
those older than Linux 3.1, the shrink_dcache_parent() function
was used to attempt to reclaim dentries. This was found not be
entirely reliable and could lead to performance issues on older
kernels running meta-data heavy workloads.
To address this issue a zfs_sb_prune_aliases() function has been
added to implement this functionality. It relies on traversing
the list of znodes for a filesystem and adding them to a private
list with a reference held. The private list can then be safely
walked outside the z_znodes_lock to prune dentires and drop the
last reference so the inode can be freed.
This provides the same synchronous behavior as the per-filesystem
shrinker and has the advantage of depending on only long standing
interfaces.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #3501
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The number of threads in the iput taskq has been increased to speed
up the number of iputs which can be handled. This has been observed
to improve the meta data reclaim regardless of zfs_sb_prune()
implementation in use.
The taskq has also been renamed z_iput to for consistency with the
rest of the I/O pipeline taskqs which are all named z_*.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 3.15 commit torvalds/linux@293bc98 introduced two new methods.
The ->read_iter() and ->write_iter() methods were designed to replace
the ->aio_read() and ->aio_write() interfaces. Both interfaces were
preserved for several kernel releases in order to migrate all existing
consumers to the new interfaces. But as of Linux 4.1 the legacy
interface has been retired and the ZFS code must be updated to use
the new interfaces.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3352
|
|
|
|
|
|
|
|
|
|
| |
Kernels >= 3.12 have a NUMA-aware superblock shrinker which is used in
ZoL by zfs_sb_prune(). This patch calls the shrinker for each on-line
NUMA node in order that memory be freed for each one.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3495
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Linux kernel watchdog will automatically dump a backtrace for
any process while sleeps for over 120s in an uninterruptible state.
The solution is for the prefetch thread to sleep in an interruptible
state. The way the existing code was written this is safe because
when woken it will always reevaluate its conditional. As a general
rule it is preferable to sleep in an interruptible when possible.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3450
Closes #3402
|
|
|
|
|
|
|
|
|
|
|
| |
This is the counterpart to zfsonlinux/spl@2345368 which replaces the
cv_wait_interruptible() function with cv_wait_sig(). There is no
functional change to patch merely brings the function names in to
sync to maximize portability.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3450
Issue #3402
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZoL had lowered the minimum ARC size to 4MiB to better accommodate tiny
systems such as the raspberry pi, however, as of addition of large block
support, the arc_adapt() function depends on arc_c being >= 32MiB (2 *
SPA_MAXBLOCKSIZE).
This patch raises the minimum ARC size to 32MiB and adds a VERIFY test
to arc_adapt() for future-proofing.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in the comment above arc_adapt_thread() it is critical
that the arc_adapt_thread() function never sleep while holding a hash
lock. This behavior was possible in the Linux implementation because
the arc_prune() logic was implemented to be synchronous. Under
illumos the analogous dnlc_reduce_cache() function is asynchronous.
To address this the arc_do_user_prune() function is has been reworked
in to two new functions as follows:
* arc_prune_async() is an asynchronous implementation which dispatches
the prune callback to be run by the system taskq. This makes it
suitable to use in the context of the arc_adapt_thread().
* arc_prune() is a synchronous implementation which depends on the
arc_prune_async() implementation but blocks until the outstanding
callbacks complete. This is used in arc_kmem_reap_now() where it
is safe, and expected, that memory will be freed.
This patch additionally adds the zfs_arc_meta_strategy module option
while allows the meta reclaim strategy to be configured. It defaults
to a balanced strategy which has been proved to work well under Linux
but the illumos meta-only strategy can be enabled.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Replace taskq_wait() with taskq_wait_oustanding(). This way callers
will only block until previously submitted tasks have been completed.
This was the previous behavior of task_wait() prior to the introduction
of taskq_wait_outstanding() so this isn't really a functionalty change
for these callers.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>
Porting notes and other significant code changes:
The illumos 5368 patch (ARC should cache more metadata), which
was never picked up by ZoL, is mostly reverted by this patch.
Since ZoL relies on the kernel asynchronously calling the shrinker to
actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every
time it runs.
The arc_adapt_thread() function no longer calls arc_do_user_evicts()
since the newly-added arc_user_evicts_thread() calls it periodically.
Notable conflicting ZoL commits which conflicted with this patch or
whose effects are either duplicated or un-done by this patch:
302f753 - Integrate ARC more tightly with Linux
39e055c - Adjust arc_p based on "bytes" in arc_shrink
f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c"
77765b5 - Remove "arc_meta_used" from arc_adjust calculation
94520ca - Prune metadata from ghost lists in arc_adjust_meta
Trace support for multilist_insert() and multilist_remove() has been
added and produces the following output:
fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 }
fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 }
The following arcstats have been removed:
recycle_miss - Used by arcstat.py and arc_summary.py, both of which
have been updated appropriately.
l2_writes_hdr_miss
The following arcstats have been added:
evict_not_enough - Number of times arc_evict_state() was unable to
evict enough buffers to reach its target amount.
evict_l2_skip - Number of times arc_evict_hdr() skipped eviction
because it was being written to the l2arc.
l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times
l2arc_write_done() failed to acquire hash_lock (and re-tries).
arc_meta_min - Shows the value of the zfs_arc_meta_min module
parameter (see below).
The "index" column of the "dbuf" kstat has been removed since it doesn't
have a direct analog in the new multilist scheme. Additional multilist-
related stats could be added in the future but would likely require
extensions to the mulilist API.
The following module parameters have been added:
zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list
before moving on to the next sub-list.
zfs_arc_meta_min - Enforce a floor on the amount of metadata in
the ARC.
zfs_arc_num_sublists_per_state - Number of multilist sub-lists per
ARC state.
zfs_arc_overflow_shift - Controls amount by which the ARC must exceed
the target size to be considered "overflowing".
Ported-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Don Brady <[email protected]>
Reviewed by: Josef 'Jeff' Sipek <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
Porting notes:
Due to the restructuring of the ARC-related structures, this
patch conflicts with at least the following existing ZoL commits:
6e1d7276c94cbd7c2e19f9232f6ba4bafa62dbe0
Fix inaccurate arcstat_l2_hdr_size calculations
The ARC_SPACE_HDRS constant no longer exists and has been
somewhat equivalently replaced by HDR_L2ONLY_SIZE.
e0b0ca983d6897bcddf05af2c0e5d01ff66f90db
Add visibility in to cached dbufs
The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr
struct requires additional structure member names to be used
when referencing the inner items. Also, the presence of L1 or L2
inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR
macros.
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Richard Lowe <[email protected]>
Porting notes:
ZoL has moved some ARC definitions into arc_impl.h.
Signed-off-by: Brian Behlendorf <[email protected]>
Ported by: Tim Chase <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts only the l2arc_hdr part of commit
ecf3d9b8e63e5659269e15db527380c65780f71a in preparation for the illumos
5497 "lock contention on arcs_mtx" patch which does the same thing
but uses the newer two-level ARC structure following the Illumos 5408
"managing ZFS cache devices requires lots of RAM" patch.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
| |
Illumos 5497 "lock contention on arcs_mtx" reworks eviction and obviates
the need for this.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 037763e44e0f6d7284e9328db988a89fdc975a4e in
preparation for the illumos 5497 "lock contention on arcs_mtx" patch
which includes a fix for this very problem.
ZoL had picked up a subset of the illumos 5497 patch to deal with the
l2arc compression buffer leak.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit 16fcdea36340c658b4557fd34a74915fd618f7a6 in preparation
for the illumos 5497 "lock contention on arcs_mtx" patch which eliminates
"marker" within the ARC code.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit c3520e7 restructured vdev_add_child() in such a way that
the spa variable was unused during non-debug builds. This is
consistent with the upstream illumos code but because ZoL, unlike
illumos, is built with all compiler warnings enabled this causes
a legitimate warning. Revert this hunk of the patch to keep the
build clean.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3432
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Approved by: Albert Lee <[email protected]>
References:
https://www.illumos.org/issues/5818
https://github.com/illumos/illumos-gate/commit/81cd5c5
Ported-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3432
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5269 zpool import slow
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5269
https://github.com/illumos/illumos-gate/commit/12380e1e
Ported-by: DHE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3396
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus
outside of dn_struct_rwlock. If the dnode is being freed then the bonus
dbuf may be in the process of getting evicted. In this case there is a
race that may cause dmu_objset_userquota_get_ids() to access the dbuf
after it has been destroyed. To prevent this, ensure that when we are
using the bonus dbuf we are either holding a reference on it or have
taken dn_struct_rwlock.
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3443
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Don't check db->bb_blkid, but use the blkid argument instead.
Checking db->db_blkid may be unsafe since we doesn't yet have a
hold on the dbuf so its validity is unknown.
- Call mutex_exit() on found_db, not db, since it's not certain that
they point to the same dbuf, and the mutex was taken on found_db.
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3443
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5243 zdb -b could be much faster
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5243
https://github.com/illumos/illumos-gate/commit/f7950bf
Ported-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3414
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There seems to be a annoying problem using NFSv4 to access ZFS
file systems under certain circumstances. It's easily reproduced:
nfs_client1: mount server:/export /mnt
nfs_client1: cd /mnt
nfs_client1: echo foo >junk
nfs_client1: cat junk
foo
Now on a different NFSv4 client:
nfs_client2: mount server:/export /mnt
nfs_client2: cd /mnt
nfs_client2: vi junk
# Make some changes to /mnt/junk and save
# This change the inode associated with /mnt/junk
Now back to the original client:
nfs_client1: cat junk
cat: junk: No such file or directory
Admittedly NFSv4 is not advertised as a cluster file system that
maintains a completely coherent view of data across multiple nodes.
But it does have some mechanisms built in that try to deal with
situations like the above. Namely, it employs specialized file
handle lookup routines that return ESTALE when a file handle contains
a non-existant inode value. The ESTALE return triggers a return
full file path lookup from the client to determine if the file has
actually gone away or if the cached file handle is no longer valid.
ZFS behavior can be brought into line with other file systems
(e.g., ext4) by applying the following patch:
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3224
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.
In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <Tim Chase <[email protected]>
Closes #2081
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When ASSERTs are compiled out by using the --disable-debug configure
option. Then the local variable 'dsl_pool_t *dp' will be unused and
generate a compiler warning. Since this variable is only used once
in the ASSERT replace it with 'ds->ds_dir->dd_pool'.
This has the additional advantage of potentially saving a few bytes
on the stack depending on how gcc decides to compile the function.
This issue was not noticed immediately because the automated builders
use --enable-debug to make the testing as rigorous as possible.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Dunlap <[email protected]>
Closes #3410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lzc_send_space when source is a bookmark
5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Albert Lee <[email protected]>
References:
https://www.illumos.org/issues/5765
https://github.com/illumos/illumos-gate/commit/643da460
Porting notes:
* Unused variable 'recordsize' in dmu_send_estimate() dropped
Ported-by: DHE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5393 spurious failures from dsl_dataset_hold_obj()
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Will Andrews <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5393
https://github.com/illumos/illumos-gate/commit/e1f3c20
Ported-by: Brian Behlendorf <[email protected]>
Closes #3403
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on boot
5562 ZFS sa_handle's violate kmem invariants, debug kernels panic on boot
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Robert Mustacchi <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Rich Lowe <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5562
https://github.com/illumos/illumos-gate/commit/0fda3cc5
Ported-by: DHE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Will Andrews <[email protected]>
Reviewed by: Simon Klinkert <[email protected]>
Approved by: Gordon Ross <[email protected]>
References:
https://www.illumos.org/issues/5810
https://github.com/illumos/illumos-gate/commit/732885fc
Ported-by: DHE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3387
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5351 scrub goes for an extra second each txg
5352 scrub should pause when there is some dirty data
Author: Matthew Ahrens <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5351
https://github.com/illumos/illumos-gate/commit/6f6a76a
Ported-by: Chris Dunlop <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3383
|