| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
The dsl_dataset_rollback_check() function is executed in the
txg_sync context. To prevent a potential deadlock due to direct
memory reclaim it must use KM_PUSHPAGE. This was introduced by
the recent 'zfs bookmark' features, commit da53684.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Eric Dillmann <[email protected]>
Closes #2569
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4914 zfs on-disk bookmark structure should be named *_phys_t
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Richard Lowe <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
References:
https://www.illumos.org/issues/4914
https://github.com/illumos/illumos-gate/commit/7802d7b
Porting notes:
There were a number of zfsonlinux-specific uses of zbookmark_t which
needed to be updated. This should reduce the likelihood of further
problems like issue #2094 from occurring.
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2558
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4881 zfs send performance degradation when embedded block pointers
are encountered
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4881
https://github.com/illumos/illumos-gate/commit/06315b7
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2547
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4897 Space accounting mismatch in L2ARC/zpool
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Boris Protopopov <[email protected]>
Approved by: Dan McDonald <[email protected]>
From the illumos issue tracker:
L2ARC vdev space usage statistics are calculated as the delta
between the maximum and minimum vdev offset ever written to
by the L2ARC fill thread, but do not inform the user of how
much space in between these two offsets is actually taken up by
cached buffers. This fix changes that so that vdev space usage
stats on L2ARC devices accurately track the volume of buffers
stored on them, allowing users to see the exact L2ARC usage in
"zpool iostat -v".
References:
https://www.illumos.org/issues/4897
https://github.com/illumos/illumos-gate/commit/3038a2b
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2555
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4390 i/o errors when deleting filesystem/zvol can lead to space map corruption
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4390
https://github.com/illumos/illumos-gate/commit/7fd05ac
Porting notes:
Previous stack-reduction efforts in traverse_visitb() caused a fair
number of un-mergable pieces of code. This patch should reduce its
stack footprint a bit more.
The new local bptree_entry_phys_t in bptree_add() is dynamically-allocated
using kmem_zalloc() for the purpose of stack reduction.
The new global zfs_free_leak_on_eio has been defined as an integer
rather than a boolean_t as was the case with the related zfs_recover
global. Also, zfs_free_leak_on_eio's definition has been inserted into
zfs_debug.c for consistency with the existing definition of zfs_recover.
Illumos placed it in spa_misc.c.
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Max Grossman <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4757
https://www.illumos.org/issues/4913
https://github.com/illumos/illumos-gate/commit/5d7b4d4
Porting notes:
For compatibility with the fastpath code the zio_done() function
needed to be updated. Because embedded-data block pointers do
not require DVAs to be allocated the associated vdevs will not
be marked and therefore should not be unmarked.
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2544
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewed by: George Wilson <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Richard Lowe <[email protected]>
Description from Matt Ahrens's bug report at Delphix:
Add a new zfs property, "redundant_metadata" which can have values
"all" or "most". The default will be "all", which is the current
behavior. Setting to "most" will cause us to only store 1 copy of
level-1 indirect blocks of user data files.
Additional notes:
The new man page section for this property states
"The exact behavior of which metadata blocks
are stored redundantly may change in future releases."
and:
"When set to most, ZFS stores an extra copy of most types of
metadata. This can improve performance of random writes,
because less metadata must be written."
The current implementation is as described above in Matt's blog.
It is controlled by a new global integer
"zfs_redundant_metadata_most_ditto_level", currently initialized
to 2. When "redundant_metadata" is set to "most", only indirect
blocks of the specified level and higher will have additional ditto
blocks created.
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2542
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4754 io issued to near-full luns even after setting noalloc threshold
4755 mg_alloc_failures is no longer needed
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4754
https://www.illumos.org/issues/4755
https://github.com/illumos/illumos-gate/commit/b6240e8
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2533
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4374 dn_free_ranges should use range_tree_t
Reviewed by: George Wilson <[email protected]>
Reviewed by: Max Grossman <[email protected]>
Reviewed by: Christopher Siden <[email protected]
Reviewed by: Garrett D'Amore <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4374
https://github.com/illumos/illumos-gate/commit/bf16b11
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2531
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4369 implement zfs bookmarks
4368 zfs send filesystems from readonly pools
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
References:
https://www.illumos.org/issues/4369
https://www.illumos.org/issues/4368
https://github.com/illumos/illumos-gate/commit/78f1710
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2530
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Josef 'Jeff' Sipek <[email protected]>
Approved by: Garrett D'Amore <[email protected]>a
References:
https://www.illumos.org/issues/4370
https://www.illumos.org/issues/4371
https://github.com/illumos/illumos-gate/commit/43466aa
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2529
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool features
Reviewed by: Max Grossman <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Jerry Jelinek <[email protected]>
Approved by: Garrett D'Amore <[email protected]>a
References:
https://www.illumos.org/issues/4171
https://www.illumos.org/issues/4172
https://github.com/illumos/illumos-gate/commit/2acef22
Ported-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2528
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the write throttle & i/o schedule performance work the
zfs_trunc() function should have been updated to use TXG_WAIT.
Using TXG_WAIT ensures that the tx will be part of the next txg.
If TXG_NOWAIT is used and retried for ERESTART errors then the
tx can suffer from starvation.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #2488
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4756 metaslab_group_preload() could deadlock
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
The metaslab_group_preload() function grabs the mg_lock and then later
tries to grab the metaslab lock. This lock ordering may lead to a
deadlock since other consumers of the mg_lock will grab the metaslab
lock first.
References:
https://www.illumos.org/issues/4756
https://github.com/illumos/illumos-gate/commit/30beaff
Ported-by: Prakash Surya <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2488
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4730 metaslab group taskq should be destroyed in metaslab_group_destroy()
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: Rich Lowe <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4730
https://github.com/illumos/illumos-gate/commit/be08211
Porting notes:
Under ZFSonlinux, one of the effects of not destroying the taskq is that
zdb would never exit (due to the SPL taskq implementation).
Ported-by: Tim Chase <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2488
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4101 metaslab_debug should allow for fine-grained control
4102 space_maps should store more information about themselves
4103 space map object blocksize should be increased
4105 removing a mirrored log device results in a leaked object
4106 asynchronously load metaslab
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
Prior to this patch, space_maps were preferred solely based on the
amount of free space left in each. Unfortunately, this heuristic didn't
contain any information about the make-up of that free space, which
meant we could keep preferring and loading a highly fragmented space map
that wouldn't actually have enough contiguous space to satisfy the
allocation; then unloading that space_map and repeating the process.
This change modifies the space_map's to store additional information
about the contiguous space in the space_map, so that we can use this
information to make a better decision about which space_map to load.
This requires reallocating all space_map objects to increase their
bonus buffer size sizes enough to fit the new metadata.
The above feature can be enabled via a new feature flag introduced by
this change: com.delphix:spacemap_histogram
In addition to the above, this patch allows the space_map block size to
be increase. Currently the block size is set to be 4K in size, which has
certain implications including the following:
* 4K sector devices will not see any compression benefit
* large space_maps require more metadata on-disk
* large space_maps require more time to load (typically random reads)
Now the space_map block size can adjust as needed up to the maximum size
set via the space_map_max_blksz variable.
A bug was fixed which resulted in potentially leaking an object when
removing a mirrored log device. The previous logic for vdev_remove() did
not deal with removing top-level vdevs that are interior vdevs (i.e.
mirror) correctly. The problem would occur when removing a mirrored log
device, and result in the DTL space map object being leaked; because
top-level vdevs don't have DTL space map objects associated with them.
References:
https://www.illumos.org/issues/4101
https://www.illumos.org/issues/4102
https://www.illumos.org/issues/4103
https://www.illumos.org/issues/4105
https://www.illumos.org/issues/4106
https://github.com/illumos/illumos-gate/commit/0713e23
Porting notes:
A handful of kmem_alloc() calls were converted to kmem_zalloc(). Also,
the KM_PUSHPAGE and TQ_PUSHPAGE flags were used as necessary.
Ported-by: Tim Chase <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2488
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes moves the called to metaslab_group_alloc_update() to the
metaslab_sync_reassess() function. The original placement of the call
within metaslab_sync_done() appears to have been a simple mistake,
introduced by ac72fac3eaa569902cad88053167f7d74e7fe7e4.
This aligns us more closely to the upstream illumos code base.
Signed-off-by: Prakash Surya <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the current code to ensure inodes are never dirtied if they are
part of a read-only file system or snapshot. If they do somehow get
dirtied an attempt will make made to write them to disk. In the case
of snapshots, which don't have a ZIL, this will result in a NULL
dereference in zil_commit().
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2405
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfault
4170 zhack leaves pool in ACTIVE state
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Eric Schrock <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4168
https://www.illumos.org/issues/4169
https://www.illumos.org/issues/4170
https://github.com/illumos/illumos-gate/commit/7fdd916
Porting notes:
Of particular interest when troubleshooting corrupted pools, the
commonly-used "zdb -e" operation may perform verbatim imports and
furthermore, it will soon have direct support for verbatim imports via
a new "-V" option. The 4169 fix eliminates a common segfault case in
which spa_history_log_version() tries to access an un-opened dsl_pool_t.
Ported by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2451
Closes #2283
Closes #2467
|
|
|
|
|
|
|
|
|
|
| |
The parameter was added as illumos issue 4081 which was committed to
zfsonlinux in ac72fac3eaa569902cad88053167f7d74e7fe7e4. This patch
documents the parameter and allows for it to be set as a module parameter.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2483
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the smaller of 2 different sized child vdev's of a mirrored vdev is
detached, and the pool has the autoexpand property set to off, as the
remaining larger vdev is promoted to a top level vdev it fails to retain
the asize of the original top level mirror vdev and therefore partially
autoexpands.
This partially autoexpanded state leaves the new vdev too large to
re-mirror by adding the smaller vdev back in, and the pool fails to
utilize the space until next imported.
If the autoexpand property is set to on, the child vdev grows
in size after it has been promoted to a top level vdev as expected.
This commit causes the remaining child mirror to retain the asize of its
old parent mirror vdev if the autoexpand property is set to off,
this allows the smaller vdev to be re-added if required the vdev
can then be told to expand if required by the usual using zpool online -e.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andrew Barnes <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes #1208
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
4936 lz4 could theoretically overflow a pointer with a certain input
Reviewed by: Saso Kiselkov <[email protected]>
Reviewed by: Keith Wesolowski <[email protected]>
Approved by: Gordon Ross <[email protected]>
Ported by: Tim Chase <[email protected]>
References:
https://illumos.org/issues/4936
https://github.com/illumos/illumos-gate/commit/58d0718
Porting notes:
This fixes the widely-reported "20-year-old vulnerability" in
LZO/LZ4 implementations which inherited said bug from the reference
implementation.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2429
|
|
|
|
|
|
|
|
|
|
| |
Added several comments regarding the removal of real_LZ4_uncompress()
which exists in the reference implementation but has been removed here
since it's not used.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 91579709fccd3e55a21970742b66c388fb1403db which
limited the asynchronous dispatch to kernel space. We want to do
this for two reasons:
1) While we have slightly more headroom in user space excessively
deep stacks have been observed while running ztest, see #2293.
2) Removing this conditional makes the pipeline behave consistently
regardless of if it's executing in kernel space or user space.
This way we're more likely to uncover subtle issues with ztest.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2384
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should not override the default memory type of the kmem cache. This
was done previously to force certain objects which were slightly over
object size limit cut off in to KMC_KMEM caches for better performance.
The zfsonlinux/spl#356 patch slightly increases the default cut off
from 511 bytes 1024 bytes for x86_64. This means there is long longer
a need to override the default for the caches. And since the default
values are now being used the new spl_kmem_cache_slab_limit and
spl_kmem_cache_kmem_limit tunables will apply to all kmem caches.
The following is a list of caches that will be impacted:
| object size | forced type | default type
----------------- | ------------- | ------------- | --------------
dnode_t | 936 bytes | KMC_KMEM | KMC_KMEM
zio_cache | 1104 bytes | *KMC_KMEM | *KMC_VMEM
zio_link_cache | 48 bytes | KMC_KMEM | KMC_KMEM
zio_vdev_cache | 131088 bytes | KMC_VMEM | KMC_VMEM
zio_buf_512 | 512 bytes | KMC_KMEM | KMC_KMEM
zio_data_buf_512 | 512 bytes | KMC_KMEM | KMC_KMEM
zio_buf_1024 | 1024 bytes | KMC_KMEM | KMC_KMEM
zio_data_buf_1024 | 1024 bytes | +KMC_VMEM | +KMC_KMEM
* Cache memory type will change from KMC_KMEM to KMC_VMEM.
+ Cache memory type will change from KMC_VMEM to KMC_KMEM.
This patch removes another slight point of divergence between ZoL
and Illumos.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Closes #2337
|
|
|
|
|
|
|
|
|
|
| |
For consistency with disk vdevs honor the zfs_nocacheflush tunable.
This setting is available primarily for debugging and performance
analysis.
Signed-off-by: HC <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2336
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case where a variable-sized SA overlaps the spill block pointer and
a new variable-sized SA is being added, the header size was improperly
calculated to include the to-be-moved SA. This problem could be
reproduced when xattr=sa enabled as follows:
ln -s $(perl -e 'print "x" x 120') blah
setfattr -n security.selinux -v blahblah -h blah
The symlink is large enough to interfere with the spill block pointer and
has a typical SA registration as follows (shown in modified "zdb -dddd"
<SA attr layout obj> format):
[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]
Adding the SA xattr will attempt to extend the registration to:
[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]
but since the ZPL_SYMLINK SA interferes with the spill block pointer, it
must also be moved to the spill block which will have a registration of:
[ ZPL_SYMLINK ZPL_DXATTR ]
This commit updates extra_hdrsize when this condition occurs, allowing
hdrsize to be subsequently decreased appropriately.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Issue #2214
Issue #2228
Issue #2316
Issue #2343
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Restructure the zfsdev_state_list to allow for lock-free reading by
converting to a simple singly-linked list from which items are never
deleted and over which only forward iterations are performed. It depends
on, among other things, the atomicity of accessing the zs_minor integer
and zs_next pointer.
This fixes a lock inversion in which the zfsdev_state_lock is used by
both the sync task (txg_sync) and indirectly by any user program which
uses /dev/zfs; the zfsdev_release method uses the same lock and then
blocks on the sync task.
The most typical failure scenerio occurs when the sync task is cleaning
up a user hold while various concurrent "zfs" commands are in progress.
Neither Illumos nor Solaris are affected by this issue because they use
DDI interface which provides lock-free reading of device state via the
ddi_get_soft_state() function.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2301
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, vdev_file used system_taskq. This would cause a deadlock,
especially on system with few CPUs. The reason is that the prefetcher
threads, which are on system_taskq, will sometimes be blocked waiting
for I/O to finish. If the prefetcher threads consume all the tasks in
system_taskq, the I/O cannot be served and thus results in a deadlock.
We fix this by creating a dedicated vdev_file_taskq for vdev_file I/O.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dva_get_dsize_sync() function incorrectly assumes that the call
to vdev_lookup_top() cannot fail. However, the NULL dereference at
clearly shows that under certain circumstances it is possible. Note
that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio.
BUG: unable to handle kernel NULL pointer dereference at 00000570
crash> struct -o vdev
struct vdev {
[0] uint64_t vdev_id;
... ...
[1376] uint64_t vdev_deflate_ratio;
Given that this can happen this patch add the required error handling.
In the case where vdev_lookup_top() fails assume that no deflation
will occur for the DVA and use the asize.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Alexey Zhuravlev <[email protected]>
Closes #1707
Closes #1987
Closes #1891
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When fetching property values of snapshots, a check against the head
dataset type must be performed. Previously, this additional check was
performed only when fetching "version", "normalize", "utf8only" or "case".
This caused the ZPL properties "acltype", "exec", "devices", "nbmand",
"setuid" and "xattr" to be erroneously displayed with meaningless values
for snapshots of volumes. It also did not allow for the display of
"volsize" of a snapshot of a volume.
This patch adds the headcheck flag paramater to zfs_prop_valid_for_type()
and zprop_valid_for_type() to indicate the check is being done
against a head dataset's type in order that properties valid only for
snapshots are handled correctly. This allows the the head check in
get_numeric_property() to be performed when fetching a property for
a snapshot.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2265
|
|
|
|
|
|
|
| |
A minor style issue was accidentally introduced by aa7d06a.
This change resolves that style problem.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today the metaslab_debug logic performs two tasks:
- load all metaslabs on import/open
- don't unload metaslabs at the end of spa_sync
This change provides knobs for each of these independently.
References:
https://illumos.org/issues/4101
https://github.com/illumos/illumos-gate/commit/0713e23
Notes:
1) This is a small piece of the metaslab improvement patch from
Illumos. It was worth bringing over before the rest, since it's
low risk and it can be useful on fragmented pools (e.g. Lustre
MDTs). metaslab_debug_unload would give the performance benefit
of the old metaslab_debug option without causing unwanted delay
during pool import.
Ported-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2227
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the system attributes (SAs) for an object exceed what can
can be stored in the bonus area of a dnode a spill block is
allocated. These spill blocks are currently considered data
blocks. However, they should be accounted for as meta data
because they are effectively an extension of the dnode.
While this may seem like a minor accounting issue it has broader
implications. The key thing to be aware of is that each spill
block will hold a reference on its parent dnode. The dnode in
turn holds a reference on its dbuf in the dnode object. This
means that a single 512 byte data buffer for a spill block can
pin over 16k of meta data. This is analogous to the small file
situation described in 2b13331 where a relatively small number
of data buffer can cause the ARC to exceed the meta limit.
However, unlike the small file case a spill block can legitimately
be considered meta data. By changing the spill block to meta data
they will now be dropped from the cache when the meta limit is
reached. This then allows the dnodes and dbufs which the spill
block was pinning to be released.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Closes #2294
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in the comment above dmu_tx_assign() this function must
only fail if the pool is out of space. If for some other reason the
TX cannot be assigned (such as memory pressure) ERESTART must be
returned. Alternately, EAGAIN could be returned to inject a delay
but that isn't required because the caller will block on the condition
variable waiting for the next TXG.
/*
* Assign tx to a transaction group. txg_how can be one of:
*
* (1) TXG_WAIT. If the current open txg is full, waits until there's
* a new one. This should be used when you're not holding locks.
* It will only fail if we're truly out of space (or over quota).
* ...
*/
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #2287
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We add support for lsattr and chattr to resolve a regression caused
by 88c283952f0bfeab54612f9ce666601d83c4244f that broke Python's
xattr.list(). That changet broke Gentoo Portage's FEATURES=xattr,
which depended on Python's xattr.list().
Only attributes common to both Solaris and Linux are supported. These
are 'a', 'd' and 'i' in Linux's lsattr and chattr commands. File
attributes exclusive to Solaris are present in the ZFS code, but cannot
be accessed or modified through this method. That was the case prior to
this patch. The resolution of issue zfsonlinux/zfs#229 should implement
some method to permit access and modification of Solaris-specific
attributes.
References:
https://bugs.gentoo.org/show_bug.cgi?id=483516
Original-patch-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #1691
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some network related block device uses tcp_sendpage, which doesn't
behave well when using 0-count page. Add assertion to catch them.
This has a runtime dependency on:
zfsonlinux/spl@ae16ed9 Fix crash when using ZFS on Ceph rbd
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2277
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Endianness detection in LZ4 is broken in user-space builds. This
bug corrupts compressed data and manifests itself in several ztest
failures. When LZ4 was originally ported to Illumos ZFS, the proper
checks for Linux were stripped out. The Linux port then inherited
the remaining detection code that works on Illumos but not on Linux.
The current LZ4 endianness check misuses the condition
defined(__BIG_ENDIAN) to indicate a big-endian system. On Linux
__BIG_ENDIAN is defined uncondtionally in the user-space header
/usr/include/endian.h, regardless of the endianness of the system.
The kernel does not use this header, so only user-space builds are
affected.
While we could fix this by restoring the upstream LZ4 endianness
detection code, reliable checks already exist in
libspl/include/sys/isa_defs.h. This change uses the libspl results
to replace the word-size and endianness checks in LZ4, simplifying
the code and reducing duplication.
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: DHE <[email protected]>
Signed-off-by: Prakash Surya <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Fixes #1963
Fixes #1964
Fixes #1965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to an asymmetry in the kmem accounting a memory leak was being
reported when it was only an accounting issue. All memory allocated
with kmem_alloc() must be released with kmem_free() or it will not
be properly accounted for.
In this case the code used strfree() to release the memory allocated
by kmem_alloc(). Presumably this was done because the size of the
memory region wasn't available when the memory needed to be freed.
To resolve this issue the code has been updated to use strdup() instead
of kmem_alloc() to allocate the memory. Like strfree(), strdup() is
not integrated with the memory accounting. This means we can use
strfree() to release it like Illumos.
SPL: kmem leaked 10/4368729 bytes
address size data func:line
ffff880067e9aa40 10 ZZZZZZZZZZ zfsdev_ioctl:5655
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #2262
|
|
|
|
|
|
|
|
| |
Caught by ztest and valgrind.
Signed-off-by: DHE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2259
|
|
|
|
|
|
|
|
|
| |
Also, make sure we use clock_t for ddi_get_lbolt to prevent type conversion
from screwing things.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #2142
|
|
|
|
|
|
|
|
|
|
| |
rq_for_each_segment changed from taking bio_vec * to taking bio_vec.
We provide rq_for_each_segment4 which takes both.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #2124
|
|
|
|
|
|
|
|
|
|
|
|
| |
After the dmu_req_copy change, bi_io_vecs are not touched, so this is no
longer needed.
This reverts commit e26ade5101ba1d8e8350ff1270bfca4258e1ffe3.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #2124
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, dmu_req_copy modifies bv_len and bv_offset in bio_vec so that it
can continue in subsequent passes. However, after the immutable biovec changes
in Linux 3.14, this is not allowed. So instead, we just tell dmu_req_copy how
many bytes are already copied and it will skip to the right spot accordingly.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #2124
|
|
|
|
|
|
|
|
|
|
| |
bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter.
This patch creates BIO_BI_*(bio) macros to hide the differences.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #2124
|
|
|
|
|
|
|
|
|
| |
posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod}
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #2124
|
|
|
|
|
| |
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zfsonlinux/zfs#180 occurred because of a race between inode eviction and
zfs_zget(). zfsonlinux/zfs@36df284 tried to address it by making a call
to the VFS to learn whether an inode is being evicted. If it was being
evicted the operation was retried after dropping and reacquiring the
relevant resources. Unfortunately, this introduced another deadlock.
INFO: task kworker/u24:6:891 blocked for more than 120 seconds.
Tainted: P O 3.13.6 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u24:6 D ffff88107fcd2e80 0 891 2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-zfs-5)
ffff8810370ff950 0000000000000002 ffff88103853d940 0000000000012e80
ffff8810370fffd8 0000000000012e80 ffff88103853d940 ffff880f5c8be098
ffff88107ffb6950 ffff8810370ff980 ffff88103a9a5b78 0000000000000000
Call Trace:
[<ffffffff813dd1d4>] schedule+0x24/0x70
[<ffffffff8115fc09>] __wait_on_freeing_inode+0x99/0xc0
[<ffffffff8115fdd8>] find_inode_fast+0x78/0xb0
[<ffffffff811608c5>] ilookup+0x65/0xd0
[<ffffffffa035c5ab>] zfs_zget+0xdb/0x260 [zfs]
[<ffffffffa03589d6>] zfs_get_data+0x46/0x340 [zfs]
[<ffffffffa035fee1>] zil_add_block+0xa31/0xc00 [zfs]
[<ffffffffa0360642>] zil_commit+0x12/0x20 [zfs]
[<ffffffffa036a6e4>] zpl_putpage+0x174/0x840 [zfs]
[<ffffffff811071ec>] do_writepages+0x1c/0x40
[<ffffffff8116df2b>] __writeback_single_inode+0x3b/0x2b0
[<ffffffff8116ecf7>] writeback_sb_inodes+0x247/0x420
[<ffffffff8116f5f3>] wb_writeback+0xe3/0x320
[<ffffffff81170b8e>] bdi_writeback_workfn+0xfe/0x490
[<ffffffff8106072c>] process_one_work+0x16c/0x490
[<ffffffff810613f3>] worker_thread+0x113/0x390
[<ffffffff81066edf>] kthread+0xdf/0x100
This patch implements the original fix in a slightly different manner in
order to avoid both deadlocks. Instead of relying on a call to ilookup()
which can block in __wait_on_freeing_inode() the return value from igrab()
is used. This gives us the information that ilookup() provided without
the risk of a deadlock.
Alternately, this race could be closed by registering an sops->drop_inode()
callback. The callback would need to detect the active SA hold thereby
informing the VFS that this inode should not be evicted.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #180
|
|
|
|
|
|
| |
This reverts commit 36df284366caa77cb40083d2e6bcce02274e2f05.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a vdev starts getting I/O or checksum errors it is now
possible to automatically rebuild to a hot spare device.
To cleanly support this functionality in a shell script some
additional information was added to all zevent ereports which
include a vdev. This covers both io and checksum zevents but
may be used but other scripts.
In the Illumos FMA solution the same information is required
but it is retrieved through the libzfs library interface.
Specifically the following members were added:
vdev_spare_paths - List of vdev paths for all hot spares.
vdev_spare_guids - List of vdev guids for all hot spares.
vdev_read_errors - Read errors for the problematic vdev
vdev_write_errors - Write errors for the problematic vdev
vdev_cksum_errors - Checksum errors for the problematic vdev.
By default the required hot spare scripts are installed but this
functionality is disabled. To enable hot sparing uncomment the
ZED_SPARE_ON_IO_ERRORS and ZED_SPARE_ON_CHECKSUM_ERRORS in the
/etc/zfs/zed.d/zed.rc configuration file.
These scripts do no add support for the autoexpand property. At
a minimum this requires adding a new udev rule to detect when
a new device is added to the system. It also requires that the
autoexpand policy be ported from Illumos, see:
https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/syseventd/modules/zfs_mod/zfs_mod.c
Support for detecting the correct name of a vdev when it's not
a whole disk was added by Turbo Fredriksson.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chris Dunlap <[email protected]>
Signed-off-by: Turbo Fredriksson <[email protected]>
Issue #2
|