| Commit message (Collapse) | Author | Age | Files | Lines |
|
This patch adds a '-f' option to 'zpool offline' to fault a vdev
instead of bringing it offline. Unlike the OFFLINE state, the
FAULTED state will trigger the FMA code, allowing for things like
autoreplace and triggering the slot fault LED. The -f faults
persist across imports, unless they were set with the temporary
(-t) flag. Both persistent and temporary faults can be cleared
with zpool clear.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #6094
|
One pre-check in zfs_ereport_start() was being performed after
the nvlists had been allocated. This simply corrects that
issue.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #6140
|
Bit 21 of the send stream flags was inadvertently used for two
different features under concurrent development. To avoid any
future compatibility problems the large dnode flag is being
switched to bit 23 which is unused.
The large dnode feature has only been present in pre-releases of
ZoL and dnodesize defaults to legacy which is compatible with
existing OpenZFS implementations. Users with dnodesize=auto
needing to use zfs send/recv must update ZoL on both the
source and destination systems.
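The collision described above can be illustrated with a minimal sketch. The feature names below are hypothetical; only the shared bit 21 and the move of the large dnode flag to bit 23 come from the commit text.

```python
# Hypothetical feature names; only "bit 21 was shared" and "large dnode
# moves to bit 23" are taken from the commit message.
FEATURE_A = 1 << 21          # the other feature that claimed bit 21
LARGE_DNODE_OLD = 1 << 21    # large dnode flag before this change
LARGE_DNODE_NEW = 1 << 23    # large dnode flag after this change


def has_flag(stream_flags: int, flag: int) -> bool:
    """Return True if the given feature bit is set in the stream flags."""
    return (stream_flags & flag) != 0


# A stream that only uses feature A.
stream = FEATURE_A

# With a shared bit the stream also appears to use large dnodes.
collides = has_flag(stream, LARGE_DNODE_OLD)

# With distinct bits the two features can be told apart.
distinct = has_flag(stream, LARGE_DNODE_NEW)
```

With the shared bit a receiver cannot tell which feature the sender meant, which is why one of the two flags had to move.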
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed-by: Ned Bass <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6139
|
In glibc-2.23 <sys/sysmacros.h> isn't automatically included in
<sys/types.h> [1], so we need to explicitly include it.
[1] https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Justin Lecher <[email protected]>
Closes #6132
|
The lock is designed to protect internal state of zvol_state_t and
to avoid taking spa_namespace_lock (e.g. in dmu_objset_own() code path)
while holding zvol_state_lock. Refactor the code accordingly.
Signed-off-by: Boris Protopopov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3484
Closes #6065
Closes #6134
|
Fix lock order inversion with zvol_open() as it did not account
for use of zvols as vdevs. The latter use case resulted in
lock order inversion deadlocks that involved spa_namespace_lock
and bdev->bd_mutex.
Signed-off-by: Boris Protopopov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #6065
Issue #6134
|
On a raidz vdev, a block that does not span all child vdevs, excluding
its skip sectors if any, may not be affected by a child vdev outage or
failure. In such cases, the block does not need to be resilvered.
However, the current resilver algorithm simply resilvers all blocks on a
degraded raidz vdev. Such spurious IO is not only wasteful, but also
adds the risk of overwriting good data.
This patch eliminates such spurious IOs.
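The idea can be sketched with a simplified model. This is not the actual raidz layout code: it assumes a block's sectors (data plus parity, excluding skip sectors) are placed round-robin across the children starting at a given child, and all names are hypothetical.

```python
def children_touched(first_child: int, sectors: int, nchildren: int) -> set:
    """Child vdev indices covered by a block in a simplified raidz layout.

    Assumption: the block's sectors are laid out round-robin starting
    at first_child, so at most nchildren distinct children are touched.
    """
    return {(first_child + i) % nchildren for i in range(min(sectors, nchildren))}


def needs_resilver(first_child: int, sectors: int, nchildren: int,
                   failed_child: int) -> bool:
    """A block only needs resilvering if it has a sector on the failed child."""
    return failed_child in children_touched(first_child, sectors, nchildren)
```

In this model a 3-sector block starting at child 1 of a 6-wide raidz touches children 1, 2 and 3, so an outage of child 5 leaves it unaffected.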
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Signed-off-by: Isaac Huang <[email protected]>
Closes #5316
|
Enable additional test cases. In most cases this required only a few
minor modifications to the test scripts. In a few cases a real
bug was uncovered and fixed. And in a handful of cases where pools
are layered on pools the test case will be skipped until this is
supported. Details below for each test case.
* zpool_add_004_pos - Skip test on Linux until adding zvols to pools
is fully supported and deadlock free.
* zpool_add_005_pos.ksh - Skip dumpadm portion of the test which isn't
relevant for Linux. The find_vfstab_dev, find_mnttab_dev, and
save_dump_dev functions were updated accordingly for Linux. Add
O_EXCL to the in-use check to prevent the -f (force) option from
working for mounted filesystems and improve the resulting error.
* zpool_add_006_pos - Update test case such that it doesn't depend
on nested pools. Switch to truncate from mkfile to reduce space
requirements and speed up the test case.
* zpool_clear_001_pos - Speed up test case by filling filesystem to
25% capacity.
* zpool_create_002_pos, zpool_create_004_pos - Use sparse files for
file vdevs in order to avoid increasing the partition size.
* zpool_create_006_pos - 6ba1ce9 allows raidz+mirror configs with
similar redundancy. Updated the valid_args and forced_args cases.
* zpool_create_008_pos - Disable overlapping partition portion.
* zpool_create_011_neg - Fix to correctly create the extra partition.
Modified zpool_vdev.c to use fstat64_blk() wrapper which includes
the st_size even for block devices.
* zpool_create_012_neg - Updated to properly find swap devices.
* zpool_create_014_neg, zpool_create_015_neg - Updated to use
swap_setup() and swap_cleanup() wrappers which do the right thing
on Linux and Illumos. Removed '-n' option which succeeds under
Linux due to differences in the in-use checks.
* zpool_create_016_pos.ksh - Skipped, test case isn't useful.
* zpool_create_020_pos - Added missing / to cleanup() function.
Remove cache file prior to test to ensure a clean environment
and avoid false positives.
* zpool_destroy_001_pos - Removed test case which creates a pool on
a zvol. This is more likely to deadlock under Linux and has never
been completely supported on any platform.
* zpool_destroy_002_pos - 'zpool destroy -f' is unsupported on Linux.
Mount points must not be busy in order to be unmounted.
* zfs_destroy_001_pos - Handle EBUSY error which can occur with
volumes when racing with udev.
* zpool_expand_001_pos, zpool_expand_003_neg - Skip test on Linux
until adding zvols to pools is fully supported and deadlock free.
The test could be modified to use loop-back devices but it would
be preferable to use the test case as is for improved coverage.
* zpool_export_004_pos - Updated test case such that it doesn't
depend on nested pools. Normal file vdevs under /var/tmp are fine.
* zpool_import_all_001_pos - Updated to skip partition 1, which is
known as slice 2, on Illumos. This prevents overwriting the
default TESTPOOL which was causing the failure.
* zpool_import_002_pos, zpool_import_012_pos - No changes needed.
* zpool_remove_003_pos - No changes needed
* zpool_upgrade_002_pos, zpool_upgrade_004_pos - Root cause addressed
by upstream OpenZFS commit 3b7f360.
* zpool_upgrade_007_pos - Disabled in test case due to known failure.
Opened issue https://github.com/zfsonlinux/zfs/issues/6112
* zvol_misc_002_pos - Updated to use ext2.
* zvol_misc_001_neg, zvol_misc_003_neg, zvol_misc_004_pos,
zvol_misc_005_neg, zvol_misc_006_pos - Moved to skip list, these
test cases could be updated to use Linux's crash dump facility.
* zvol_swap_* - Updated to use swap_setup/swap_cleanup helpers.
File creation switched from /tmp to /var/tmp. Enabled minimal
useful tests for Linux, skip test cases which aren't applicable.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #3484
Issue #5634
Issue #2437
Issue #5202
Issue #4034
Closes #6095
|
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: George Melikov <[email protected]>
A standard practice in ZFS is to keep track of "per-txg" state. Any of
the 3 active TXG's (open, quiescing, syncing) can have different values
for this state. We should assert that we do not attempt to modify other
(inactive) TXG's.
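A minimal sketch of such a check follows. The class and function names are hypothetical, and it assumes the active TXGs are the open one and the two before it, with per-txg state kept in a small ring indexed by the low bits of the txg number.

```python
TXG_CONCURRENT_STATES = 3  # open, quiescing, syncing (from the commit text)
TXG_SIZE = 4               # assumed ring size: next power of two above 3
TXG_MASK = TXG_SIZE - 1


def txg_is_active(txg: int, open_txg: int) -> bool:
    """True if txg is one of the (at most) three active TXGs.

    Assumption: the active TXGs are the open one and the two before it.
    """
    return open_txg - (TXG_CONCURRENT_STATES - 1) <= txg <= open_txg


class PerTxgCounter:
    """Hypothetical per-txg state: one slot per ring position."""

    def __init__(self) -> None:
        self.slots = [0] * TXG_SIZE

    def add(self, txg: int, open_txg: int, amount: int) -> None:
        # Refuse to touch a slot on behalf of an inactive TXG.
        if not txg_is_active(txg, open_txg):
            raise ValueError(f"txg {txg} is not active (open txg {open_txg})")
        self.slots[txg & TXG_MASK] += amount
```

Without the check, a stale txg would silently alias a slot that currently belongs to a different, active TXG.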
Porting Notes:
- ASSERTV added to txg_sync_waiting() for unused variable.
OpenZFS-issue: https://www.illumos.org/issues/8063
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/01acb46
Closes #6109
|
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Matthew Ahrens <[email protected]>
If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged. When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started. The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.
The fix is to never clear the DTL of offline devices. Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.
The problem can be worked around by running "zpool scrub" after
"zpool online".
OpenZFS-issue: https://www.illumos.org/issues/8166
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/372
Closes #5806
Closes #6103
|
The arc layer tracks checksums of its data in the arc header
so that it can ensure that buffers haven't changed when they're
not supposed to. This checksum is only maintained while there
is an uncompressed buffer still attached to the header.
Unfortunately there is a missing call to arc_free_cksum() in
arc_release() that can trigger ASSERTs. This has not been a
common issue because the checksums are only maintained for
debug builds and triggering the bug requires writing a block
(and therefore calling arc_release()) while a compressed buffer
is still being used on a debug build. This simply corrects the
issue.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #6105
|
Linux 4.9 added current_time() as the preferred interface to get
the filesystem time. CURRENT_TIME was retired in Linux 4.12.
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6114
|
This allows users to specify "-o property=value" to override and
"-x property" to exclude properties when receiving a zfs send stream.
Both native and user properties can be specified.
This is useful when using zfs send/receive for periodic
backup/replication because it lets users change properties such as
canmount, mountpoint, or compression without modifying the source.
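The effect of the two options can be modeled with a small sketch. This is a conceptual model only, not how libzfs implements it; the function name is hypothetical, and the behavior when one property is both overridden and excluded is deliberately left undefined here.

```python
def apply_receive_props(received: dict, overrides: dict, excludes: set) -> dict:
    """Model the effect of '-o property=value' and '-x property'.

    received:  properties carried in the send stream
    overrides: properties given with -o (replace the received value)
    excludes:  properties given with -x (the received value is dropped)
    """
    result = {k: v for k, v in received.items() if k not in excludes}
    result.update(overrides)
    return result


received = {"compression": "lz4", "mountpoint": "/data", "canmount": "on"}
final = apply_receive_props(received, {"canmount": "off"}, {"mountpoint"})
```

Here the backup copy keeps the received compression setting, is not mountable, and does not inherit the source's mountpoint, all without touching the source dataset.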
References:
https://www.illumos.org/issues/2745
https://www.illumos.org/issues/3753
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #1350
Closes #5349
|
Document the existence of `createtxg` and `guid` native properties
in man pages and zfs command output.
One of the great features of ZFS is incremental replication of
snapshots, possibly between pools on different machines.
Shell scripts are commonly used to automate this procedure. They have to
find the most recent common snapshot between both sides and then
perform incremental send & recv.
Currently, scripts rely on the sorting order of `zfs list`, which
defaults to `createtxg`, and the assumption that snapshot names on
either side do not change.
By making `createtxg` and `guid` part of the public ZFS interface,
scripts are enabled to use
a) `createtxg` to determine the logical & temporal order of snapshots
(the creation property is not an equivalent substitute since
multiple snapshots may be created within one second)
b) `guid` to uniquely identify a snapshot, independent of its current
display name
This has the potential of making scripts safer and more correct.
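A replication script could use the two properties roughly as follows. This is a sketch under assumptions: the snapshot lists are taken as already parsed from the zfs command output, and the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    name: str
    guid: int
    createtxg: int


def latest_common_snapshot(source, target) -> Optional[Snapshot]:
    """Return the newest source snapshot whose guid also exists on the target.

    Matching is done by guid, so renamed snapshots are still found.
    Ordering uses the source side's createtxg.
    """
    target_guids = {snap.guid for snap in target}
    common = [snap for snap in source if snap.guid in target_guids]
    return max(common, key=lambda snap: snap.createtxg, default=None)


source = [
    Snapshot("pool/fs@a", 111, 10),
    Snapshot("pool/fs@b", 222, 20),
    Snapshot("pool/fs@c", 333, 30),
]
target = [
    Snapshot("backup/fs@a", 111, 5),
    Snapshot("backup/fs@renamed", 222, 9),
]
base = latest_common_snapshot(source, target)
```

The result is the snapshot to pass as the incremental source, even though its name differs on the target side.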
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: DHE <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #6102
|
A race condition between 'zpool export' and 'zfs create' can crash the
latter: this is because we never check libzfs`zpool_open() return
value in libzfs`zfs_create().
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #6096
|
Debian zfs package generated by alien doesn't call the prerm script
(rpm's %preun) with an integer as first parameter, which results in
the following warning:
"zfs.prerm: line 2: [: remove: integer expression expected"
Modify the if-condition to avoid the warning.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #6108
|
CID 161638: Resource leak (RESOURCE_LEAK)
Ensure the string array in print_zpool_script_help
is freed in cases when there is an error.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Giuseppe Di Natale <[email protected]>
Closes #6111
|
zfsonlinux/spl@8f87971 added __spl_pf_fstrans_check for the xfs related
check, so we use it accordingly.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #6113
|
Fourth release candidate.
Signed-off-by: Brian Behlendorf <[email protected]>
|
Remove the lz4_ac local variable from dmu_write_policy() to resolve
the following unused variable warning on non-debug builds.
dmu.c: In function ‘dmu_write_policy’:
dmu.c:1892:12: warning: unused variable ‘lz4_ac’ [-Wunused-variable]
boolean_t lz4_ac = spa_feature_is_active(os->os_spa,
Signed-off-by: Brian Behlendorf <[email protected]>
|
The proposed debugging enhancements in zfsonlinux/spl#587
identified the following missing *_destroy/*_fini calls.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Gvozden Neskovic <[email protected]>
Closes #5428
|
Change the default ZVOL behavior so requests are handled asynchronously.
This behavior is functionally the same as in the zfs-0.6.4 release.
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5902
|
Linux has read-ahead logic designed to accelerate sequential workloads.
ZFS has its own read-ahead logic called zprefetch that operates on both
ZVOLs and datasets. Having two prefetchers active at the same time can
cause overprefetching, which unnecessarily reduces IOPS performance on
CoW filesystems like ZFS.
Testing shows that entirely disabling the Linux prefetch results in
a significant performance penalty for reads while commensurate benefits
are seen in random writes. It appears that read-ahead benefits are
inversely proportional to random write benefits, and so a single page
of Linux-layer read-ahead appears to offer the middle ground for both
workloads.
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Issue #5902
|
The current ZVOL implementation does not explicitly set merge
options on ZVOL device queues, which results in the default merge
behavior.
Explicitly set QUEUE_FLAG_NOMERGES on ZVOL queues allowing the
ZIO pipeline to do its work.
Initial benchmarks (tiotest with no O_DIRECT) show random write
performance going up almost 3X on 8K ZVOLs, even after significant
rewrites of the logical space allocation.
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: RageLtMan <rageltman@sempervictus>
Issue #5902
|
The send-c_volume test case has been observed to occasionally
fail on 32-bit systems. Until this issue is fully understood
disable this test case.
The rsend_014_pos test case can occasionally fail due to an
EBUSY during export. This can lead to subsequent test failures.
Resolve the issue by retrying the export on EBUSY. Additionally,
remove the gratuitous use of eval.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6088
|
* zfs_destroy_001_pos - Unable to reproduce the failures locally.
Re-enabled to determine observed buildbot failure rate.
* zfs_destroy_005_neg - Updated for expected Linux behavior.
Busy mount points, even snapshots, are expected to fail.
* zfs_destroy_010_pos - Resolved transient EBUSY with retry.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5635
Issue #5893
Closes #6091
|
This commit allows higher ashift values (up to 16) in 'zpool create'.
The ashift value was previously limited to 13 (8K block) in b41c990
because the limited number of uberblocks we could fit in the
statically sized (128K) vdev label ring buffer could prevent the
ability to safely roll back a pool to recover it.
Since b02fe35 the largest uberblock size we support is 8K: this
allows us to store a minimum number of 16 uberblocks in the vdev
label, even with higher ashift values.
Additionally change 'ashift' pool property behaviour: if set it will
be used as the default hint value in subsequent vdev operations
('zpool add', 'attach' and 'replace'). A custom ashift value can still
be specified from the command line, if desired.
Finally, fix a bug in add-o_ashift.ksh caused by a missing variable.
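The arithmetic behind the 16-uberblock minimum can be sketched as follows. The 128K ring size and the 8K cap come from the text above; the 1K lower bound on uberblock size and all names are assumptions made for illustration.

```python
UBERBLOCK_RING_BYTES = 128 * 1024  # statically sized ring (from the commit text)
MIN_UBERBLOCK_SHIFT = 10           # assumed 1K minimum uberblock size
MAX_UBERBLOCK_SHIFT = 13           # 8K maximum uberblock size (from the commit text)


def uberblock_count(ashift: int) -> int:
    """Number of uberblocks that fit in the label ring for a given ashift."""
    # The uberblock size follows ashift but is clamped to [1K, 8K].
    shift = min(max(ashift, MIN_UBERBLOCK_SHIFT), MAX_UBERBLOCK_SHIFT)
    return UBERBLOCK_RING_BYTES >> shift


counts = {ashift: uberblock_count(ashift) for ashift in (9, 12, 13, 16)}
```

Because the uberblock size is capped at 8K, raising ashift from 13 to 16 no longer shrinks the number of uberblocks available for rollback.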
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #2024
Closes #4205
Closes #4740
Closes #5763
|
When vdev_psize increases, the location of labels 2 and 3 changes
because their location is relative to the end of the device.
The configs for labels 2 and 3 are written during the next spa_sync()
because the vdev is added to the dirty config list. However, the
uberblock rings are not re-written in their new location, leaving the
device vulnerable to the beginning of the device being overwritten or
damaged.
This patch copies the uberblock ring from label 0 to labels 2 and 3,
in their new locations, at the next sync after vdev_psize increases.
Also, add a test zpool_expand_004_pos.ksh to confirm the uberblocks
are copied.
Reviewed-by: BearBabyLiu <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #5108
|
* Add zfs_nicebytes() to print human-readable sizes
Some 'zfs', 'zpool' and 'zdb' output strings can be confusing to the
user when no units are specified. This adds a new zfs_nicenum_format
"ZFS_NICENUM_BYTES" used to print bytes in their human-readable form.
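The kind of conversion involved can be sketched as below. This is an illustrative formatter only: the function name is hypothetical and the rounding and precision rules are assumptions, so it should not be read as reproducing the exact output of the ZFS tools.

```python
def nicebytes(num_bytes: int) -> str:
    """Format a byte count with a binary unit suffix (B, K, M, ...)."""
    units = ["B", "K", "M", "G", "T", "P", "E"]
    value = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    # Whole values are printed without a fractional part.
    if value == int(value):
        return f"{int(value)}{units[index]}"
    return f"{value:.2f}{units[index]}"
```

Printing the "B" suffix for small values is the point of the change: a bare "512" gives the user no hint whether bytes, sectors, or blocks are meant.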
Additionally, update some test cases to use machine-parsable 'zfs get'.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #2414
Closes #3185
Closes #3594
Closes #6032
|
When multiple filesystems are in use, memory pressure causes arc_cache
to collapse to a minimum. Allow arc_cache to maintain proportional size
even when hit rates are disproportionate. We do this only via evictable
size from the kernel shrinker, thus it's only in effect under memory
pressure.
AKAMAI: zfs: CR 3695072
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Closes #6035
|
Could return the wrong pages value
AKAMAI: zfs: CR 3695072
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
Calling arc_kmem_reap_now() when nothing is evictable causes extra
kswapd CPU usage. Also, if we didn't shrink, it's unlikely that there
is memory to reap because we likely just called it microseconds ago.
The exception is if we are in direct reclaim.
You can see how hard this is being hit in kswapd with a light test
workload:
34.95% [zfs] [k] arc_kmem_reap_now
5.40% [spl] [k] spl_kmem_cache_reap_now
3.79% [kernel] [k] _raw_spin_lock
2.86% [spl] [k] __spl_kmem_cache_generic_shrinker.isra.7
2.70% [kernel] [k] shrink_slab.part.37
1.93% [kernel] [k] isolate_lru_pages.isra.43
1.55% [kernel] [k] __wake_up_bit
1.20% [kernel] [k] super_cache_count
1.20% [kernel] [k] __radix_tree_lookup
With ZFS just mounted but only ext4/pagecache memory pressure
arc_kmem_reap_now still consumes excessive CPU:
12.69% [kernel] [k] isolate_lru_pages.isra.43
10.76% [kernel] [k] free_pcppages_bulk
7.98% [kernel] [k] drop_buffers
7.31% [kernel] [k] shrink_page_list
6.44% [zfs] [k] arc_kmem_reap_now
4.19% [kernel] [k] free_hot_cold_page
4.00% [kernel] [k] __slab_free
3.95% [kernel] [k] __isolate_lru_page
3.09% [kernel] [k] __radix_tree_lookup
Same pagecache only workload as above with this patch series:
11.58% [kernel] [k] isolate_lru_pages.isra.43
11.20% [kernel] [k] drop_buffers
9.67% [kernel] [k] free_pcppages_bulk
8.44% [kernel] [k] shrink_page_list
4.86% [kernel] [k] __isolate_lru_page
4.43% [kernel] [k] free_hot_cold_page
4.00% [kernel] [k] __slab_free
3.44% [kernel] [k] __radix_tree_lookup
(arc_kmem_reap_now has 0 samples in perf)
AKAMAI: zfs: CR 3695042
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
AKAMAI: zfs: CR 3695072
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
Lock contention, by itself, shouldn't indicate a stop condition to the
kernel's slab shrinker. Doing so can cause stalls when the kernel is
trying to free large parts of the cache such as is done by drop_caches.
Also, perhaps arc_reclaim_lock should be a spinlock, and this code
eliminated.
AKAMAI: zfs: CR 3593801
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
Move arcstat_need_free increment from all direct calls to when
arc_reclaim_lock is busy and we exit without doing anything. Data will
be reclaimed in the reclaim thread. The previous location meant that we
both reclaim the memory in this thread, and also schedule the same
amount of memory for reclaim in arc_reclaim, effectively doubling the
requested reclaim.
AKAMAI: zfs: CR 3695072
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
Ensures proper accounting of bytes we requested to free
AKAMAI: zfs: CR 3695072
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
Ghost meta/data buffers are not actually allocated
AKAMAI: zfs: CR 3695072
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Debabrata Banerjee <[email protected]>
Issue #6035
|
A loop is not needed to free the pages in a single scatterlist
entry because the entry is either a single page or a compound page.
In both cases the pages can be freed with one call to __free_pages().
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Signed-off-by: Jinshan Xiong <[email protected]>
Closes #6057
|
In the current implementation, only zio buffers of 16KB and larger are
guaranteed PAGESIZE alignment. This breaks Lustre since it assumes
that 'arc_buf_t::b_data' must be page aligned when zio buffers are
greater than or equal to PAGESIZE.
This patch makes zio buffers PAGESIZE aligned when their size is
greater than or equal to PAGESIZE.
This change may waste a small amount of memory, but that should be
fine because since ABD was introduced, zio buffers only hold
data temporarily and live in memory for a short while.
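The new rule can be expressed as a small sketch. The page size, the alignment used for sub-page buffers, and the function names are all assumptions chosen for illustration; only the "page aligned at PAGESIZE and above" rule comes from the text.

```python
PAGESIZE = 4096   # assumed page size; varies by architecture
MIN_ALIGN = 512   # hypothetical alignment for buffers smaller than a page


def buffer_alignment(size: int) -> int:
    """Alignment for a zio buffer of the given size under the new rule."""
    return PAGESIZE if size >= PAGESIZE else MIN_ALIGN


def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def is_aligned(addr: int, align: int) -> bool:
    """True if addr is a multiple of align (a power of two)."""
    return (addr & (align - 1)) == 0
```

The memory cost mentioned above comes from rounding: an allocation that would otherwise start at address 5000 has to move up to 8192 to satisfy a 4096-byte alignment.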
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jinshan Xiong <[email protected]>
Closes #6084
|
All filesystems were converted to dynamically allocated BDIs. The
destruction of backing_dev_info structures is handled as part of
super block destruction. Refactor the code to abstract away the
details of creating and destroying a BDI.
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6089
|
Authored by: Yuri Pankov <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Approved by: Albert Lee <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: bunder2015 <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7786
OpenZFS-commit: http://github.com/openzfs/openzfs/commit/db8498f
Closes #6074
|
Reinstate default 4G zfs_dirty_data_max_max limit.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6072
Closes #6081
|
Commit 37f9dac removed the zvol_taskq for processing zvol requests.
This was removed as part of switching to make_request_fn and was
motivated by a concern at the time over dispatch latency.
However, this also made all bio requests synchronous, and caused
serious performance issues as the bio submitter would wait for
every bio it submitted, effectively making the IO depth 1.
This patch reinstates zvol_taskq. To make sure overlapped I/Os
are ordered properly, we take a range lock in zvol_request and pass
it along with the bio to the I/O functions zvol_{write,discard,read}.
In order to facilitate benchmarks a zvol_request_sync module
option was added to switch between sync and async request handling.
For the moment, the default behavior is synchronous but this is
likely to change pending additional testing.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5824
|
It was documented as being related to zfs_vdev_async_max_active
when it is actually related to zfs_vdev_async_write_max_active.
Also, expand the documentation to describe the allocation throttle
which was introduced as part of OpenZFS 7090 in 3dfb57a.
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #6064
|
OpenZFS 7252 - compressed zfs send / receive
OpenZFS 7628 - create long versions of ZFS send / receive options
Authored by: Dan Kimmel <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: David Quigley <[email protected]>
Reviewed by: Thomas Caputi <[email protected]>
Approved by: Dan McDonald <[email protected]>
Reviewed-by: loli10K <[email protected]>
Ported-by: bunder2015 <[email protected]>
Ported-by: Don Brady <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
Porting Notes:
- Most of 7252 was already picked up during ABD work. This
commit represents the gap from the final commit to openzfs.
- Fixed split_large_blocks check in do_dump()
- An alternate version of the write_compressible() function was
implemented for Linux which does not depend on fio. The behavior
of fio differs significantly based on the exact version.
- mkholes was replaced with truncate for Linux.
OpenZFS-issue: https://www.illumos.org/issues/7252
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5602294
Closes #6067
|
After running for a long time with QAT compression, the variable
"inst_num" overflows when incremented by "atomic_inc_32_nv", which
causes its neighboring variable to be overwritten. Change its
definition from U16 to U32.
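The failure mode can be reproduced in a small model. The memory layout below is hypothetical (a 16-bit counter directly followed by a 16-bit neighbor, little-endian); it only illustrates what a 32-bit increment does to a 16-bit field.

```python
import struct

# Hypothetical layout: a 16-bit counter at its maximum value,
# followed by a 16-bit neighbor holding 7.
memory = bytearray(struct.pack("<HH", 0xFFFF, 7))


def inc_32(buf: bytearray, offset: int) -> None:
    """Increment the 32-bit little-endian word at offset, wrapping at 2**32."""
    (value,) = struct.unpack_from("<I", buf, offset)
    struct.pack_into("<I", buf, offset, (value + 1) & 0xFFFFFFFF)


# The 32-bit increment is applied to the address of the 16-bit counter.
inc_32(memory, 0)
counter, neighbor = struct.unpack("<HH", memory)
```

After the increment the counter reads 0 and the neighbor has changed from 7 to 8: the carry out of the low 16 bits lands in the adjacent field. Widening the counter to 32 bits keeps the whole increment inside one variable.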
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Weigang Li <[email protected]>
Closes #6051
|
Fix typo in zfs-module-parameters man page
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Mike McQuaid <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Neal Gompa <[email protected]>
Reviewed-by: Jason Zaman <[email protected]>
Reviewed-by: Ned Bass <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: Hajo Möller <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: DHE <[email protected]>
Reviewed-by: Matthew Thode <[email protected]>
Reviewed-by: Thomas Caputi <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: kernelOfTruth <[email protected]>
Reviewed-by: Kash Pande <[email protected]>
Reviewed-by: ilovezfs <[email protected]>
Reviewed-by: @jwittlincohen
Reviewed-by: Jack Draak <[email protected]>
Reviewed-by: @ptx0
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: @Kokokokoka
Reviewed-by: @JCount <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #6054
|
Authored by: Matthew Ahrens <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed by: Prashanth Sreenivasa <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
dbuf_read() creates a zio_root() to track and wait for all the zio's
that may happen as part of this call. However, if the blkptr_t for
this buffer is NULL or a hole, we will not create any more zio's, so
this zio_root() is unnecessary. This is always the case when calling
dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
the containing dnode). For workloads that read a lot of bonus buffers
(e.g. file creation and removal), creating and destroying these
unnecessary zio's can decrease performance by around 3%.
The fix is to only create/destroy the zio_root() in dbuf_read() if the
blkptr is not NULL and not a hole.
Porting Notes:
- The error handling for when dbuf_read_impl() fails which was
originally added in commit 5f6d0b6f5 has been preserved.
OpenZFS-issue: https://www.illumos.org/issues/8025
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8ec5c7c
Closes #6048
|
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #6061
|
Fixes formatting errors from commit d6418de057
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #6060
|