Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Haakan T Johansson <[email protected]>
Closes #5547
Closes #5543
|
[bio] The req_op enum was changed to req_opf. Update the "Linux 4.8 API"
autotools checks to use an int to determine whether the various REQ_OP
values are defined. This should work properly on kernels >= 4.8.
[bio] bio_set_op_attrs() is now an inline function and can't be detected
with #ifdef. Add a configure check to determine whether bio_set_op_attrs()
is defined. Move the local definition of it from vdev_disk.c to
blkdev_compat.h for consistency with other related compatibility shims.
[bio] The read/write flags and their modifiers, including WRITE_FLUSH,
WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h. Add the new
bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA
and set the flags appropriately for each supported kernel version.
[vfs] The generic_readlink() function has been made static. If .readlink
in inode_operations is NULL, generic_readlink() is used.
[zol typo] Completely unrelated to 4.10 compat, fix a typo in the check
for REQ_OP_SECURE_ERASE so that the proper macro is defined:
s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #5499
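To illustrate the kind of shim involved, here is a minimal sketch of a
bio_set_flush()-style wrapper. Guarding on REQ_PREFLUSH stands in for the
configure-generated HAVE_* macros the patch actually uses, so treat it as an
approximation rather than the real code:

    /* hedged sketch: mark a bio as a flush/FUA write across kernel versions */
    static inline void
    bio_set_flush_sketch(struct bio *bio)
    {
    #if defined(REQ_PREFLUSH)               /* Linux >= 4.8 flag names */
            bio->bi_opf |= REQ_PREFLUSH | REQ_FUA;
    #else                                   /* older kernels */
            bio->bi_rw |= WRITE_FLUSH_FUA;
    #endif
    }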
|
Fix a regression accidentally introduced by e0ab3ab.
Additionally, add a new script zpool_import_014_pos.ksh to
the ZFS test suite to exercise 'zpool import -t' functionality.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #5466
Closes #5515
|
The introduction of parallel zvol prefetch causes a deadlock when using
vdev_file.
spa_async->(spa_namespace_lock)->txg_wait_synced->(wait for txg_sync)
txg_sync->zio_wait->(wait for vdev_file_io_fsync on system_taskq)
zvol_prefetch_minors_impl (on system_taskq)->spa_open_common->(wait for spa_namespace_lock)
We fix this by using a dedicated taskq for vdev_file. This same change
was originally made in commit bc25c93 but reverted in commit aa9af22
when dynamic taskqs were added.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5506
Closes #5495
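For context, creating a dedicated SPL taskq looks roughly like the sketch
below; the name, thread count and flags are illustrative, not necessarily
what the patch uses:

    /* illustrative: a private taskq for vdev_file I/O and fsync */
    static taskq_t *vdev_file_taskq;

    void
    vdev_file_init(void)
    {
            vdev_file_taskq = taskq_create("z_vdev_file", MAX(max_ncpus, 16),
                minclsyspri, 1, INT_MAX, TASKQ_DYNAMIC);
    }

    void
    vdev_file_fini(void)
    {
            taskq_destroy(vdev_file_taskq);
    }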
|
When iterating over the input nvlist in dsl_props_set_sync_impl(), we don't
preserve the nvpair name before looking up ZPROP_VALUE, so when we later go to
process it, nvpair_name() is always "value" and not the actual property name.
This fixes a couple of bugs in zfs_ioc_recv():
* Received properties were not restored correctly when failing to receive an
incremental send stream
* Received properties were not completely replaced by the new ones when
successfully receiving an incremental send stream
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #5497
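The underlying pattern is an nvlist walk where the outer pair's name has to be
captured before descending into its ZPROP_VALUE payload. A hedged sketch with
invented local names:

    nvpair_t *pair = NULL;

    while ((pair = nvlist_next_nvpair(props, pair)) != NULL) {
            const char *propname = nvpair_name(pair); /* save before descending */
            nvpair_t *propval = pair;
            nvlist_t *attrs;

            if (nvpair_type(pair) == DATA_TYPE_NVLIST) {
                    VERIFY(nvpair_value_nvlist(pair, &attrs) == 0);
                    VERIFY(nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
                        &propval) == 0);
            }
            /* use propname here, not nvpair_name(propval) */
    }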
|\
This branch contains the following fixes/improvements.
* Fix setting i_flags
* Fix wrong operator in xvattr.h
* Fix fchange macro in zpl_ioctl_setflags()
* Added a configure check to use inode_set_flags()
* Added a test case for chattr for better test coverage
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5486
Closes #5470
Closes #5469
| |
Signed-off-by: Chunwei Chen <[email protected]>
| |
The fchange macro in zpl_ioctl_setflags was meant to detect flag changes.
However, it was incorrect and would always fail to detect a change from set to
unset, allowing users without CAP_LINUX_IMMUTABLE to unset those flags.
Signed-off-by: Chunwei Chen <[email protected]>
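The essence of the bug is that the ioctl flags (FS_*_FL) and the inode flags
(S_*) use different bit values, so a change test has to compare boolean
presence rather than raw masked values. A hypothetical sketch of the idea, not
the exact macro from the patch:

    /* did the flag toggle between the old and new flag words? */
    #define FLAG_CHANGED(oldflags, oldbit, newflags, newbit) \
            ((!!((oldflags) & (oldbit))) != (!!((newflags) & (newbit))))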
| |
Fix zfs_xvattr_set to set the S_IMMUTABLE and S_APPEND flags correctly.
Reinstate zfs_set_inode_flags and use it in zfs_xvattr_set, and also when
setting up the inode in zfs_znode_alloc and zfs_rezget.
Signed-off-by: Chunwei Chen <[email protected]>
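The reinstated helper maps ZFS pflags onto the Linux inode flag bits, along
the lines of this hedged sketch (only two flags shown):

    /* illustrative only: keep ip->i_flags in sync with zp->z_pflags */
    static void
    zfs_set_inode_flags(znode_t *zp, struct inode *ip)
    {
            if (zp->z_pflags & ZFS_IMMUTABLE)
                    ip->i_flags |= S_IMMUTABLE;
            else
                    ip->i_flags &= ~S_IMMUTABLE;

            if (zp->z_pflags & ZFS_APPENDONLY)
                    ip->i_flags |= S_APPEND;
            else
                    ip->i_flags &= ~S_APPEND;
    }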
| |
zfs_sb_create normally takes ownership of zmo, and it will be freed in
zfs_sb_free. However, when zfs_sb_create fails we need to explicitly free it.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5490
Closes #5496
|/
Adapt the avx512bw implementation for use with abd buffers. The mul2
implementation is rewritten to take advantage of the BW instruction set.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Romain Dolbeau <[email protected]>
Signed-off-by: Gvozden Neskovic <[email protected]>
Closes #5477
|
Users of an ida need to call ida_destroy() after using it. Otherwise
ida->free_bitmap and/or other resources may leak.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5484
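For reference, the expected kernel ida lifecycle looks roughly like this
sketch (the ida name is hypothetical and error handling is trimmed):

    #include <linux/idr.h>
    #include <linux/gfp.h>

    static DEFINE_IDA(zvol_ida);            /* hypothetical name */

    static int
    example_alloc_and_free_minor(void)
    {
            /* first free id >= 1; a limit of 0 means "no upper bound" */
            int id = ida_simple_get(&zvol_ida, 1, 0, GFP_KERNEL);

            if (id < 0)
                    return (id);
            ida_simple_remove(&zvol_ida, id);
            return (0);
    }

    static void
    example_teardown(void)
    {
            /* releases ida->free_bitmap and any cached layers */
            ida_destroy(&zvol_ida);
    }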
|
This removes two large runs of whitespace from the "modinfo zfs" output, as
well as correcting a couple of typos.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #5475
|
Enable picky cstyle checks and resolve the new warnings. The vast
majority of the changes needed were to handle minor issues with
whitespace formatting. This patch contains no functional changes.
Non-whitespace changes are as follows:
* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Håkan Johansson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5465
|
Don't count '@' toward the dataset namelen if the dataset is not a snapshot.
This fixes a pool becoming unimportable when the dataset name length
is 255.
Add a test file for zfs create with a name length of 255.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5432
Closes #5456
|
CID 154617: Memory - illegal accesses (UNINIT)
The value here just needs to be initialized to make Coverity happy.
When dsize == 0, the value of daiter.iter_mapaddr is irrelevant. That
address won't be accessed; it's only used for some arithmetic. dsize
can be zero either if dabd is null, or if the code column is longer than
the current data column.
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5437
|\
Speed up import and export by:
* Add a system delay taskq
* Parallel prefetch of zvol dnodes during zvol_create_minors
* Parallel zvol_free during zvol_remove_minors
* Reduce linear list searches using an ida and a hash table
Reviewed-by: Boris Protopopov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5433
| |
On some kernel versions, blk_cleanup_queue and put_disk will wait for more than
10ms each, so a pool with a lot of zvols can easily wait for more than 1 minute
if we do zvol_free sequentially.
Signed-off-by: Chunwei Chen <[email protected]>
Requires-spl: refs/pull/588/head
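The parallel teardown boils down to dispatching each free to a taskq and then
waiting for the batch, roughly as in this hedged sketch (the helper and the
free_list argument are placeholders):

    /* illustrative: free all zvols concurrently instead of one by one */
    static void
    zvol_free_all_parallel(list_t *free_list)
    {
            zvol_state_t *zv;

            while ((zv = list_remove_head(free_list)) != NULL)
                    VERIFY(taskq_dispatch(system_taskq,
                        (task_func_t *)zvol_free, zv, TQ_SLEEP) != 0);

            taskq_wait(system_taskq);       /* all dispatched frees done */
    }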
| |
Prefetch all zvol dnodes in parallel before actually creating each individual
minor. This greatly reduces the import time when there are a lot of zvols and
the disk is slow.
Signed-off-by: Chunwei Chen <[email protected]>
| |
Use the kernel ida to generate minor numbers, and use a hash table to find a
zvol by name.
Signed-off-by: Chunwei Chen <[email protected]>
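The by-name lookup side can be pictured with the kernel's hashtable.h helpers;
this is a hedged sketch with invented structure and field names, not the data
structure the patch adds:

    #include <linux/hashtable.h>
    #include <linux/jhash.h>
    #include <linux/string.h>

    #define ZVOL_HT_BITS    10
    static DEFINE_HASHTABLE(zvol_name_ht, ZVOL_HT_BITS);

    struct zv_node {                        /* invented wrapper */
            char name[256];
            struct hlist_node hnode;
    };

    static void
    zv_insert(struct zv_node *zv)
    {
            hash_add(zvol_name_ht, &zv->hnode,
                jhash(zv->name, strlen(zv->name), 0));
    }

    static struct zv_node *
    zv_find(const char *name)
    {
            struct zv_node *zv;
            u32 key = jhash(name, strlen(name), 0);

            hash_for_each_possible(zvol_name_ht, zv, hnode, key)
                    if (strcmp(zv->name, name) == 0)
                            return (zv);
            return (NULL);
    }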
| |
Use it for spa_deadman, zpl_posix_acl_free, snapentry_expire.
This frees system_taskq from the above long-delay tasks, and allows us to do
taskq_wait_outstanding on system_taskq without being blocked forever, making
system_taskq more generic and useful.
Signed-off-by: Chunwei Chen <[email protected]>
| |
Enable zio_dva_throttle_enabled=1 by default. Subsequent
testing has been unable to reproduce the suspected regression.
Tested-by: kernelOfTruth <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Reverts #5335
Closes #5289
Closes #5457
| |
Save and reuse ddt dspace calculation when there have been no ddt changes.
This avoids unnecessary traversal of 168KiB of ddt histograms.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Gvozden Neskovic <[email protected]>
Closes #5425
| |
It was observed that even when the txg history is disabled by
setting `zfs_txg_history=0` the txg_sync thread still fetches
the vdev stats unnecessarily.
This patch refactors the code such that vdev_get_stats() is no
longer called when `zfs_txg_history=0`. And it further reduces
the differences between upstream and the ZoL txg_sync_thread()
function.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5412
| |
dbuf_read() creates a zio_root() to track and wait for all the
zio's that may happen as part of this call. However, if the blkptr_t
for this buffer is NULL or a hole, we will not create any more zio's,
so this zio_root() is unnecessary. This is always the case when calling
dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
the containing dnode). For workloads that read a lot of bonus buffers
(e.g. file creation and removal), creating and destroying these
unnecessary zio's can decrease performance by around 3%.
The fix is to only create/destroy the zio_root() in dbuf_read() if
the blkptr is not NULL and not a hole.
Changes sponsored by Intel Corp.
Authored by: Matthew Ahrens <[email protected]>
Reviewed-by: Alex Zhuravlev <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
Issue openzfs/openzfs#137
Closes #4803
Closes #5382
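The shape of the fix is simply making the root zio conditional; a hedged
sketch of the idea rather than the actual diff:

    /* only pay for a root zio when a child read can actually be issued */
    boolean_t need_zio = (db->db_blkptr != NULL &&
        !BP_IS_HOLE(db->db_blkptr));
    zio_t *zio = NULL;
    int err = 0;

    if (need_zio)
            zio = zio_root(spa, NULL, NULL, ZIO_FLAG_CANFAIL);

    /* ... issue the read with 'zio' as the parent when it exists ... */

    if (need_zio)
            err = zio_wait(zio);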
| |
This should be & and not | so is_metadata is set correctly.
Reviewed-by: Dan Kimmel <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5438
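The distinction matters because OR-ing with a non-zero mask is always
non-zero; a tiny generic illustration with hypothetical names:

    /* wrong: (type | FLAG_METADATA) is non-zero for every type */
    boolean_t is_metadata_bad = ((type | FLAG_METADATA) != 0);

    /* right: test whether the metadata bit is actually set */
    boolean_t is_metadata     = ((type & FLAG_METADATA) != 0);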
| |
It looks like this was functionality which was added in the
original SA implementation and then never needed. It can
be safely removed now and easily added back if we find a
use for it.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5440
| |
zio.h includes zio_impl.h, but zio_impl.h also includes zio.h, so the
header files include each other. Get rid of the zio_impl.h include
in zio.h and update zio_inject.c to include zio.h instead of zio_impl.h.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5439
| |
In multiple cases zio_buf_alloc() was used instead of kmem_alloc()
or vmem_alloc(). This was often done because the allocations
could be large and it was easy to use zio_buf_alloc() for them.
But this isn't ideal for allocations which are small or short
lived. In these cases it is better to use kmem_alloc() or
vmem_alloc(). If possible we want to avoid the case where
we have slabs allocated for kmem caches which are rarely used.
Note for small allocations vmem_alloc() will be internally
converted to kmem_alloc(). Therefore as long as large
allocations are infrequent and short lived the penalty for
using vmem_alloc() is small.
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5409
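In SPL terms the rule of thumb above looks like the fragment below; the
specific structures are only examples:

    my_node_t *node;        /* hypothetical small structure */
    zfs_cmd_t *zc;

    /* small, short-lived allocation: slab-backed kmem_alloc() */
    node = kmem_alloc(sizeof (*node), KM_SLEEP);
    /* ... */
    kmem_free(node, sizeof (*node));

    /*
     * Large (for example zfs_cmd_t) or variable-sized, infrequent
     * allocation: vmem_alloc(); small requests are internally
     * redirected to kmem_alloc() anyway.
     */
    zc = vmem_alloc(sizeof (zfs_cmd_t), KM_SLEEP);
    /* ... */
    vmem_free(zc, sizeof (zfs_cmd_t));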
| |
* Convert ABD to use the Linux Kernel scatterlist implementation
instead of the hand rolled one from illumos.
* Scatter ABDs are preferentially populated with higher order
compound pages from a single zone. Allocation size is
progressively decreased until it can be satisfied without
performing reclaim or compaction.
* An alternate page allocator is provided for kernels older
than 3.6 and for CONFIG_HIGHMEM systems. This allocator
is designed as a fallback for maximum compatibility.
* Extended abdstats to provide visibility into the allocator.
* Add cached value for PAGESIZE in userspace.
Contributions-by:
Chunwei Chen <[email protected]>
Gvozden Neskovic <[email protected]>
Jinshan Xiong <[email protected]>
Isaac Huang <[email protected]>
David Quigley <[email protected]>
Brian Behlendorf <[email protected]>
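The "progressively decreased allocation size" behavior corresponds to a
page-grabbing loop along these lines; the flag choices and starting order here
are assumptions, not the exact allocator:

    #include <linux/gfp.h>

    /* try large compound pages first, fall back to smaller orders */
    static struct page *
    abd_alloc_chunk_sketch(unsigned int max_order)
    {
            gfp_t gfp = GFP_NOIO | __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
            unsigned int order;
            struct page *page;

            for (order = max_order; order > 0; order--) {
                    page = alloc_pages(gfp, order);
                    if (page != NULL)
                            return (page);
            }
            /* the order-0 fallback is allowed to reclaim if it must */
            return (alloc_pages(GFP_NOIO, 0));
    }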
| |
Convert usage of kmap to kmap_atomic while correctly saving off
irq state.
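The converted pattern looks roughly like this fragment; the surrounding buffer
variables are placeholders:

    unsigned long flags;
    void *vaddr;

    local_irq_save(flags);
    vaddr = kmap_atomic(page);
    memcpy(dst, vaddr + offset, size);
    kunmap_atomic(vaddr);
    local_irq_restore(flags);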
| |
Port NEON implementation of RAID-Z functions to ABD.
Signed-off-by: Romain Dolbeau <[email protected]>
| |
Implement shift-based multiplication for avx512f. Higher IPC than lookup-based
methods yields up to 40% better performance on the current hardware.
Results on Xeon Phi(TM) CPU 7210:
implementation gen_p gen_pq gen_pqr rec_p rec_q rec_r rec_pq rec_pr rec_qr rec_pqr
original 142232671 24411492 12948205 283053705 22348167 4215911 9171609 2265548 2378370 1648495
scalar 295711162 49851491 33253815 293198109 88179448 61866752 27941684 25764416 17384442 12138153
sse2 410055998 199642658 117973654 406240463 152688682 121092250 84968180 79291076 47473657 20779719
ssse3 411641595 199669571 117937647 406211024 137638508 117050346 81263322 76120405 46281559 32696722
avx2 616485806 311515332 188595628 605455115 260602390 230554476 148198817 138800254 92273356 62937819
avx512f 832191523 408509425 253599522 810094481 404325734 317590971 218235687 197204920 133101937 94001219
fastest avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f
Signed-off-by: Gvozden Neskovic <[email protected]>
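At the heart of the shift-based approach is the GF(2^8) multiply-by-two, which
in scalar form is a shift plus a conditional reduction by the RAID-Z generator
polynomial (0x11d). A scalar model of the operation the AVX-512F code
vectorizes:

    #include <stdint.h>

    /* x * 2 in GF(2^8) modulo 0x11d */
    static inline uint8_t
    gf_mul2(uint8_t a)
    {
            return ((uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0x00)));
    }

    /* larger constants are built from mul2 and xor, e.g. x * 4 */
    static inline uint8_t
    gf_mul4(uint8_t a)
    {
            return (gf_mul2(gf_mul2(a)));
    }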
| |
Enable vectorized raidz code on ABD buffers. The avx512f,
avx512bw, neon and aarch64_neonx2 implementations are disabled in this commit.
With the exception of avx512bw these implementations are
updated for ABD in the subsequent commits.
Signed-off-by: Gvozden Neskovic <[email protected]>
| |
* userspace: aligned buffers. A minimum of 32B alignment is
needed for AVX2. Kernel buffers are aligned to 512B or more.
* add abd_get_offset_size() interface
* abd_iter_map(): fix calculation of iter_mapsize
* add abd_raidz_gen_iterate() and abd_raidz_rec_iterate()
Signed-off-by: Gvozden Neskovic <[email protected]>
| |
Signed-off-by: Isaac Huang <[email protected]>
|/
Linux kernel commit 723c038475b78 removed this field.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #5393
|
CID 147503: Dereference after null check (FORWARD_NULL)
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5326
|
CID 147540: unsigned_compare
- Cast nsec to an int32_t to properly detect the expected overflow.
CID 147542: unsigned_compare
- intval can never be less than ZIO_FAILURE_MODE_WAIT which is
defined to be zero. Remove this useless check.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5379
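Both findings are instances of the same C pitfall: comparisons that are
vacuously true or false once unsigned types are involved. A generic
illustration:

    unsigned int intval = ZIO_FAILURE_MODE_WAIT;    /* defined to be 0 */

    /* CID 147542: always false, an unsigned value can't be negative */
    if (intval < ZIO_FAILURE_MODE_WAIT)
            intval = ZIO_FAILURE_MODE_WAIT;

    /* CID 147540: go through a signed type so the overflow test works */
    uint64_t nsec = 3000000000ULL;                  /* example value */
    if ((int32_t)nsec < 0)
            nsec = 0;                               /* wrapped: handle it */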
|
It's used by Lustre to determine if the objset can be upgraded.
The inline version doesn't work because dmu_objset_is_snapshot()
is not exported.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jinshan Xiong <[email protected]>
Closes #5385
|
Linux 3.14 introduced inode->set_acl(). Normally, ACL modification comes
from setxattr, which is handled by the acl xattr_handler, and we already
handle that well. However, nfsd calls inode->set_acl directly, or returns an
error if it doesn't exist.
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Massimo Maggi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5371
Closes #5375
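Supporting this amounts to exposing a callback with the 3.14-era ->set_acl()
signature from the inode_operations table, along the lines of this hedged
sketch; the configure guard and helper names are assumptions:

    #if defined(HAVE_SET_ACL)       /* assumed configure-generated guard */
    static int
    zpl_set_acl_sketch(struct inode *ip, struct posix_acl *acl, int type)
    {
            /* reuse the path the acl xattr_handler already exercises */
            return (zpl_xattr_acl_set(ip, acl, type));  /* hypothetical */
    }
    #endif

    const struct inode_operations zpl_inode_operations = {
            /* ... other operations elided ... */
    #if defined(HAVE_SET_ACL)
            .set_acl        = zpl_set_acl_sketch,
    #endif
    };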
|
CID 147626: Type:Dereference before null check
CID 147628: Type:Dereference before null check
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5304
|
The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure
to support a crude FMD environment to host these modules.
The Diagnosis Engine consumes I/O and checksum ereports and
feeds them into a SERD engine which will generate a corresponding
fault diagnosis when the SERD engine fires. All the
diagnosis state data is collected into cases, one case per
vdev being tracked.
The Retire Agent responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of
the new VDEV state (degraded or faulted). This agent is
also responsible for managing hot spares across pools.
When it encounters a device fault or a device removal it
replaces the device with an appropriate spare if available.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #5343
|
CID 147575, Type:Unintentional integer overflow
CID 147577, Type:Unintentional integer overflow
CID 147578, Type:Unintentional integer overflow
CID 147579, Type:Unintentional integer overflow
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5365
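These all follow the same pattern: 32-bit operands combined before being
widened to 64 bits. A generic illustration of the class of fix:

    uint32_t nblocks = 1U << 20;
    uint32_t blksize = 1U << 17;

    /* wrong: the multiply happens in 32 bits and wraps before widening */
    uint64_t bytes_bad  = nblocks * blksize;            /* 0 */

    /* right: widen one operand so the multiply is done in 64 bits */
    uint64_t bytes_good = (uint64_t)nblocks * blksize;  /* 1 << 37 */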
|
Currently every call to zpl_posix_acl_release schedules a delayed task,
and each delayed task adds a timer. This used to be fine, apart from a
possible performance impact.
However, Linux 4.8 introduced a new timer wheel implementation[1]. In this
new implementation, the larger the delay, the less accurate the timer is.
So when we have a flood of timers from zpl_posix_acl_release, they will all
expire at the same time. Coupled with the fact that task_expire does a linear
search with the lock held, this causes an extreme amount of contention inside
interrupt context and can actually lock up the system.
We fix this by batching the frees to prevent a flood of delayed tasks. Every
call to zpl_posix_acl_release puts the posix_acl to be freed on a lockless
list. Every batch window (1 sec), zpl_posix_acl_free fires and frees every
posix_acl on the list that has passed the grace period. This way, we only
have one delayed task every second.
[1] https://lwn.net/Articles/646950/
Signed-off-by: Chunwei Chen <[email protected]>
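The batching idea can be modeled with the kernel's lockless llist plus a
single periodic worker; this is an illustrative sketch with invented names,
not the exact mechanism the patch uses:

    #include <linux/llist.h>
    #include <linux/workqueue.h>
    #include <linux/posix_acl.h>
    #include <linux/slab.h>

    struct acl_free_node {                  /* invented wrapper */
            struct posix_acl *acl;
            struct llist_node lnode;
    };

    static LLIST_HEAD(acl_free_list);
    static void acl_batch_free(struct work_struct *w);
    static DECLARE_DELAYED_WORK(acl_free_work, acl_batch_free);

    /* producer: defer the free instead of arming one timer per ACL */
    static void
    acl_release_deferred(struct acl_free_node *n)
    {
            llist_add(&n->lnode, &acl_free_list);
            schedule_delayed_work(&acl_free_work, HZ);  /* 1 sec window */
    }

    /* consumer: one delayed task per window drains the whole list */
    static void
    acl_batch_free(struct work_struct *w)
    {
            struct acl_free_node *n, *tmp;
            struct llist_node *batch = llist_del_all(&acl_free_list);

            llist_for_each_entry_safe(n, tmp, batch, lnode) {
                    posix_acl_release(n->acl);
                    kfree(n);
            }
    }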
|
Only restrict the maximum zio allocation size in 32-bit kernel space.
The same virtual address space limitations don't apply to user
space. This resolves a memory allocation failure in raidz_test,
which expects to be able to exercise all valid zio sizes.
Signed-off-by: Brian Behlendorf <[email protected]>
|
This is the Fletcher4 algorithm implemented in pure C, but using
multiple counters, with algorithms identical to those used for the
SSE/NEON and AVX2 implementations.
This allows for faster execution on cores with strong superscalar
capabilities but weak SIMD capabilities.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #5317
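For orientation, a single-stream Fletcher-4 inner loop is shown below; the
superscalar variant runs several independent accumulator sets like this over
interleaved 32-bit words and recombines them at the end, exactly as the SIMD
versions do. A simplified sketch, not the new implementation:

    #include <stdint.h>
    #include <stddef.h>

    static void
    fletcher_4_one_stream(const uint32_t *buf, size_t nwords,
        uint64_t sums[4])
    {
            uint64_t a = 0, b = 0, c = 0, d = 0;
            size_t i;

            for (i = 0; i < nwords; i++) {
                    a += buf[i];
                    b += a;
                    c += b;
                    d += c;
            }
            sums[0] = a; sums[1] = b; sums[2] = c; sums[3] = d;
    }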
|
Linux 3.11 added O_TMPFILE to open(2), which allows creating an unlinked file
on supported filesystems. It's basically doing open(2) and unlink(2) atomically.
The filesystem support is added through i_op->tmpfile. We basically copy the
create operation, except we get rid of the link and name related stuff and add
the new node to the unlinked set.
We also add support for linkat(2) to link a tmpfile. However, since all previous
file operations will have skipped the ZIL, we force a txg_wait_synced to make
sure we are sync safe.
Signed-off-by: Chunwei Chen <[email protected]>
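From the application side the new behavior is exercised like this
(illustrative userspace code; the mount point and file name are examples):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
            char path[64];
            /* an unnamed, already-unlinked file inside the directory */
            int fd = open("/tank/fs", O_TMPFILE | O_RDWR, 0600);

            if (fd < 0) {
                    perror("open(O_TMPFILE)");
                    return (1);
            }
            if (write(fd, "data\n", 5) != 5)
                    perror("write");

            /* optionally give it a name later via linkat(2) */
            snprintf(path, sizeof (path), "/proc/self/fd/%d", fd);
            if (linkat(AT_FDCWD, path, AT_FDCWD, "/tank/fs/now-visible",
                AT_SYMLINK_FOLLOW) != 0)
                    perror("linkat");

            close(fd);
            return (0);
    }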
|
Currently, doing things like fsetxattr(2) on an unlinked file will result in
ENODATA. There are two places that cause this: zfs_dirent_lock and zfs_zget.
The fix in zfs_dirent_lock is pretty straightforward. In zfs_zget though, we
need it to not return an error when the zp is unlinked. This is a pretty big
change in behavior, but skimming through all the callers, I don't think this
change would cause any problem. Also, there's nothing preventing z_unlinked
from being set after the z_lock mutex is dropped but before zfs_zget
returns anyway.
The rest of the change makes sure we don't log xattr operations when the owner
is unlinked.
Signed-off-by: Chunwei Chen <[email protected]>
|