| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Convert usage of kmap to kmap_atomic while correctly saving off
irq state.
|
|
|
|
|
|
| |
Port NEON implementation of RAID-Z functions to ABD.
Signed-off-by: Roomain Dolbeau <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement shift based multiplication for 512f. Higher IPC over lookup based
methods yields up to 40% better performance on the current hardware.
Results on Xeon Phi(TM) CPU 7210:
implementation gen_p gen_pq gen_pqr rec_p rec_q rec_r rec_pq rec_pr rec_qr rec_pqr
original 142232671 24411492 12948205 283053705 22348167 4215911 9171609 2265548 2378370 1648495
scalar 295711162 49851491 33253815 293198109 88179448 61866752 27941684 25764416 17384442 12138153
sse2 410055998 199642658 117973654 406240463 152688682 121092250 84968180 79291076 47473657 20779719
ssse3 411641595 199669571 117937647 406211024 137638508 117050346 81263322 76120405 46281559 32696722
avx2 616485806 311515332 188595628 605455115 260602390 230554476 148198817 138800254 92273356 62937819
avx512f 832191523 408509425 253599522 810094481 404325734 317590971 218235687 197204920 133101937 94001219
fastest avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f
Signed-off-by: Gvozden Neskovic <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Enable vectorized raidz code on ABD buffers. The avx512f,
avx512bw, neon and aarch64_neonx2 are disabled in this commit.
With the exception of avx512bw these implementations are
updated for ABD in the subsequent commits.
Signed-off-by: Gvozden Neskovic <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
* userspace: aligned buffers. Minimum of 32B alignment is
needed for AVX2. Kernel buffers are aligned 512B or more.
* add abd_get_offset_size() interface
* abd_iter_map(): fix calculation of iter_mapsize
* add abd_raidz_gen_iterate() and abd_raidz_rec_iterate()
Signed-off-by: Gvozden Neskovic <[email protected]>
|
|
|
|
| |
Signed-off-by: Isaac Huang <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
Otherwise, the checksum function pointer isn't initialized.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #5411
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a command (-c) option to zpool status and zpool iostat. The
-c option allows you to run an arbitrary command on each vdev and display
the first line of output in zpool status/iostat. The environment vars
VDEV_PATH and VDEV_UPATH are set to the vdev's path and "underlying path"
before running the command. For device mapper, multipath, or partitioned
vdevs, VDEV_UPATH is the actual underlying /dev/sd* disk. This can be useful
if the command you're running requires a /dev/sd* device.
The patch also uses /sys/block/<dev>/slaves/ to lookup the underlying device
instead of using libdevmapper. This not only removes the libdevmapper
requirement at build time, but also allows you to resolve device mapper
devices without being root. This means that UDEV_UPATH get set correctly
when running zpool status/iostat as an unprivileged user.
Example:
$ zpool status -c 'echo I am $VDEV_PATH, $VDEV_UPATH'
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
mpatha ONLINE 0 0 0 I am /dev/mapper/mpatha, /dev/sdc
sdb ONLINE 0 0 0 I am /dev/sdb1, /dev/sdb
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #5368
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow `zfs unshare <protocol> -a` command to share or unshare all datasets
of a given protocol, nfs or smb.
Additionally, enable most of ZFS Test Suite zfs_share/zfs_unshare test cases.
To work around some Illumos-specific functionalities ($SHARE/$UNSHARE) some
function wrappers were added around them.
Finally, fix and issue in smb_is_share_active() that would leave SMB shares
exported when invoking 'zfs unshare -a'
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #3238
Closes #5367
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each test in the performance regression test suite
creates a pool and a dataset for use. Unfortunately,
these tests do not cleanup the pool and dataset
correctly once they complete. Each test now kills
fio and iostat, destroys the dataset, and finally
destroys the pool. Each test also now traps the
SIGTERM signal to handle cases where test-runner
kills a test.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Giuseppe Di Natale <[email protected]>
Requires-builders: all
Closes #5407
|
|
|
|
|
|
|
| |
The user_property_002_pos passes as expected.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: ChaoyuZhang <[email protected]>
Closes #5406
|
|
|
|
|
|
|
| |
Linux kernel commit 723c038475b78 removed this field.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #5393
|
|
|
|
|
|
|
|
| |
Bold and Normal codes were mixed up in a few places resulting in
bad highlighting.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #5397
|
|
|
|
|
|
|
|
| |
Repair indent of zpool.8 man page, just before zpool labelclear
details. Accidentally introduced by 193a37cb2 (git bisect).
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Haakan T Johansson <[email protected]>
Closes #5394
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before adding the entry to the configuration verify that the
device can be opened exclusively. This ensures that as long
as multipathd is running the underlying multipath devices, which
otherwise appear identical to their /dev/mapper counterpart,
are pruned from the configuration.
Failure to do so can result in a result in the vdev appearing
as UNAVAIL when the vdev path provided to the kernel can't be
opened exclusively.
This check would normally be performed in zpool_open_func()
but placing it there would result in false positives because
it is called concurrently for many devices.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5387
|
|
|
|
|
|
|
|
|
|
| |
Now that ZED has internal fault diagnosis and the statechange event
is generated for faulted states, we can replace the io-notify and
checksum-notify zedlets with one based on statechange.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #5383
|
|
|
|
|
|
|
| |
CID 147503: Dereference after null check (FORWARD_NULL)
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5326
|
|
|
|
|
|
|
|
|
|
|
| |
CID 147540: unsigned_compare
- Cast nsec to a int32_t to properly detect the expected overflow.
CID 147542: unsigned_compare
- intval can never be less than ZIO_FAILURE_MODE_WAIT which is
defined to be zero. Remove this useless check.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5379
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass `ACL_TYPE_ACCESS` for type parameter of `set_cached_acl()` and
`forget_cached_acl()` to avoid removal of dead code after BUG() in
compile time. Tested on 3.2.0 kernel.
Introduced in 3779913
Reviewed-by: Massimo Maggi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Gvozden Neskovic <[email protected]>
Closes #5378
|
|
|
|
|
|
|
|
|
| |
It's used by Lustre to determine if the objset can be upgraded.
The inline version doesn't work because dmu_objset_is_snapshot()
is not exported.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jinshan Xiong <[email protected]>
Closes #5385
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come
from setxattr, which will handle by the acl xattr_handler, and we already
handles that well. However, nfsd will directly calls inode->set_acl or
return error if it doesn't exists.
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Massimo Maggi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5371
Closes #5375
|
|
|
|
|
|
|
|
|
| |
These were named in the zed/Makefile.am as vdev_clear-blinkled.sh
and statechange-blinkled.sh causing bad symlinks to be created.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #5384
|
|
|
|
|
|
|
|
| |
CID 147586: function:allow_usage Type:out-of-bounds read
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5364
|
|
|
|
|
|
|
| |
CID 147629: Type:Dereference before null check
Reviewed-by: Brian Behlendorf <[email protected]
Signed-off-by: <cao.xuewen [email protected]>
Closes #5376
|
|
|
|
|
|
|
|
| |
CID 154021: Null pointer dereference
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5380
|
|
|
|
|
|
|
|
|
| |
CID 147626: Type:Dereference before null check
CID 147628: Type:Dereference before null check
Reviewed-by: Brian Behlendorf <[email protected]
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5304
|
|
|
|
|
|
|
|
| |
The ztest, filebench, xfstests, and zfsstress test suites should
be skipped when testing on 32-bit platforms until they pass
reliably.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5381
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure
to support a crude FMD environment to host these modules.
The Diagnosis Engine consumes I/O and checksum ereports and
feeds them into a SERD engine which will generate a corres-
ponding fault diagnosis when the SERD engine fires. All the
diagnosis state data is collected into cases, one case per
vdev being tracked.
The Retire Agent responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of
the new VDEV state (degraded or faulted). This agent is
also responsible for managing hot spares across pools.
When it encounters a device fault or a device removal it
replaces the device with an appropriate spare if available.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #5343
|
|
|
|
|
|
|
|
|
|
| |
CID 147575, Type:Unintentional integer overflow
CID 147577, Type:Unintentional integer overflow
CID 147578, Type:Unintentional integer overflow
CID 147579, Type:Unintentional integer overflow
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5365
|
|
|
|
|
|
|
|
|
| |
Originally, these two function are inline, so their usability is tied to
posix_acl_release. However, since Linux 3.14, they became EXPORT_SYMBOL, so we
can always use them. In this patch, we create an independent test for these
two functions so we can use them when possible.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently every calls to zpl_posix_acl_release will schedule a delayed task,
and each delayed task will add a timer. This used to be fine except for
possibly bad performance impact.
However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In
this new implementation, the larger the delay, the less accuracy the timer is.
So when we have a flood of timer from zpl_posix_acl_release, they will expire
at the same time. Couple with the fact that task_expire will do linear search
with lock held. This causes an extreme amount of contention inside interrupt
and would actually lockup the system.
We fix this by doing batch free to prevent a flood of delayed task. Every call
to zpl_posix_acl_release will put the posix_acl to be freed on a lockless
list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free
every posix_acl that passed the grace period on the list. This way, we only
have one delayed task every second.
[1] https://lwn.net/Articles/646950/
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch addresses multiple 'zpool import' block device
indentification problems which are most likely to occur on a
system configured to use blkid, by_vdev paths, multipath and
failover. The symptom most commonly observed is the import
uses different path names to import the pool than would
normally be expected.
* When using blkid to identify vdevs the listed devices may
be added to the cache in any order. In order to apply the
preferred search order heuristic a zfs_path_order() function
was added to calculate the order given full path names.
* Since it's possible to have multiple block devices with
different vdev guids which refer to the same ZPOOL_CONFIG_PATH
the slice cache must be indexed by guid and name. By avoiding
collisions the preferred ordering can be maintaining even
when multiple block devices claim the same ZPOOL_CONFIG_PATH.
The preferred sorting by partition was never benefitial for
a Linux system and was removed as part of this change.
* When adding entries to the blkid cache avl_find/avl_insert
are used instead of avl_add because collisions are possible
and must be handled gracefully.
* For pools using multipath devices there are, at a minimum,
three devices where a vdev label may be read. They are the
dm-* device and each underlying /dev/sd* device. Due to the
way the block cache is implemented each of these devices may
have a different cached copy of the vdev label. This can
result in "ghost pools" which appear to persist even after
a 'zpool labelclear' has been done to the dm-* device. In
order to prevent this the vdev label is read with O_DIRECT
in order to bypass any caching to get the on-disk version.
* When opening a block device verify that vdev guid read from
the disk matches the expected vdev guid. This allows for bad
labels to be filtered out.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5359
|
|
|
|
|
|
|
|
|
| |
Only restrict the maximum zio alloc size to 32-bit kernel space.
The same virtual address space limitations don't apply to user
space. This resolves a memory allocation failure in raidz_test
where it expects to be able to exercises all valid zio sizes.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The isainfo(1) utility was used by the ZFS Test Suite to determine
when running on a 32-bit platform. This non-portable check has been
replaced with an is_32bit helper function which uses getconf(1).
The getconf(1) utility is available for Linux, FreeBSD, and Illumos.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous autoreplace code assumed that if you were using autoreplace, then
you also had the enclosure SES driver loaded. This could lead to autoreplace
not working if the SES driver wasn't loaded, or if it wasn't creating the
proper enclosure_device symlinks (which has happened). This patch removes
that assumption.
Reviewed by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #5363
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the Fletcher4 algorithm implemented in pure C, but using
multiple counters using algorithms identical to those used for
SSE/NEON and AVX2.
This allows for faster execution on core with strong superscalar
capabilities but weak SIMD capabilities.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #5317
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on
supported filesystem. It's basically doing open(2) and unlink(2) atomically.
The filesystem support is added through i_op->tmpfile. We basically copy the
create operation except we get rid of the link and name related stuff and add
the new node to unlinked set.
We also add support for linkat(2) to link tmpfile. However, since all previous
file operation will skip ZIL, we force a txg_wait_synced to make sure we are
sync safe.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, doing things like fsetxattr(2) on an unlinked file will result in
ENODATA. There's two places that cause this: zfs_dirent_lock and zfs_zget.
The fix in zfs_dirent_lock is pretty straightforward. In zfs_zget though, we
need it to not return error when the zp is unlinked. This is a pretty big
change in behavior, but skimming through all the callers, I don't think this
change would cause any problem. Also there's nothing preventing z_unlinked
from being set after the z_lock mutex is dropped before but before zfs_zget
returns anyway.
The rest of the stuff is to make sure we don't log xattr stuff when owner is
unlinked.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
avx512f should work on all AVX512 hardware, since it only uses
Foundation instructions.
avx512bw should be faster on hardware supporting the AVW512BW
extension. We can use full-width pshufb (instead of relying on the 256
bits AVX2 pshufb). As a side-effect, the code is also unrolled more.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #5219
|
|
|
|
|
|
|
|
|
|
|
| |
On error dsl_prop_get_all_ds() does not free the nvlist it allocates.
This behavior may have been intentional when originally written
but is atypical and often confusing. Since no callers rely on this
behavior the function has been updated to always free the nvlist
on error.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: BearBabyLiu <[email protected]>
Closes #5320
|
|
|
|
|
|
|
|
|
|
|
|
| |
The async_destroy_001_pos test case currently hangs when testing on
a 32-bit system. Conditionally skip this test case on 32-bit
systems until the root cause is identified and resolved.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5352
Issue #5347
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On 32-bit Linux systems use vmem_size() to correctly size the ARC
and better determine when IO should be throttle due to low memory.
On 64-bit systems this change has no effect since the virtual
address space available far exceeds the physical memory available.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
|
|
|
|
|
|
|
|
|
|
|
| |
A limit of 1TB exists for zvols on 32-bit systems. Update the code
to correctly reflect this limitation in a similar manor as the
OpenZFS implementation.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally the .zfs/snapshot directory was disabled for 32-bit systems
because 64-bit inode numbers were not supported. This is no longer
the case and this functionality can be enabled by default.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
Closes #2002
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the TASKQID_INVALID macros and update callers to use the macro
instead of testing against 0. There is no functional change
even though the functions in zfs_ctldir.c incorrectly used -1
instead of 0.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch ensures that all systemd services are processed through the
systemd scriptlets, so that services are properly configured per the
preset file installed by the package.
Without this, zfs.target is set, but none of the services are enabled per
the preset file, meaning automounting filesystems and such won't work
out of the box.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Neal Gompa <[email protected]>
Closes #5356
|
|
|
|
|
|
|
|
| |
Replace magic value 16 with ARRAY_SIZE() to correctly handle
when the sa_legacy_attrs array size changes.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5354
|
|
|
|
|
|
|
| |
CID 147553: Type:Dereference null return value
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5305
|
|
|
|
|
|
|
| |
CID 147548: Type:Dereference null return value
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5321
|