| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Linux kernel commit 723c038475b78 removed this field.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #5393
|
|
|
|
|
|
|
| |
CID 147503: Dereference after null check (FORWARD_NULL)
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5326
|
|
|
|
|
|
|
|
|
|
|
| |
CID 147540: unsigned_compare
- Cast nsec to a int32_t to properly detect the expected overflow.
CID 147542: unsigned_compare
- intval can never be less than ZIO_FAILURE_MODE_WAIT which is
defined to be zero. Remove this useless check.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5379
|
|
|
|
|
|
|
|
|
| |
It's used by Lustre to determine if the objset can be upgraded.
The inline version doesn't work because dmu_objset_is_snapshot()
is not exported.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jinshan Xiong <[email protected]>
Closes #5385
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come
from setxattr, which will handle by the acl xattr_handler, and we already
handles that well. However, nfsd will directly calls inode->set_acl or
return error if it doesn't exists.
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Massimo Maggi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5371
Closes #5375
|
|
|
|
|
|
|
|
|
| |
CID 147626: Type:Dereference before null check
CID 147628: Type:Dereference before null check
Reviewed-by: Brian Behlendorf <[email protected]
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5304
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure
to support a crude FMD environment to host these modules.
The Diagnosis Engine consumes I/O and checksum ereports and
feeds them into a SERD engine which will generate a corres-
ponding fault diagnosis when the SERD engine fires. All the
diagnosis state data is collected into cases, one case per
vdev being tracked.
The Retire Agent responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of
the new VDEV state (degraded or faulted). This agent is
also responsible for managing hot spares across pools.
When it encounters a device fault or a device removal it
replaces the device with an appropriate spare if available.
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #5343
|
|
|
|
|
|
|
|
|
|
| |
CID 147575, Type:Unintentional integer overflow
CID 147577, Type:Unintentional integer overflow
CID 147578, Type:Unintentional integer overflow
CID 147579, Type:Unintentional integer overflow
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5365
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently every calls to zpl_posix_acl_release will schedule a delayed task,
and each delayed task will add a timer. This used to be fine except for
possibly bad performance impact.
However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In
this new implementation, the larger the delay, the less accuracy the timer is.
So when we have a flood of timer from zpl_posix_acl_release, they will expire
at the same time. Couple with the fact that task_expire will do linear search
with lock held. This causes an extreme amount of contention inside interrupt
and would actually lockup the system.
We fix this by doing batch free to prevent a flood of delayed task. Every call
to zpl_posix_acl_release will put the posix_acl to be freed on a lockless
list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free
every posix_acl that passed the grace period on the list. This way, we only
have one delayed task every second.
[1] https://lwn.net/Articles/646950/
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Only restrict the maximum zio alloc size to 32-bit kernel space.
The same virtual address space limitations don't apply to user
space. This resolves a memory allocation failure in raidz_test
where it expects to be able to exercises all valid zio sizes.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on
supported filesystem. It's basically doing open(2) and unlink(2) atomically.
The filesystem support is added through i_op->tmpfile. We basically copy the
create operation except we get rid of the link and name related stuff and add
the new node to unlinked set.
We also add support for linkat(2) to link tmpfile. However, since all previous
file operation will skip ZIL, we force a txg_wait_synced to make sure we are
sync safe.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, doing things like fsetxattr(2) on an unlinked file will result in
ENODATA. There's two places that cause this: zfs_dirent_lock and zfs_zget.
The fix in zfs_dirent_lock is pretty straightforward. In zfs_zget though, we
need it to not return error when the zp is unlinked. This is a pretty big
change in behavior, but skimming through all the callers, I don't think this
change would cause any problem. Also there's nothing preventing z_unlinked
from being set after the z_lock mutex is dropped before but before zfs_zget
returns anyway.
The rest of the stuff is to make sure we don't log xattr stuff when owner is
unlinked.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
avx512f should work on all AVX512 hardware, since it only uses
Foundation instructions.
avx512bw should be faster on hardware supporting the AVW512BW
extension. We can use full-width pshufb (instead of relying on the 256
bits AVX2 pshufb). As a side-effect, the code is also unrolled more.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #5219
|
|
|
|
|
|
|
|
|
|
|
| |
On error dsl_prop_get_all_ds() does not free the nvlist it allocates.
This behavior may have been intentional when originally written
but is atypical and often confusing. Since no callers rely on this
behavior the function has been updated to always free the nvlist
on error.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: BearBabyLiu <[email protected]>
Closes #5320
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On 32-bit Linux systems use vmem_size() to correctly size the ARC
and better determine when IO should be throttle due to low memory.
On 64-bit systems this change has no effect since the virtual
address space available far exceeds the physical memory available.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
|
|
|
|
|
|
|
|
|
|
|
| |
A limit of 1TB exists for zvols on 32-bit systems. Update the code
to correctly reflect this limitation in a similar manor as the
OpenZFS implementation.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally the .zfs/snapshot directory was disabled for 32-bit systems
because 64-bit inode numbers were not supported. This is no longer
the case and this functionality can be enabled by default.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
Closes #2002
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the TASKQID_INVALID macros and update callers to use the macro
instead of testing against 0. There is no functional change
even though the functions in zfs_ctldir.c incorrectly used -1
instead of 0.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5347
|
|
|
|
|
|
|
|
| |
Replace magic value 16 with ARRAY_SIZE() to correctly handle
when the sa_legacy_attrs array size changes.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5354
|
|
|
|
|
|
|
| |
CID 147553: Type:Dereference null return value
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5305
|
|
|
|
|
|
|
| |
CID 152975: Type:Dereference null return value
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5322
|
|
|
|
|
|
|
|
| |
CID 147509: Explicit null dereferenced
- l2arc_sublist_lock is fragile as relied on caller too much.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: GeLiXin <[email protected]>
Closes #5319
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ubuntu added support for checking inode permissions to lookup_bdev() in kernel
commit 193fb6a2c94fab8eb8ce70a5da4d21c7d4023bee (merged in 4.4.0-6.21).
Upstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636517
This patch adds a test for Ubuntu's variant of lookup_bdev() to configure and
calls the function in the correct way.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Hajo Möller <[email protected]>
Closes #5336
|
|
|
|
|
|
|
|
|
|
| |
Until it can be determined definitively that a performance
regression wasn't introduced accidentally by 3dfb57a this
functionality is being disabled by default. It can be re-
enabled by setting zio_dva_throttle_enabled=1.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5335
Issue #5289
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fix autoreplace behaviour on statechange-led.sh script.
ZED sends the following events on an auto-replace:
1. statechange: Disk goes UNAVAIL->ONLINE
2. statechange: Disk goes ONLINE->UNAVAIL
3. vdev_attach: Disk goes ONLINE
Events 1-2 happen when ZED first attempts to do an auto-online. When that
fails, ZED then tries an auto-replace, generating the vdev_attach event in #3.
In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE
transition to turn off the LED. It ignored the #2 ONLINE->UNAVAIL transition,
assuming it was just the "old" VDEV going offline. This is problematic, as
a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want
to ignore that.
This new patch correctly turns on the fault LED every time a drive becomes
UNAVAIL. It also monitors vdev_attach events to trigger turning off the LED
when an auto-replaced disk comes online.
- Remove unnecessary libdevmapper warning with --with-config=kernel
This fixes an unnecessary libdevmapper warning when building
--with-config=kernel. Kernel code does not use libdevmapper, so the warning
is not needed.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #2375
Closes #5312
Closes #5331
|
|
|
|
|
|
|
|
|
| |
This is caught by kmemleak when running compress_004_pos
Reviewed-by: Tim Chase <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5244
Closes #5330
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating and destroying pools in tight loop it's possible to
exhaust the number of allowed threads on a system. This results
in taskq_create() failling and a NULL dereference.
Resolve the issue by falling back to opening the vdevs all
synchronously.
Reviewed-by: Denys Rtveliashvili <[email protected]>
Reviewed-by: Håkan Johansson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes zfsonlinux/spl#521
Closes #4637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously when a drive faulted, the statechange-led.sh script would lookup
the drive's LED sysfs entry in /sys/block/sd*/device/enclosure_device, and
turn it on. During testing we noticed that if you pulled out a drive, or if
the drive was so badly broken that it no longer appeared to Linux, that the
/sys/block/sd* path would be removed, and the script could not lookup the
LED entry.
To fix this, this patch looks up the disks's more persistent
"/sys/class/enclosure/X:X:X:X/Slot N" LED sysfs path at pool import. It then
passes that path to the statechange-led script to use, rather than having the
script look it up on the fly. This allows the script to turn on/off the slot
LEDs even when the drive is missing.
Closes #5309
Closes #2375
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The AVL tree compare function requires that either -1, 0, or 1 be
returned. However the strcmp() function only guarantees that a
negative, zero, or positive value is returned. Therefore, the
return value of strcmp() needs to be sanitized with AVL_ISIGN.
This was initially overlooked because the x86_64 implementation
of strcmp() happens to only returns the allowed values. This
was observed on an aarch64 platform which behaves correctly but
differently as described above.
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5311
Closes #5313
|
|
|
|
|
|
|
|
| |
CID 153459: Null pointer dereferences (FORWARD_NULL)
Accidentally introduced by #5159.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5310
|
|
|
|
|
|
|
|
| |
CID 147551: Type:dereference null return value
CID 147552: Type:dereference null return value
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5279
|
|
|
|
|
|
|
| |
CID 147472: Type: 'Constant' variable guards dead code
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5288
|
|
|
|
|
|
|
|
|
|
|
|
| |
In torvalds/linux@31051c8 the inode_change_ok() function was
renamed setattr_prepare() and updated to take a dentry ratheri
than an inode. Update the code to call the setattr_prepare()
and add a wrapper function which call inode_change_ok() for
older kernels.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Requires-spl: refs/pull/581/head
|
|
|
|
|
|
|
|
| |
In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and
generic_{set,get,remove}xattr are removed. xattr operations will directly
go through sb->s_xattr.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
| |
In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are
merged together into iops->rename(), it now wants flags.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
| |
These operations are dir specific, there's no point putting them in
zpl_inode_operations which is for regular files.
Signed-off-by: Chunwei Chen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Enable multipath autoreplace support for FMA.
This extends FMA autoreplace to work with multipath disks. This
requires libdevmapper to be installed at build time.
2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online
Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure
LED for a drive when a drive becomes FAULTED/DEGRADED. Your enclosure must
be supported by the Linux SES driver for this to work. The enclosure LED
scripts work for multipath devices as well. The scripts will clear the LED
when the fault is cleared.
3. Rate limit ZIO delay and checksum events so as not to flood ZED
ZIO delay and checksum events are rate limited to 5/sec in the zfs module.
Reviewed-by: Richard Laager <[email protected]>
Reviewed by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #2449
Closes #3017
Closes #5159
|
|
|
|
|
|
|
|
|
|
| |
CID 150926: Unchecked return value (CHECKED_RETURN)
- This case cannot occur given the existing taskq implementation
and flags passed to task_dispatch().
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5272
|
|
|
|
|
|
|
|
|
| |
Accidentally introduced by 3dfb57a, when building with debugging
disabled several variables are unused. Resolve this by wrapping
them in ASSERTV to remove them for non-debug builds.
Reviewed by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5284
|
|
|
|
|
|
|
|
| |
CID 49339: Type:Buffer not null terminated
CID 153393: Type:Buffer not null terminated
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: <cao.xuewen [email protected]>
Closes #5296
|
|
|
|
|
|
|
|
|
|
| |
CID 150924: Unchecked return value (CHECKED_RETURN)
- On taskq_dispatch failure the reference must be dropped and
this entry can be safely skipped. This case should be impossible
in the existing implementation but should be handled regardless.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5278
|
|
|
|
|
|
|
|
|
| |
CID 147488, Type:explicit null dereferenced
CID 147490, Type:dereference null return value
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5237
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenZFS 7090 - zfs should throttle allocations
Authored by: George Wilson <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Approved by: Matthew Ahrens <[email protected]>
Ported-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
When write I/Os are issued, they are issued in block order but the ZIO
pipeline will drive them asynchronously through the allocation stage
which can result in blocks being allocated out-of-order. It would be
nice to preserve as much of the logical order as possible.
In addition, the allocations are equally scattered across all top-level
VDEVs but not all top-level VDEVs are created equally. The pipeline
should be able to detect devices that are more capable of handling
allocations and should allocate more blocks to those devices. This
allows for dynamic allocation distribution when devices are imbalanced
as fuller devices will tend to be slower than empty devices.
The change includes a new pool-wide allocation queue which would
throttle and order allocations in the ZIO pipeline. The queue would be
ordered by issued time and offset and would provide an initial amount of
allocation of work to each top-level vdev. The allocation logic utilizes
a reservation system to reserve allocations that will be performed by
the allocator. Once an allocation is successfully completed it's
scheduled on a given top-level vdev. Each top-level vdev maintains a
maximum number of allocations that it can handle (mg_alloc_queue_depth).
The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth)
are distributed across the top-level vdevs metaslab groups and round
robin across all eligible metaslab groups to distribute the work. As
top-levels complete their work, they receive additional work from the
pool-wide allocation queue until the allocation queue is emptied.
OpenZFS-issue: https://www.illumos.org/issues/7090
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7
Closes #5258
Porting Notes:
- Maintained minimal stack in zio_done
- Preserve linux-specific io sizes in zio_write_compress
- Added module params and documentation
- Updated to use optimize AVL cmp macros
|
|
|
|
|
|
|
|
|
| |
CID:150943, Type:Unintentional integer overflow
CID:150938, Type:Explicit null dereferenced
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5255
|
|
|
|
|
|
|
|
| |
CID 147571: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
CID 147574: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5268
|
|
|
|
|
|
|
| |
coverity scan CID 153394, Type:String overflow
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5263
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix use after free in zfsctl_snapshot_unmount(). Use /usr/bin/env
instead of /bin/sh to fix a shell code injection flaw and allow use
with grsecurity.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Stian Ellingsen <[email protected]>
Closes #5250
Closes #4377
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Call mount and umount via /usr/bin/env instead of /bin/sh in
zfsctl_snapshot_mount() and zfsctl_snapshot_unmount().
This change fixes a shell code injection flaw. The call to /bin/sh
passed the mountpoint unescaped, only surrounded by single quotes. A
mountpoint containing one or more single quotes would cause the command
to fail or potentially execute arbitrary shell code.
This change also provides compatibility with grsecurity patches.
Grsecurity only allows call_usermodehelper() to use helper binaries in
certain paths. /usr/bin/* is allowed, /bin/* is not.
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is as much an upstream compatibility as it's a bit of a performance
gain.
The illumos taskq implemention doesn't allow a TASKQ_THREADS_CPU_PCT type
to be dynamic and in fact enforces as much with an ASSERT.
As to performance, if this taskq is dynamic, it can cause excessive
contention on tq_lock as the threads are created and destroyed because it
can see bursts of many thousands of tasks in a short time, particularly
in heavy high-concurrency zvol write workloads.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #5236
|