| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This failed on 5.2-rc1 with "error: unknown" message, for set_fs_pwd()
not being visible in both const and non-const tests.
This is caused by torvalds/linux@83da1bed86. It's configurable,
but we would want to be able to compile with default kbuild setting.
set_fs_pwd() has never been exported with exception of some distro
kernels, and set_fs_pwd() wasn't used in ZoL to begin with. The test
result was used for a spl function vn_set_fs_pwd().
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8777
|
|
|
|
|
|
|
|
| |
It's accessible via <sys/mntent.h>.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8765
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The issue is caused by an incorrect usage of the sizeof() operator
in vdev_obsolete_sm_object(): on 64-bit systems this is not an issue
since both "uint64_t" and "uint64_t*" are 8 bytes in size. However on
32-bit systems pointers are 4 bytes long which is not supported by
zap_lookup_impl(). Trying to remove a top-level vdev on a 32-bit system
will cause the following failure:
VERIFY3(0 == vdev_obsolete_sm_object(vd, &obsolete_sm_object)) failed (0 == 22)
PANIC at vdev_indirect.c:833:vdev_indirect_sync_obsolete()
Showing stack for process 1315
CPU: 6 PID: 1315 Comm: txg_sync Tainted: P OE 4.4.69+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
c1abc6e7 0ae10898 00000286 d4ac3bc0 c14397bc da4cd7d8 d4ac3bf0 d4ac3bd0
d790e7ce d7911cc1 00000523 d4ac3d00 d790e7d7 d7911ce4 da4cd7d8 00000341
da4ce664 da4cd8c0 da33fa6e 49524556 28335946 3d3d2030 65647620 626f5f76
Call Trace:
[<>] dump_stack+0x58/0x7c
[<>] spl_dumpstack+0x23/0x27 [spl]
[<>] spl_panic.cold.0+0x5/0x41 [spl]
[<>] ? dbuf_rele+0x3e/0x90 [zfs]
[<>] ? zap_lookup_norm+0xbe/0xe0 [zfs]
[<>] ? zap_lookup+0x57/0x70 [zfs]
[<>] ? vdev_obsolete_sm_object+0x102/0x12b [zfs]
[<>] vdev_indirect_sync_obsolete+0x3e1/0x64d [zfs]
[<>] ? txg_verify+0x1d/0x160 [zfs]
[<>] ? dmu_tx_create_dd+0x80/0xc0 [zfs]
[<>] vdev_sync+0xbf/0x550 [zfs]
[<>] ? mutex_lock+0x10/0x30
[<>] ? txg_list_remove+0x9f/0x1a0 [zfs]
[<>] ? zap_contains+0x4d/0x70 [zfs]
[<>] spa_sync+0x9f1/0x1b10 [zfs]
...
[<>] ? kthread_stop+0x110/0x110
This commit simply corrects the "integer_size" parameter used to lookup
the vdev's ZAP object.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8790
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CID 186143: Memory - illegal accesses (USE_AFTER_FREE)
This patch fixes an use-after-free in spa_import_progress_destroy()
moving the kmem_free() call at the end of the function.
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8788
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `config/kernel-timer.m4` refactor slightly to check more generally
for the new `timer_setup()` APIs, but also check the callback signature
because some kernels (notably 4.14) have the new `timer_setup()` API but
use the old callback signature. Also add a check for a `flags` member in
`struct timer_list`, which was added in 4.1-rc8.
Add compatibility shims to `include/spl/sys/timer.h` to allow using the
new timer APIs with the only two caveats being that the callback
argument type must be declared as `spl_timer_list_t` and an explicit
assignment is required to get the timer variable for the `timer_of()`
macro. So the callback would look like this:
```c
__cv_wakeup(spl_timer_list_t t)
{
struct timer_list *tmr = (struct timer_list *)t;
struct thing *parent = from_timer(parent, tmr,
parent_timer_field);
... /* do stuff with parent */
```
Make some minor changes to `spl-condvar.c` and `spl-taskq.c` to use the
new timer APIs instead of conditional code.
Reviewed-by: Tomohiro Kusumi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rafael Kitover <[email protected]>
Closes #8647
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reading kstats, the health (aka state) of the pool is stored into
/proc/spl/kstat/zfs/POOLNAME/state via spa_state_to_name().
However, during import/export there is a case where the spa exists,
but the root vdev does not exist. This fix checks that case and sets
the state to "TRANSITIONING"
Unfortunately, it is not easy to reproduce a test for this. It was
detected randomly during ZTS runs while kstats were also being sampled
regularly. After this change, further testing did not trip on the case
and the TRANSITIONING state was collected at least once by the kstats.
For posterity, the backtrace prior to this fix is:
[Mon May 13 17:21:00 2019] RIP: 0010:spa_state_to_name+0x10/0xb0 [zfs]
...
Mon May 13 17:21:00 2019] Call Trace:
[Mon May 13 17:21:00 2019] spa_state_data+0x1a/0x40 [zfs]
[Mon May 13 17:21:00 2019] kstat_seq_show+0x117/0x440 [spl]
[Mon May 13 17:21:00 2019] seq_read+0xe5/0x430
[Mon May 13 17:21:00 2019] proc_reg_read+0x45/0x70
[Mon May 13 17:21:00 2019] __vfs_read+0x1b/0x40
[Mon May 13 17:21:00 2019] vfs_read+0x8e/0x130
[Mon May 13 17:21:00 2019] SyS_read+0x55/0xc0
[Mon May 13 17:21:00 2019] ? SyS_fcntl+0x5d/0xb0
[Mon May 13 17:21:00 2019] do_syscall_64+0x73/0x130
[Mon May 13 17:21:00 2019] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8746
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit torvalds/linux@46ad0840b has removed the architecture specific
rwsem source and headers leaving only the generic version. As part
of this change the RWSEM_ACTIVE_READ_BIAS and RWSEM_ACTIVE_WRITE_BIAS
macros were moved to the private kernel/locking/rwsem.h header.
This results in a build failure because these macros were required
to implement the rw_tryupgrade() compatibility function.
In practice, this isn't a major problem because there are only a
few consumers of rw_tryupgrade() and because consumers of rw_tryupgrade
should be written to retry using rw_enter(RW_WRITER).
After auditing all of the callers only dmu_zfetch() was determined
not to perform a retry. It has been updated in this commit to
resolve this issue.
That said, the rw_tryupgrade() functionality should be considered
for possible removal in a future release due to the difficultly
in supporting the interface.
Reviewed-by: Tomohiro Kusumi <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8730
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The db_dirtycnt of an EVICTING dbuf is always 0. However, it still
appears in the dn_dbufs tree. If we call dnode_dirty_l1range on a
range that contains an EVICTING dbuf, we will attempt to mark it dirty
(which will fail because it's EVICTING, resulting in a new dbuf being
created and dirtied). Later, in ZFS_DEBUG mode, we assert that all the
dbufs in the range are dirty. If the EVICTING dbuf is still present,
this will trip the assertion erroneously.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Sara Hartse <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #8745
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an import requires a long MMP activity check, or when the user
requests pool recovery, the import make take a long time. The user may
not know why, or be able to tell whether the import is progressing or is
hung.
Add a kstat which lists all imports currently being processed by the
kernel (currently only one at a time is possible, but the kstat allows
for more than one). The kstat is /proc/spl/kstat/zfs/import_progress.
The kstat contents are as follows:
pool_guid load_state multihost_secs max_txg pool_name
16667015954387398 3 15 0 tank3
load_state: the value of spa_load_state
multihost_secs: seconds until the end of the multihost activity
check; if over, or none required, this is 0
max_txg: current spa_load_max_txg, if rewind is occurring
This could be used by outside tools, such as a pacemaker resource agent,
to report import progress, or as a part of manual troubleshooting. The
zpool import subcommand could also be modified to report this
information.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #8696
|
|
|
|
|
|
|
|
|
| |
These messages will want '\n' like any other regular printk() messages.
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8726
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given how zfs_getattr() is implemented, zfs_getattr_fast() (used by
->getattr() of zpl inodes) also needs to consider an additional link
count if "snapdir" property is set to "visible".
Without this, # of directories in root inode of each dataset doesn't
match the link count when snapdir is visible.
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8727
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 5.0 kernel defines the macro ASM_BUG. In order to prevent a
conflict and build failure rename ASM_BUG to ZFS_ASM_BUG. This
is currently only an issue on aarch64 but all instances of
ASM_BUG we're renamed to avoid any future conflict on x86_64.
Reviewed-by: Tomohiro Kusumi <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8725
Issue #8545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 98bb45e resolved a deadlock which could occur when
handling a page fault in zfs_write(). This change added
the uio_fault_disable field to the uio structure but failed
to initialize it to B_FALSE. This uninitialized field would
cause uiomove_iov() to call __copy_from_user_inatomic()
instead of copy_from_user() resulting in unexpected EFAULTs.
Resolve the issue by fully initializing the uio, and clearing
the uio_fault_disable flags after it's used in zfs_write().
Additionally, reorder the uio_t field assignments to match
the order the fields are declared in the structure.
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8640
Closes #8719
|
|
|
|
|
|
|
|
| |
Exported and documented a new module parameter.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Closes #8706
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When receiving a DRR_OBJECT record the receive_object() function
needs to determine how to handle a spill block associated with the
object. It may need to be removed or kept depending on how the
object was modified at the source.
This determination is currently accomplished using a heuristic which
takes in to account the DRR_OBJECT record and the existing object
properties. This is a problem because there isn't quite enough
information available to do the right thing under all circumstances.
For example, when only the block size changes the spill block is
removed when it should be kept.
What's needed to resolve this is an additional flag in the DRR_OBJECT
which indicates if the object being received references a spill block.
The DRR_OBJECT_SPILL flag was added for this purpose. When set then
the object references a spill block and it must be kept. Either
it is update to date, or it will be replaced by a subsequent DRR_SPILL
record. Conversely, if the object being received doesn't reference
a spill block then any existing spill block should always be removed.
Since previous versions of ZFS do not understand this new flag
additional DRR_SPILL records will be inserted in to the stream.
This has the advantage of being fully backward compatible. Existing
ZFS systems receiving this stream will recreate the spill block if
it was incorrectly removed. Updated ZFS versions will correctly
ignore the additional spill blocks which can be identified by
checking for the DRR_SPILL_UNMODIFIED flag.
The small downside to this approach is that is may increase the size
of the stream and of the received snapshot on previous versions of
ZFS. Additionally, when receiving streams generated by previous
unpatched versions of ZFS spill blocks may still be lost.
OpenZFS-issue: https://www.illumos.org/issues/9952
FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233277
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8668
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`zfs set atime|relatime=off|on` doesn't disable or enable the property
on read for datasets whose property was inherited from parent, until
a dataset is once unmounted and mounted again.
(The properties start to work properly if a dataset is once unmounted
and mounted again. The difference comes from regular mount process,
e.g. via zpool import, uses mount options based on properties read
from ondisk layout for each dataset, whereas
`zfs set atime|relatime=off|on` just remounts a specified dataset.)
--
# zpool create p1 <device>
# zfs create p1/f1
# zfs set atime=off p1
# echo test > /p1/f1/test
# sync
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
p1 176K 18.9G 25.5K /p1
p1/f1 26K 18.9G 26K /p1/f1
# zfs get atime
NAME PROPERTY VALUE SOURCE
p1 atime off local
p1/f1 atime off inherited from p1
# stat /p1/f1/test | grep Access | tail -1
Access: 2019-04-26 23:32:33.741205192 +0900
# cat /p1/f1/test
test
# stat /p1/f1/test | grep Access | tail -1
Access: 2019-04-26 23:32:50.173231861 +0900
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ changed by read(2)
--
The problem is that zfsvfs::z_atime which was probably intended to keep
incore atime state just gets updated by a callback function of "atime"
property change, atime_changed_cb(), and never used for anything else.
Since now that all file read and atime update use a common function
zpl_iter_read_common() -> file_accessed(), and whether to update atime
via ->dirty_inode() is determined by atime_needs_update(),
atime_needs_update() needs to return false once atime is turned off.
It currently continues to return true on `zfs set atime=off`.
Fix atime_changed_cb() by setting or dropping SB_NOATIME in VFS super
block depending on a new atime value, so that atime_needs_update() works
as expected after property change.
The same problem applies to "relatime" except that a self contained
relatime test is needed. This is because relatime_need_update() is based
on a mount option flag MNT_RELATIME, which doesn't exist in datasets
with inherited "relatime" property via `zfs set relatime=...`, hence it
needs its own relatime test zfs_relatime_need_update().
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8674
Closes #8675
|
|
|
|
|
|
|
|
|
|
|
|
| |
Drop duplicated phrases in comments.
Also drop an obsolete comment "Perform a mount of the associated...",
as all it does now is get objid from DMU and lookup incore inode.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8707
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux kernel commit ca79b0c211af63fa3276f0e3fd7dd9ada2439839
"mm: convert totalram_pages and totalhigh_pages variables to atomic"
replaced `totalhigh_pages` with an inline function `totalhigh_pages()`.
This broke compilation on IA32, etc, as ZoL uses `totalhigh_pages`
on archs with highmem. Confirmed on Fedora 30 (5.0.9-301.fc30.i686).
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8677
Closes #8701
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel function which adds new zvols as disks to the system,
add_disk(), briefly opens and closes the zvol as part of its work.
Closing a zvol involves waiting for two txgs to sync. This, combined
with the fact that the taskq processing new zvols is single threaded,
makes this processing new zvols slow.
Waiting for these txgs to sync is only necessary if the zvol has been
written to, which is not the case during add_disk(). This change adds
tracking of whether a zvol has been written to so that we can skip the
txg_wait_synced() calls when they are unnecessary.
This change also fixes the flags passed to blkdev_get_by_path() by
vdev_disk_open() to be FMODE_READ | FMODE_WRITE | FMODE_EXCL instead of
just FMODE_EXCL. The flags were being incorrectly calculated because
we were using the wrong version of vdev_bdev_mode().
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: John Gallagher <[email protected]>
Closes #8526
Closes #8615
|
|
|
|
|
|
|
|
|
|
| |
The comment in lz4_compress_zfs could be more clear and specific. It
also contains needlessly strong language.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes: #8702
Closes: #8703
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 'zpool resilver' command requires that the resilver_defer
feature is active on the pool. Unfortunately, the check for
this was left out of the original patch. This commit simply
corrects this so that the command properly returns an error
in this case.
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8700
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size argument of snprintf(3) in glibc and snprintf() in Linux
kernel includes trailing \0, as snprintf(3) man page explains it as
"write at most size bytes (including the trailing null byte ('\0'))",
i.e. snprintf() can just take buffer size.
e.g. For snprintf() in module/zfs/zfs_ctldir.c, a buffer size is
MAXPATHLEN, and a caller is passing MAXPATHLEN to snprintf(), so size
should just be `path_len` to do what the caller is trying to do.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8692
|
|
|
|
|
|
|
|
|
| |
Not all block devices, notably scsi_debug, set a root_blkg on the
request queue. Remove this assertion and allow the the existing
call to blkg_tryget() to gracefully handle the NULL (which it does).
Reviewed-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8678
|
|
|
|
|
|
|
|
| |
Use NV_ENCODE_NATIVE for nvlist encoding variable instead of 0.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8653
|
|
|
|
|
|
|
|
| |
Use either SEEK_* or 0,1,2..., but not both.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8656
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes 2 issues with the DMU free throttle implemented
in dmu_free_long_range(). The first issue is that get_next_chunk()
was calculating the number of L1 blocks the free would dirty
incorrectly. In some cases involving extremely large files, this
code would greatly overestimate the number of effected L1 blocks,
causing excessive calls to txg_wait_open(). This patch corrects
the calculation.
The second issue is that the free throttle uses the total number
of free'd blocks in all (open, quiescing, and syncing) txgs to
determine whether to throttle. This causes large frees (such as
those created by the first issue) to cause 4 txg syncs before
any further frees were allowed to proceed. This patch ensures
that the accounting is done entirely in a per-txg fashion, so
that frees from a given txg don't affect those that immediately
follow it.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8655
|
|
|
|
|
|
|
|
|
|
| |
Unused since 5649246dd3("Remove znode move functionality"),
and ZNODE_STAT_ADD() will never be needed.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8636
|
|
|
|
|
|
|
|
|
|
|
| |
These aren't unused.
`flag` in zfs_create() also isn't to indicate large file.
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8635
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Support QAT when ZFS is root file-system:
When ZFS module is loaded before QAT started, the QAT can
be started again in post-process, e.g.:
echo 0 > /sys/module/zfs/parameters/zfs_qat_compress_disable
echo 0 > /sys/module/zfs/parameters/zfs_qat_encrypt_disable
echo 0 > /sys/module/zfs/parameters/zfs_qat_checksum_disable
2. Verify alder checksum of the de-compress result
3. Allocate Digest, IV and AAD buffer in physical contiguous
memory by QAT_PHYS_CONTIG_ALLOC.
4. Update the documentation for zfs_qat_compress_disable,
zfs_qat_checksum_disable, zfs_qat_encrypt_disable.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Weigang Li <[email protected]>
Signed-off-by: Chengfeix Zhu <[email protected]>
Closes #8323
Closes #8610
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces empty for loops with while loops to make the code easier
to read.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reported-by: github.com/dcb314
Signed-off-by: Richard Laager <[email protected]>
Closes #6681
Closes #6682
Closes #6683
Closes #8623
|
|
|
|
|
|
|
|
|
| |
GRUB supports large_blocks.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8626
|
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8626
|
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When receiving a raw send stream only reallocated objects
whose contents were not freed by the standard indicators
should call dmu_free_long_range().
Furthermore, if calling dmu_free_long_range() is required
then the objects current block size must be used and not
the new block size.
Two additional test cases were added to provided realistic
test coverage for processing reallocated objects which are
part of a raw receive.
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8528
Closes #8607
|
|
|
|
|
|
|
|
|
| |
This patch simply up cleans up a nit and corrects an error message
issue that were introduced in the Multiple DVA scrub patch.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When receiving an object to a previously allocated interior slot
the new object should be "allocated" by setting DMU_NEW_OBJECT,
not "reallocated" with dnode_reallocate(). For resilience verify
the slot is free as required in case the stream is malformed.
Add a test case to generate more realistic incremental send streams
that force reallocation to occur during the receive.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8067
Closes #8614
|
|
|
|
|
|
|
|
|
|
| |
Fix style issue for 'tx->tx_txg&TXG_MASK'. There should be white
space around the '&' character. Split the dnode_reallocate() ASSERT
to make it more readable to clearly separate the checks.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8606
|
|
|
|
|
|
|
|
|
|
|
|
| |
The error path in zio_crypt_key_unwrap would call zio_crypt_key_destroy which
calls rw_destroy(&key->zk_salt_lock); which has not yet been initialized.
We move the rw_init() call to the start of zio_crypt_key_unwrap instead.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #8604
Closes #8605
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bulk[] array index, count, must be reset per-iteration in order to
not overwrite the stack.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #8072
Closes #8597
Closes #8601
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d12614521a("Fixes for procfs files backed by linked lists")
uses PDE_DATA(), but since PDE_DATA() (public interface which
replaced old public interface PDE()) first appeared in upstream
kernel 3.10, it lacks visible local definition for kernel < 3.10.
Move the local PDE_DATA() definition to a ZoL header, to unbreak
build on kernel < 3.10.
--
module/spl/spl-procfs-list.c: In function 'procfs_list_open':
module/spl/spl-procfs-list.c:166: error: implicit declaration of function 'PDE_DATA'
module/spl/spl-procfs-list.c:166: warning: assignment makes pointer from integer without a cast
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8599
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This partially reverts commit 5dbf8b4ed. This change resolved
the issues observed with truncated files in raw sends. However,
the required changes to dnode_allocate() introduced a regression
for non-raw streams which needs to be understood.
The additional debugging improvements from the original patch
were not reverted.
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #7378
Issue #8528
Issue #8540
Issue #8565
Close #8584
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a pool is initially created (by `zpool create`), predictive
prefetch is inadvertently disabled, until the pool is export/import-ed,
or the machine is rebooted.
When device removal was introduced, we added some code to disable
predictive prefetching until indirect vdevs have been loaded. This
resulted in the "default state" of prefetch being disabled, until we
proactively enable it after indirect vdevs are loaded. Unfortunately
this resulted in a few bugs where in some code paths we neglect to
enable predictive prefetch. The first of these was fixed by
https://github.com/zfsonlinux/zfs/commit/20507534d4ede14d4dd82c99fc8d461704ce7419
This commit fixes another case where we also need to explicitly enable
predictive prefetch, when the pool is initially created.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8577
|
|
|
|
|
|
|
|
| |
The features.kernel layout should match features.pool.
Reviewed-by: Sara Hartse <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #8566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several places where we use zfs_dbgmsg and %p to
print pointers. In the Linux kernel, these values obfuscated
to prevent information leaks which means the pointers aren't
very useful for debugging crash dumps. We decided to restrict
the permissions of dbgmsg (and some other kstats while we were
at it) and print pointers with %px in zfs_dbgmsg as well as
spl_dumpstack
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Signed-off-by: sara hartse <[email protected]>
Closes #8467
Closes #8476
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Callers of txg_wait_open() which set should_quiesce=B_TRUE should be
accounted for as iowait time. Otherwise, the caller is understood
to be idle and cv_wait_sig() is used to prevent incorrectly inflating
the system load average.
Similarly txg_wait_wait() has been updated to use cv_wait_io() to
be accounted against iowait.
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8550
Closes #8558
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UNMAP/TRIM support is a frequently-requested feature to help
prevent performance from degrading on SSDs and on various other
SAN-like storage back-ends. By issuing UNMAP/TRIM commands for
sectors which are no longer allocated the underlying device can
often more efficiently manage itself.
This TRIM implementation is modeled on the `zpool initialize`
feature which writes a pattern to all unallocated space in the
pool. The new `zpool trim` command uses the same vdev_xlate()
code to calculate what sectors are unallocated, the same per-
vdev TRIM thread model and locking, and the same basic CLI for
a consistent user experience. The core difference is that
instead of writing a pattern it will issue UNMAP/TRIM commands
for those extents.
The zio pipeline was updated to accommodate this by adding a new
ZIO_TYPE_TRIM type and associated spa taskq. This new type makes
is straight forward to add the platform specific TRIM/UNMAP calls
to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are
handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs.
This makes it possible to largely avoid changing the pipieline,
one exception is that TRIM zio's may exceed the 16M block size
limit since they contain no data.
In addition to the manual `zpool trim` command, a background
automatic TRIM was added and is controlled by the 'autotrim'
property. It relies on the exact same infrastructure as the
manual TRIM. However, instead of relying on the extents in a
metaslab's ms_allocatable range tree, a ms_trim tree is kept
per metaslab. When 'autotrim=on', ranges added back to the
ms_allocatable tree are also added to the ms_free tree. The
ms_free tree is then periodically consumed by an autotrim
thread which systematically walks a top level vdev's metaslabs.
Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`. This
may occur when the freed blocks are small and not enough time
was allowed to aggregate them. An automatic TRIM and a manual
`zpool trim` may be run concurrently, in which case the automatic
TRIM will yield to the manual TRIM.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Contributions-by: Saso Kiselkov <[email protected]>
Contributions-by: Tim Chase <[email protected]>
Contributions-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8419
Closes #598
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a few issues with raw receives involving
truncated files:
* dnode_reallocate() now calls dnode_set_blksz() instead of
dnode_setdblksz(). This ensures that any remaining dbufs with
blkid 0 are resized along with their containing dnode upon
reallocation.
* One of the calls to dmu_free_long_range() in receive_object()
needs to check that the object it is about to free some contents
or hasn't been completely removed already by a previous call to
dmu_free_long_object() in the same function.
* The same call to dmu_free_long_range() in the previous point
needs to ensure it uses the object's current block size and
not the new block size. This ensures the blocks of the object
that are supposed to be freed are completely removed and not
simply partially zeroed out.
This patch also adds handling for DRR_OBJECT_RANGE records to
dprintf_drr() for debugging purposes.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7378
Closes #8528
|
|
|
|
|
|
|
|
|
|
|
| |
Make a local copy of the vd_path and preserve the removal error
for use in spa_history_log_internal(). This is required because
after spa_vdev_exit() there is nothing preventing the vdev state
from changing.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Igor Kozhukhov <[email protected]>
Closes #8522
|
|
|
|
|
|
|
|
|
|
|
| |
Added missing remove of detachable VDEV from txg's DTL list
to avoid use-after-free for the split VDEV
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Roman Strashkin <[email protected]>
Closes #5565
Closes #7856
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS supports O_RSYNC for read operations and when specified will ensure
the same level of data integrity that O_DSYNC and O_SYNC provides for
writes. O_RSYNC by itself has no effect so it must be combined with
either O_DSYNC or O_SYNC. However, many platforms don't support O_RSYNC
and have mapped O_SYNC to mean O_RSYNC within ZFS. This is incorrect
and causes unnecessary calls to zil_commit. Only platforms which
support O_RSYNC should implement the zil_commit functionality in the
read code path.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes #8523
|