| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
On Linux, this is in man section 8, not 1M. Also, there is no fsdb on
Linux, so I removed that.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8626
|
|
|
|
|
|
|
|
|
|
|
|
| |
This had a mix of command vs subcommand, quoted vs not quoted, and
bolded vs. not bolded command names.
Also, fix man page sections from 1M (Solaris) to 8 (Linux).
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the "spare" vdev type was described as "A special
pseudo-vdev which...". I wanted to eliminate the word "special" from
that, now that the allocation_classes feature exists and there is such a
thing as a "special vdev". I ended up eliminating almost all instances
of the word "special" that are not referencing the allocation_classes
feature.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When receiving a raw send stream only reallocated objects
whose contents were not freed by the standard indicators
should call dmu_free_long_range().
Furthermore, if calling dmu_free_long_range() is required
then the objects current block size must be used and not
the new block size.
Two additional test cases were added to provided realistic
test coverage for processing reallocated objects which are
part of a raw receive.
Reviewed-by: Olaf Faaland <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8528
Closes #8607
|
|
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reported-by: Matthew Ahrens <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #8563
Closes #8622
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use (ZFS_IOC_LAST - ZFS_IOC_FIRST) instead of 256.
It seems 256 is just a number large enough to hold ioctls
at the moment.
Using 256 also causes compile-time warning or error
on platfoms whose enum zfs_ioc definition differs.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8598
|
|
|
|
|
|
|
|
|
|
| |
It's 81 now.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8598
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, depending on the version, gcc can reuse the frame pointer register.
This is a micro-optimization that might help on some very old x86 processors.
However, it also makes dynamic tracing less useful because the stacks cannot
be easily observed.
This rule change instructs gcc to use the -fno-omit-frame-pointer option
when compiling.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8617
|
|
|
|
|
|
|
|
|
| |
This patch simply up cleans up a nit and corrects an error message
issue that were introduced in the Multiple DVA scrub patch.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When receiving an object to a previously allocated interior slot
the new object should be "allocated" by setting DMU_NEW_OBJECT,
not "reallocated" with dnode_reallocate(). For resilience verify
the slot is free as required in case the stream is malformed.
Add a test case to generate more realistic incremental send streams
that force reallocation to occur during the receive.
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8067
Closes #8614
|
|
|
|
|
|
|
|
|
|
| |
Fix style issue for 'tx->tx_txg&TXG_MASK'. There should be white
space around the '&' character. Split the dnode_reallocate() ASSERT
to make it more readable to clearly separate the checks.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8606
|
|
|
|
|
|
|
|
|
|
|
| |
The cleanup function of auto_online_001_pos does not account for the
possibility that the test may fail while a disk is still removed. If
the test run is using real disks, cleanup should involve restoring any
that are missing.
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: John Kennedy <[email protected]>
Closes #8579
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a system sleeps during a zfs-test, the time spent
hibernating is counted against the test's runtime even
though the test can't and isn't running.
This patch tries to detect timeouts due to hibernation and
reruns tests that timed out due to system sleeping.
In this version of the patch, the existing behavior of returning
non-zero when a test was killed is preserved. With this patch applied
we still return nonzero and we also automatically rerun the test we
suspect of being killed due to system hibernation.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #8575
|
|
|
|
|
|
|
|
|
|
|
|
| |
The error path in zio_crypt_key_unwrap would call zio_crypt_key_destroy which
calls rw_destroy(&key->zk_salt_lock); which has not yet been initialized.
We move the rw_init() call to the start of zio_crypt_key_unwrap instead.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #8604
Closes #8605
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bulk[] array index, count, must be reset per-iteration in order to
not overwrite the stack.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #8072
Closes #8597
Closes #8601
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
POSIX doesn't define pthread_t as uint_t. It could be a pointer.
This code causes below compile error on a platform using pointer
for pthread_t.
--
kernel.c:815:25: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
(void) printf("%u ", (uint_t)pthread_self());
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8558
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d12614521a("Fixes for procfs files backed by linked lists")
uses PDE_DATA(), but since PDE_DATA() (public interface which
replaced old public interface PDE()) first appeared in upstream
kernel 3.10, it lacks visible local definition for kernel < 3.10.
Move the local PDE_DATA() definition to a ZoL header, to unbreak
build on kernel < 3.10.
--
module/spl/spl-procfs-list.c: In function 'procfs_list_open':
module/spl/spl-procfs-list.c:166: error: implicit declaration of function 'PDE_DATA'
module/spl/spl-procfs-list.c:166: warning: assignment makes pointer from integer without a cast
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8599
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit df583073 introduced the ability to list the snapshots for a
specified dataset. This change inadvertently resulted in only the top-
level snapshots being listed when no dataset was specified. Fix this
issue by adding an additional check to determine if a dataset was
provided to avoid incorrectly restricting the depth.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8591
Closes #8594
|
|
|
|
|
|
|
|
|
|
|
|
| |
The length used for the strlcpy() used the size of zv_value
when it should have used the size of zc_name. Correct this
typo.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8595
Closes #8596
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This partially reverts commit 5dbf8b4ed. This change resolved
the issues observed with truncated files in raw sends. However,
the required changes to dnode_allocate() introduced a regression
for non-raw streams which needs to be understood.
The additional debugging improvements from the original patch
were not reverted.
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #7378
Issue #8528
Issue #8540
Issue #8565
Close #8584
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a pool is initially created (by `zpool create`), predictive
prefetch is inadvertently disabled, until the pool is export/import-ed,
or the machine is rebooted.
When device removal was introduced, we added some code to disable
predictive prefetching until indirect vdevs have been loaded. This
resulted in the "default state" of prefetch being disabled, until we
proactively enable it after indirect vdevs are loaded. Unfortunately
this resulted in a few bugs where in some code paths we neglect to
enable predictive prefetch. The first of these was fixed by
https://github.com/zfsonlinux/zfs/commit/20507534d4ede14d4dd82c99fc8d461704ce7419
This commit fixes another case where we also need to explicitly enable
predictive prefetch, when the pool is initially created.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8577
|
|
|
|
|
|
|
|
| |
The features.kernel layout should match features.pool.
Reviewed-by: Sara Hartse <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #8566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several places where we use zfs_dbgmsg and %p to
print pointers. In the Linux kernel, these values obfuscated
to prevent information leaks which means the pointers aren't
very useful for debugging crash dumps. We decided to restrict
the permissions of dbgmsg (and some other kstats while we were
at it) and print pointers with %px in zfs_dbgmsg as well as
spl_dumpstack
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Signed-off-by: sara hartse <[email protected]>
Closes #8467
Closes #8476
|
|
|
|
|
|
|
|
|
| |
Also describe free/allocated/fragmentation
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Closes #7565
Closes #8483
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Callers of txg_wait_open() which set should_quiesce=B_TRUE should be
accounted for as iowait time. Otherwise, the caller is understood
to be idle and cv_wait_sig() is used to prevent incorrectly inflating
the system load average.
Similarly txg_wait_wait() has been updated to use cv_wait_io() to
be accounted against iowait.
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8550
Closes #8558
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dracut depends on the environment variable BOOTFS to be set after pool
import. This dracut specific systemd ExecStartPost command should not be
called for any non-dracut systems, so let's move it to a static systemd
unit that.
Reviewed-by: Manuel Amador (Rudd-O) <[email protected]>
Reviewed-by: Matthew Thode <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #8510
|
|
|
|
|
|
|
|
|
|
|
| |
The macOS man app strenuously objects to blank lines in man files.
mdoc warning: Empty input line #xyz
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: bunder2015 <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Closes #8559
|
|
|
|
|
|
|
|
|
| |
Simply appends zhp->zfs_name to the "TIME SENT SNAPSHOT" output.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: TerraTech <[email protected]>
Closes #8543
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when attempting to list snapshots ZFS may do a lot of
extra work checking child datasets. This is because the code does
not realize that it will not be able to reach any snapshots
contained within snapshots that are at the depth limit since the
snapshots of those datasets are counted as an additional layer
deeper. This patch corrects this issue.
In addition, this patch adds the ability to do perform the commands:
$ zfs list -t snapshot <dataset>
$ zfs get -t snapshot <prop> <dataset>
as a convenient way to list out properties of all snapshots of a
given dataset without having to use the depth limit.
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8539
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On debian, systemd complains about missing /bin/awk because it
actually is located at /usr/bin/awk. It is not a good idea to
hardcode binary paths because different linux distros use different
paths. According to systemd's man page it is absolutely safe to
miss paths for binaries located at standard locations (/bin,
/sbin, /usr/bin, ...).
Further, replace this more or less complicated awk command by
grep.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Issue #8510
|
|
|
|
|
|
|
|
|
|
| |
zfs-import-* services have a hard dependency on bash while not
everyone has bash installed. At this point /bin/sh is sufficient,
so use that.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Issue #8510
|
|
|
|
|
|
|
|
|
|
| |
This patch simply clarifies some of the limitations related to
raw sends in the man page. No functional changes.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jason Cohen <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8503
Closes #8544
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UNMAP/TRIM support is a frequently-requested feature to help
prevent performance from degrading on SSDs and on various other
SAN-like storage back-ends. By issuing UNMAP/TRIM commands for
sectors which are no longer allocated the underlying device can
often more efficiently manage itself.
This TRIM implementation is modeled on the `zpool initialize`
feature which writes a pattern to all unallocated space in the
pool. The new `zpool trim` command uses the same vdev_xlate()
code to calculate what sectors are unallocated, the same per-
vdev TRIM thread model and locking, and the same basic CLI for
a consistent user experience. The core difference is that
instead of writing a pattern it will issue UNMAP/TRIM commands
for those extents.
The zio pipeline was updated to accommodate this by adding a new
ZIO_TYPE_TRIM type and associated spa taskq. This new type makes
is straight forward to add the platform specific TRIM/UNMAP calls
to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are
handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs.
This makes it possible to largely avoid changing the pipieline,
one exception is that TRIM zio's may exceed the 16M block size
limit since they contain no data.
In addition to the manual `zpool trim` command, a background
automatic TRIM was added and is controlled by the 'autotrim'
property. It relies on the exact same infrastructure as the
manual TRIM. However, instead of relying on the extents in a
metaslab's ms_allocatable range tree, a ms_trim tree is kept
per metaslab. When 'autotrim=on', ranges added back to the
ms_allocatable tree are also added to the ms_free tree. The
ms_free tree is then periodically consumed by an autotrim
thread which systematically walks a top level vdev's metaslabs.
Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`. This
may occur when the freed blocks are small and not enough time
was allowed to aggregate them. An automatic TRIM and a manual
`zpool trim` may be run concurrently, in which case the automatic
TRIM will yield to the manual TRIM.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Contributions-by: Saso Kiselkov <[email protected]>
Contributions-by: Tim Chase <[email protected]>
Contributions-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8419
Closes #598
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, zfs send streams will include a list of all snapshots
on the source side if the '-p' option is provided. This can cause
performance problems on the receive side, especially if those
snapshots aren't present on the destination. These problems arise
because guid_to_name(), which is used for several receive side
functions, will search the entire receive-side pool if it can't
find a snapshot with a matching guid. This patch corrects the
issue by ensuring only streams that require this list of snapshots
include them.
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8533
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a few issues with raw receives involving
truncated files:
* dnode_reallocate() now calls dnode_set_blksz() instead of
dnode_setdblksz(). This ensures that any remaining dbufs with
blkid 0 are resized along with their containing dnode upon
reallocation.
* One of the calls to dmu_free_long_range() in receive_object()
needs to check that the object it is about to free some contents
or hasn't been completely removed already by a previous call to
dmu_free_long_object() in the same function.
* The same call to dmu_free_long_range() in the previous point
needs to ensure it uses the object's current block size and
not the new block size. This ensures the blocks of the object
that are supposed to be freed are completely removed and not
simply partially zeroed out.
This patch also adds handling for DRR_OBJECT_RANGE records to
dprintf_drr() for debugging purposes.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #7378
Closes #8528
|
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8532
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The projectid_001_pos and projecttree_001_pos test cases use the lsattr
command to detect that the project quota bit is set correctly. Due to
a bug in e2fsprogs-1.44.4 setting the Project 'P' bit also results in
the Verity 'V' bit being reported as set. This will result in the test
case failing.
The issue has been resolved in e2fsprogs but in order to avoid testing
failures these two test cases are skipped when e2fsprogs-1.44.4 is
installed.
https://github.com/tytso/e2fsprogs/commit/7e5a95e3d
Reviewed-by: Richard Elling <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8534
|
|
|
|
|
|
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Evan Allrich <[email protected]>
Closes #8535
|
|
|
|
|
|
| |
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Closes #8525
|
|
|
|
|
|
|
|
|
|
|
| |
Make a local copy of the vd_path and preserve the removal error
for use in spa_history_log_internal(). This is required because
after spa_vdev_exit() there is nothing preventing the vdev state
from changing.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Igor Kozhukhov <[email protected]>
Closes #8522
|
|
|
|
|
|
|
|
|
|
|
| |
Added missing remove of detachable VDEV from txg's DTL list
to avoid use-after-free for the split VDEV
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Roman Strashkin <[email protected]>
Closes #5565
Closes #7856
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS supports O_RSYNC for read operations and when specified will ensure
the same level of data integrity that O_DSYNC and O_SYNC provides for
writes. O_RSYNC by itself has no effect so it must be combined with
either O_DSYNC or O_SYNC. However, many platforms don't support O_RSYNC
and have mapped O_SYNC to mean O_RSYNC within ZFS. This is incorrect
and causes unnecessary calls to zil_commit. Only platforms which
support O_RSYNC should implement the zil_commit functionality in the
read code path.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes #8523
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When Multihost is enabled, and a pool is imported, uberblock writes
include ub_mmp_delay to allow an importing node to calculate the
duration of an activity test. This value, is not enough information.
If zfs_multihost_fail_intervals > 0 on the node with the pool imported,
the safe minimum duration of the activity test is well defined, but does
not depend on ub_mmp_delay:
zfs_multihost_fail_intervals * zfs_multihost_interval
and if zfs_multihost_fail_intervals == 0 on that node, there is no such
well defined safe duration, but the importing host cannot tell whether
mmp_delay is high due to I/O delays, or due to a very large
zfs_multihost_interval setting on the host which last imported the pool.
As a result, it may use a far longer period for the activity test than
is necessary.
This patch renames ub_mmp_sequence to ub_mmp_config and uses it to
record the zfs_multihost_interval and zfs_multihost_fail_intervals
values, as well as the mmp sequence. This allows a shorter activity
test duration to be calculated by the importing host in most situations.
These values are also added to the multihost_history kstat records.
It calculates the activity test duration differently depending on
whether the new fields are present or not; for importing pools with
only ub_mmp_delay, it uses
(zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals
Which results in an activity test duration less sensitive to the leaf
count.
In addition, it makes a few other improvements:
* It updates the "sequence" part of ub_mmp_config when MMP writes
in between syncs occur. This allows an importing host to detect MMP
on the remote host sooner, when the pool is idle, as it is not limited
to the granularity of ub_timestamp (1 second).
* It issues writes immediately when zfs_multihost_interval is changed
so remote hosts see the updated value as soon as possible.
* It fixes a bug where setting zfs_multihost_fail_intervals = 1 results
in immediate pool suspension.
* Update tests to verify activity check duration is based on recorded
tunable values, not tunable values on importing host.
* Update tests to verify the expected number of uberblocks have valid
MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number),
that sequence number is incrementing, and that uberblock values match
tunable settings.
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7842
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to dsl_dataset_evict_async() releasing a hold, there is
an error case in dsl_dataset_hold_obj() which had missed 4 additional
release calls. This was introduced in a1d477c24.
openzfsonosx-commit: https://github.com/openzfsonosx/zfs/commit/63ff7f1c
Authored by: Jorgen Lundman <[email protected]>
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
Closes #8517
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the buffer 'digest_buffer' is allocated in the qat_checksum()
stack, it can't ensure that the address is physically contiguous,
and the DMA result of the buffer may be handled incorrectly.
Using QAT_PHYS_CONTIG_ALLOC() ensures a physically
contiguous allocation.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Chengfei, Zhu <[email protected]>
Closes #8323
Closes #8521
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the dirty check in dmu_offset_next() such that dnode's
are only considered dirty for the purpose or reporting holes
when there are pending data blocks or frees to be synced. This
ensures that when there are only metadata updates to be synced
(atime) that holes are reported.
Reviewed-by: Debabrata Banerjee <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #6958
Closes #8505
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) As implemented the `zpool labelclear` command overwrites
the calculated offsets of all four vdev labels even when only a
single valid label is found. If the device as been re-purposed
but still contains a valid label this can result in space no
longer owned by ZFS being zeroed. Prevent this by verifying
every label removed is intact before it's overwritten.
2) Address a small bug in zpool_do_labelclear() which prevented
labelclear from working on file vdevs. Only block devices support
BLKFLSBUF, try the ioctl() but when it's reported as unsupported
this should not be fatal.
3) Fix `zpool labelclear` so it can be run on vdevs which were
removed from the pool with `zpool remove`. Additionally, allow
intact but partial labels to be cleared as in the case of a failed
`zpool attach` or `zpool replace`.
4) Remove LABELCLEAR and LABELREAD variables for test cases.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8500
Closes #8373
Closes #6261
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As it turns out, on the Windows platform when rw_init() is called
(rather its bedrock call ExInitializeResourceLite) it is placed on
an active-list of locks, and is removed at rw_destroy() time.
dnode_move() has logic to copy over the old-dnode to new-dnode,
including calling dmu_zfetch_init(new-dnode). But due to the missing
dmu_zfetch_fini(old-dnode), kmem will call dnode_dest() to release the
memory (and in debug builds fill pattern 0xdeadbeef) over the Windows
active-lock's prev/next list pointers, making Windows sad.
But on other platforms, the contents of dmu_zfetch_fini() is one
call to list_destroy() and one to rw_destroy(), which is effectively
a no-op call and is not required. This commit is mostly for
"correctness" and can be skipped there.
Porting Notes:
* This leak exists on Linux but currently can never happen because
the dnode_move() functionality is not supported.
openzfsonosx-commit: openzfsonosx/zfs@d95fe517
Authored by: Julian Heuking <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes #8519
|
|
|
|
|
|
|
|
|
|
|
| |
This patch simply adds a missing space in the
ZFS_ERR_FROM_IVSET_GUID_MISSING error message.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8514
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When destroying an arc_buf_hdr_t its identity cannot be discarded
until it is entirely undiscoverable. This not only includes being
unhashed, but also being removed from the l2arc header list.
Discarding the header's identify prematurely renders the hash
lock useless because it will always hash to bucket zero.
This change resolves a race with l2arc_evict() by discarding the
identity after it has been removed from the l2arc header list.
This ensures either the header is not on the list or contains
the correct identify.
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7688
Closes #8144
|