| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
* zfs redact error messages do not end with newline character
* 30af21b0 inadvertently removed some ZFS_PROP comments
* man/zfs: zfs redact <redaction_snapshot> is not optional
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a volume is created in a pool with raidz vdevs and
volblocksize != 128k, the volume can reference more space than is
reserved with the automatically calculated refreservation. There
are two deficiencies in vol_volsize_to_reservation that contribute
to this:
1) Skip blocks may be added to keep each allocation a multiple
of parity + 1. This is the dominating factor when volblocksize
is close to 2^ashift.
2) raidz deflation for 128 KB blocks is different for most other
block sizes.
See "The theory of raidz space accounting" comment in
libzfs_dataset.c for a full explanation.
Authored by: Mike Gerdts <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Reviewed by: Sanjay Nadkarni <[email protected]>
Reviewed by: Jerry Jelinek <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Kody Kantor <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Approved by: Dan McDonald <[email protected]>
Ported-by: Mike Gerdts <[email protected]>
Porting Notes:
* ZTS: wait for zvols to exist before writing
* ZTS: use log_must_busy with {zpool|zfs} destroy
OpenZFS-issue: https://www.illumos.org/issues/9318
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0
Closes #8973
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the ability for the user to unload keys for
datasets as they are being unmounted. This is analogous to
'zfs mount -l'.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes: #8917
Closes: #8952
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ztest creates some extremely large files as part of its
operation. When zdb tries to dump a large enough file, it
can run out of memory or spend an extremely long time
attempting to print millions or billions of uint64_ts.
We cap the amount of data from a uint64 object that we
are willing to read and print.
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
External-issue: DLPX-53814
Closes #8947
|
|
|
|
|
|
|
|
|
| |
When used with verbosity >= 4 zdb fails an assertion in dump_bookmarks()
because it expects snprintf() to retun 0 on success.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8948
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "zfs remap" command was disabled by
6e91a72fe3ff8bb282490773bd687632f3e8c79d, because it has little utility
and introduced some tricky bugs. This commit removes the code for it,
the associated ZFS_IOC_REMAP ioctl, and tests.
Note that the ioctl and property will remain, but have no functionality.
This allows older software to fail gracefully if it attempts to use
these, and avoids a backwards incompatibility that would be introduced if
we renumbered the later ioctls/props.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8944
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Resolve the incorrect use of srcdir and builddir references for
various files in the build system. These have crept in over time
and went unnoticed because when building in the top level directory
srcdir and builddir are identical.
With this change it's again possible to build in a subdirectory.
$ mkdir obj
$ cd obj
$ ../configure
$ make
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8921
Closes #8943
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zfs-mount service can unexpectedly fail to start when zfs
encounters a mount that is in progress. This service uses
zfs mount -a, which has a window between the time it checks if
the dataset was mounted and when the actual mount (via mount.zfs
binary) occurs.
The reason for the racing mounts is that both zfs-mount.target
and zfs-share.target are allowed to execute concurrently after
the import. This is more of an issue with the relatively recent
addition of parallel mounting, and we should consider serializing
the mount and share targets.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #8881
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Count the bytes of payload for each replication record type
Count the bytes of overhead (replication records themselves)
Include these counters in the output summary at the end of the run.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Sponsored-By: Klara Systems and Catalogic
Closes #8432
|
|
|
|
|
|
|
|
|
|
|
| |
Since 30af21b0 was merged 'zfs send' help message format is broken
and lists "-r" as a valid option: this commit corrects these
small issues.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8942
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dedup is in use, the `dedupditto` property can be set, causing ZFS to
keep an extra copy of data that is referenced many times (>100x). The
idea was that this data is more important than other data and thus we
want to be really sure that it is not lost if the disk experiences a
small amount of random corruption.
ZFS (and system administrators) rely on the pool-level redundancy to
protect their data (e.g. mirroring or RAIDZ). Since the user/sysadmin
doesn't have control over what data will be offered extra redundancy by
dedupditto, this extra redundancy is not very useful. The bulk of the
data is still vulnerable to loss based on the pool-level redundancy.
For example, if particle strikes corrupt 0.1% of blocks, you will either
be saved by mirror/raidz, or you will be sad. This is true even if
dedupditto saved another 0.01% of blocks from being corrupted.
Therefore, the dedupditto functionality is rarely enabled (i.e. the
property is rarely set), and it fulfills its promise of increased
redundancy even more rarely.
Additionally, this feature does not work as advertised (on existing
releases), because scrub/resilver did not repair the extra (dedupditto)
copy (see https://github.com/zfsonlinux/zfs/pull/8270).
In summary, this seldom-used feature doesn't work, and even if it did it
wouldn't provide useful data protection. It has a non-trivial
maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270).
We should remove the dedupditto functionality. For backwards
compatibility with the existing CLI, "zpool set dedupditto" will still
"succeed" (exit code zero), but won't have any effect. For backwards
compatibility with existing pools that had dedupditto enabled at some
point, the code will still be able to understand dedupditto blocks and
free them when appropriate. However, ZFS won't write any new dedupditto
blocks.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Issue #8270
Closes #8310
|
|
|
|
|
|
|
|
| |
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: Michael Niewöhner <[email protected]>
Closes #8897
Closes #8911
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Prashanth Sreenivasa <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Chris Williamson <[email protected]>
Reviewed-by: Pavel Zhakarov <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Prakash Surya <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #7958
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations. The large allocations are for ZIL blocks. If there is a
lot of fragmentation, the large allocations can be hard to satisfy.
The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more. In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.
To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.
External-issue: DLPX-61719
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8865
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the behavior of other python scripts, such as arcstat and
dbufstat, which are always installed but whose install-exec-hook actions
will simply touch up the shebang if a python interpreter was configured
*and* that interpreter is a python2 interpreter.
Fixes installation in a minimal build chroot without python available.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Eli Schwartz <[email protected]>
Closes #8851
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, --without-python would cause ./configure to fail. Now it is
able to proceed, and the Python scripts will not be built.
Use portable parameter expansion matching instead of nonstandard
substring matching to detect the Python version. This test is
duplicated in several places, so define a function for it.
Don't assume the full path to binaries, since different platforms do
install things in different places. Use AC_CHECK_PROGS instead.
When building without Python, also build without pyzfs.
Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Eli Schwartz <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #8809
Closes #8731
|
|
|
|
|
|
|
| |
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Closes #8818
|
|
|
|
|
|
|
|
|
|
|
|
| |
s/get_vdev_spec/make_root_vdev
The former doesn't exist anymore.
Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #8759
|
|
|
|
|
|
|
|
|
|
|
|
| |
On systems where "char" is an unsigned type the value returned by
getopt() will never be negative (-1), leading to an endless loop:
this issue prevents both 'zpool remove' and 'zstreamdump' for
working on some systems.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8789
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, 'zfs mount -a' will always attempt to parallelize
work related to mounting as best it can. Unfortunately, when
the user passes the '-l' option to load keys, this causes
all threads to prompt the user for their keys at once,
resulting in a confusing and racy user experience. This patch
simply disables parallel mounting when using the '-l' flag.
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8762
Closes #8811
|
|
|
|
|
|
|
|
|
| |
This commit adds the undocumented "-t" option to zpool(8) help message.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8782
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit simply adds a missing newline ("\n") character to the error
message printed by the zfs command when the provided pool parameter
can't be found.
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8783
|
|
|
|
|
|
|
|
|
|
| |
This commit removes the documented but not handled "-p" option from
zpool(8) help message.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #8781
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code never returned match comparing two datasets (not snapshots).
As result, uu_avl_find(), called from zfs_callback(), never succeeded,
allowing to add same dataset into the list multiple times, for example:
# zfs get name pers pers pers@z pers@z
NAME PROPERTY VALUE SOURCE
pers name pers -
pers name pers -
pers@z name pers@z -
With the patch:
# zfs get name pers pers pers@z pers@z
NAME PROPERTY VALUE SOURCE
pers name pers -
pers@z name pers@z -
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #8723
|
|
|
|
|
|
|
|
| |
Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Justin Scholz <[email protected]>
Closes #8712
Closes #8721
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Users of existing pools, especially pools with top-level encrypted
datasets, could run into trouble trying to work around Errata #4.
Clarify that removing encrypted snapshots and bookmarks is enough
to clear the errata.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Justin Scholz <[email protected]>
Closes #8682
Closes #8683
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, it is possible for the 'zpool scrub' command to
progress slightly beyond 100% due to concurrent changes
happening on the live pool. This behavior is expected, but
the userspace code for 'zpool status' would subtract the
expected amount of data from the amount of data already
scrubbed, resulting in a negative integer being casted to a
large positive one. This number was then used to calculate
the estimated completion time, resulting in wildly wrong
results. This code changes the behavior so that 'zpool status'
does not attempt to report an estimate during this period.
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8611
Closes #8687
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sigignore(3) isn't portable.
This code fails to compile on platforms without sigignore(3).
Use sigaction(2).
--
zfs_main.c: In function 'zfs_do_diff':
zfs_main.c:7178:9: error: implicit declaration of function 'sigignore' [-Werror=implicit-function-declaration]
(void) sigignore(SIGPIPE);
^~~~~~~~~
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tomohiro Kusumi <[email protected]>
Closes #8593
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the 'zfs version' and 'zpool version' subcommands to display
the version of the user space utilities and loaded zfs kernel
module. For example:
$ zfs version
zfs-0.8.0-rc3_169_g67e0366b88
zfs-kmod-0.8.0-rc3_169_g67e0366b88
The '-V' and '--version' aliases were added to support the
common convention of using 'zfs --version` to obtain the version
information.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: TerraTech <[email protected]>
Closes #2501
Closes #8567
|
|
|
|
|
|
|
|
|
| |
This patch simply up cleans up a nit and corrects an error message
issue that were introduced in the Multiple DVA scrub patch.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit df583073 introduced the ability to list the snapshots for a
specified dataset. This change inadvertently resulted in only the top-
level snapshots being listed when no dataset was specified. Fix this
issue by adding an additional check to determine if a dataset was
provided to avoid incorrectly restricting the depth.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8591
Closes #8594
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several places where we use zfs_dbgmsg and %p to
print pointers. In the Linux kernel, these values obfuscated
to prevent information leaks which means the pointers aren't
very useful for debugging crash dumps. We decided to restrict
the permissions of dbgmsg (and some other kstats while we were
at it) and print pointers with %px in zfs_dbgmsg as well as
spl_dumpstack
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Gallagher <[email protected]>
Signed-off-by: sara hartse <[email protected]>
Closes #8467
Closes #8476
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when attempting to list snapshots ZFS may do a lot of
extra work checking child datasets. This is because the code does
not realize that it will not be able to reach any snapshots
contained within snapshots that are at the depth limit since the
snapshots of those datasets are counted as an additional layer
deeper. This patch corrects this issue.
In addition, this patch adds the ability to do perform the commands:
$ zfs list -t snapshot <dataset>
$ zfs get -t snapshot <prop> <dataset>
as a convenient way to list out properties of all snapshots of a
given dataset without having to use the depth limit.
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8539
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UNMAP/TRIM support is a frequently-requested feature to help
prevent performance from degrading on SSDs and on various other
SAN-like storage back-ends. By issuing UNMAP/TRIM commands for
sectors which are no longer allocated the underlying device can
often more efficiently manage itself.
This TRIM implementation is modeled on the `zpool initialize`
feature which writes a pattern to all unallocated space in the
pool. The new `zpool trim` command uses the same vdev_xlate()
code to calculate what sectors are unallocated, the same per-
vdev TRIM thread model and locking, and the same basic CLI for
a consistent user experience. The core difference is that
instead of writing a pattern it will issue UNMAP/TRIM commands
for those extents.
The zio pipeline was updated to accommodate this by adding a new
ZIO_TYPE_TRIM type and associated spa taskq. This new type makes
is straight forward to add the platform specific TRIM/UNMAP calls
to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are
handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs.
This makes it possible to largely avoid changing the pipieline,
one exception is that TRIM zio's may exceed the 16M block size
limit since they contain no data.
In addition to the manual `zpool trim` command, a background
automatic TRIM was added and is controlled by the 'autotrim'
property. It relies on the exact same infrastructure as the
manual TRIM. However, instead of relying on the extents in a
metaslab's ms_allocatable range tree, a ms_trim tree is kept
per metaslab. When 'autotrim=on', ranges added back to the
ms_allocatable tree are also added to the ms_free tree. The
ms_free tree is then periodically consumed by an autotrim
thread which systematically walks a top level vdev's metaslabs.
Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`. This
may occur when the freed blocks are small and not enough time
was allowed to aggregate them. An automatic TRIM and a manual
`zpool trim` may be run concurrently, in which case the automatic
TRIM will yield to the manual TRIM.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Contributions-by: Saso Kiselkov <[email protected]>
Contributions-by: Tim Chase <[email protected]>
Contributions-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8419
Closes #598
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When Multihost is enabled, and a pool is imported, uberblock writes
include ub_mmp_delay to allow an importing node to calculate the
duration of an activity test. This value, is not enough information.
If zfs_multihost_fail_intervals > 0 on the node with the pool imported,
the safe minimum duration of the activity test is well defined, but does
not depend on ub_mmp_delay:
zfs_multihost_fail_intervals * zfs_multihost_interval
and if zfs_multihost_fail_intervals == 0 on that node, there is no such
well defined safe duration, but the importing host cannot tell whether
mmp_delay is high due to I/O delays, or due to a very large
zfs_multihost_interval setting on the host which last imported the pool.
As a result, it may use a far longer period for the activity test than
is necessary.
This patch renames ub_mmp_sequence to ub_mmp_config and uses it to
record the zfs_multihost_interval and zfs_multihost_fail_intervals
values, as well as the mmp sequence. This allows a shorter activity
test duration to be calculated by the importing host in most situations.
These values are also added to the multihost_history kstat records.
It calculates the activity test duration differently depending on
whether the new fields are present or not; for importing pools with
only ub_mmp_delay, it uses
(zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals
Which results in an activity test duration less sensitive to the leaf
count.
In addition, it makes a few other improvements:
* It updates the "sequence" part of ub_mmp_config when MMP writes
in between syncs occur. This allows an importing host to detect MMP
on the remote host sooner, when the pool is idle, as it is not limited
to the granularity of ub_timestamp (1 second).
* It issues writes immediately when zfs_multihost_interval is changed
so remote hosts see the updated value as soon as possible.
* It fixes a bug where setting zfs_multihost_fail_intervals = 1 results
in immediate pool suspension.
* Update tests to verify activity check duration is based on recorded
tunable values, not tunable values on importing host.
* Update tests to verify the expected number of uberblocks have valid
MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number),
that sequence number is incrementing, and that uberblock values match
tunable settings.
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #7842
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) As implemented the `zpool labelclear` command overwrites
the calculated offsets of all four vdev labels even when only a
single valid label is found. If the device as been re-purposed
but still contains a valid label this can result in space no
longer owned by ZFS being zeroed. Prevent this by verifying
every label removed is intact before it's overwritten.
2) Address a small bug in zpool_do_labelclear() which prevented
labelclear from working on file vdevs. Only block devices support
BLKFLSBUF, try the ioctl() but when it's reported as unsupported
this should not be fatal.
3) Fix `zpool labelclear` so it can be run on vdevs which were
removed from the pool with `zpool remove`. Additionally, allow
intact but partial labels to be cleared as in the case of a failed
`zpool attach` or `zpool replace`.
4) Remove LABELCLEAR and LABELREAD variables for test cases.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8500
Closes #8373
Closes #6261
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, there is an issue in the sequential scrub code which
prevents self healing from working in some cases. The scrub code
will split up all DVA copies of a bp and issue each of them
separately. The problem is that, since each of the DVAs is no
longer associated with the others, the self healing code doesn't
have the opportunity to repair problems that show up in one of the
DVAs with the data from the others.
This patch fixes this issue by ensuring that all IOs issued by the
sequential scrub code include all DVAs. Initially, only the first
DVA of each is attempted. If an issue arises, the IO is retried
with all available copies, giving the self healing code a chance
to correct the issue.
To test this change, this patch also adds the ability for zinject
to specify individual DVAs to inject read errors into. We then
add a new test case that utilizes this functionality to ensure
scrubs and self-healing reads can handle and transparently fix
issues with individual copies of blocks.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8453
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch attempts to address some user concerns that have arisen
since errata 4 was introduced.
* The errata warning has been made less scary for users without
any encrypted datasets.
* The errata warning now clears itself without a pool reimport if
the bookmark_v2 feature is enabled and no encrypted datasets
exist.
* It is no longer possible to create new encrypted datasets without
enabling the bookmark_v2 feature, thus helping to ensure that the
errata is resolved.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Issue ##8308
Closes #8504
|
|
|
|
|
|
|
|
|
| |
Update zed.rc values reflect their default value. This helps
avoid confusion if a user expects functionality to be enabled.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Kash Pande <[email protected]>
Closes #8498
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the verbose output of zstreamdump includes new line
characters within some individual records. Presumably, this was
originally done to keep the output from getting too wide to fit
on a terminal. However, since new flags and struct members have
been added, these rules have not been maintained consistently. In
addition, these newlines can make it hard to grep the output in
some scenarios. This patch simply removes these newlines, making
the output easier to grep and removing the inconsistency.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed by: Allan Jude <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8493
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, there is an issue in the raw receive code where
raw receives are allowed to happen on top of previously
non-raw received datasets. This is a problem because the
source-side dataset doesn't know about how the blocks on
the destination were encrypted. As a result, any MAC in
the objset's checksum-of-MACs tree that is a parent of both
blocks encrypted on the source and blocks encrypted by the
destination will be incorrect. This will result in
authentication errors when we decrypt the dataset.
This patch fixes this issue by adding a new check to the
raw receive code. The code now maintains an "IVset guid",
which acts as an identifier for the set of IVs used to
encrypt a given snapshot. When a snapshot is raw received,
the destination snapshot will take this value from the
DRR_BEGIN payload. Non-raw receives and normal "zfs snap"
operations will cause ZFS to generate a new IVset guid.
When a raw incremental stream is received, ZFS will check
that the "from" IVset guid in the stream matches that of
the "from" destination snapshot. If they do not match, the
code will error out the receive, preventing the problem.
This patch requires an on-disk format change to add the
IVset guids to snapshots and bookmarks. As a result, this
patch has errata handling and a tunable to help affected
users resolve the issue with as little interruption as
possible.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #8308
|
|
|
|
|
|
|
|
|
|
| |
This is a simple fix for a typo ("perfetch" rather than "prefetch")
in arc_summary3.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Jason Cohen <[email protected]>
Closes #8499
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #8077
|
|
|
|
|
|
|
|
|
|
|
| |
When multihost is enabled, and a pool is suspended, return
EINVAL in response to "zpool clear <pool>". The pool
may have been imported on another host while I/O was suspended.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #6933
Closes #8460
|
|
|
|
|
|
|
|
|
| |
When feeding a replication stream to `zstreamdump -d` (raw dump mode),
it does not print the raw data for DRR_WRITE_EMBEDDED records.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Closes #8430
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "-t" argument to "zfs program" specifies a limit on the number of
LUA instructions that can be executed. The zfs.8 manpage has the wrong
description. It should be updated to match what's in zfs-program.8
Also fix the formatting of the zfs help message.
Reviewed by: Allan Jude <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8410
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add -h switch to zfs send command to send dataset holds. If
holds are present in the stream, zfs receive will create them
on the target dataset, unless the zfs receive -h option is used
to skip receive of holds.
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #7513
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with builds on illumos based platform we can see build issue
because label_t has been redefined.
for reduce build issues on others platforms we should rename
label_t to zdb_label_t.
Reviewed-by: loli10K <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Igor Kozhukhov <[email protected]>
Closes #8397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Initially, metaslabs and space maps used to be the same thing
in ZFS. Later, we started differentiating them by referring
to the space map as the on-disk state of the metaslab, making
the metaslab a higher-level concept that is metadata that deals
with space accounting. Today we've managed to split that code
furthermore, with the space map being its own on-disk data
structure used in areas of ZFS besides metaslabs (e.g. the
vdev-wide space maps used for zpool checkpoint or vdev removal
features).
This patch refactors the space map code to further split the
space map code from the metaslab code. It does so by getting
rid of the idea that the space map can have a different in-core
and on-disk length (sm_length vs smp_length) which is something
that is only used for the metaslab code, and other consumers
of space maps just have to deal with. Instead, this patch
introduces changes that move the old in-core length of the
metaslab's space map to the metaslab structure itself (see
ms_synced_length field) while making the space map code only
care about the actual space map's length on-disk.
The result of this is that space map consumers no longer have
to deal with syncing two different lengths for the same
structure (e.g. space_map_update() goes away) while metaslab
specific behavior stays within the metaslab code. Specifically,
the ms_synced_length field keeps track of the amount of data
metaslab_load() can read from the metaslab's space map while
working concurrently with metaslab_sync() that may be
appending to that same space map.
As a side note, the patch also adds a few comments around
the metaslab code documenting some assumptions and expected
behavior.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #8328
|
|
|
|
|
|
|
|
|
|
| |
note: which is non-standard. Use builtin 'command -v' instead. [SC2230]
note: Use -n instead of ! -z. [SC2236]
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: bunder2015 <[email protected]>
Closes #8367
|