author | Serapheim Dimitropoulos <[email protected]> | 2016-12-16 14:11:29 -0800 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2018-06-26 10:07:42 -0700 |
commit | d2734cce68cf740e015312314415f9034c67851c (patch) | |
tree | b7a140a3cf2a19bb7c88f2d277f3b5a33c121cea /man | |
parent | 88eaf610d9c7056f0946e5090cba1e6288ff2b70 (diff) |
OpenZFS 9166 - zfs storage pool checkpoint
Details about the motivation of this feature and its usage can
be found in this blogpost:
https://sdimitro.github.io/post/zpool-checkpoint/
A lightning talk on this feature can be found here:
https://www.youtube.com/watch?v=fPQA8K40jAM
Implementation details can be found in the big block comment of
spa_checkpoint.c.
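The checkpoint, rewind, and discard commands documented in the zpool.8 changes of this commit can be combined into a small wrapper. The following is only a sketch under stated assumptions: `with_checkpoint` and the `ZPOOL` override variable are hypothetical names, not part of ZFS, and rewinding automatically on failure is an illustrative choice, since a rewind throws away everything written after the checkpoint.

```shell
#!/bin/sh
# Hypothetical helper: guard a risky command with a pool checkpoint.
# ZPOOL may be overridden (for example ZPOOL=echo) to get a dry run.
ZPOOL="${ZPOOL:-zpool}"

with_checkpoint() {
    pool="$1"; shift

    # Take the checkpoint first and give up if that fails.
    $ZPOOL checkpoint "$pool" || return 1

    if "$@"; then
        # The procedure succeeded: discard the checkpoint.
        $ZPOOL checkpoint -d "$pool"
    else
        # The procedure failed: a rewind needs an export followed by
        # an import with --rewind-to-checkpoint.
        $ZPOOL export "$pool" &&
            $ZPOOL import --rewind-to-checkpoint "$pool"
        return 1
    fi
}

# Example (not executed here):
#   with_checkpoint tank zfs destroy -r tank/scratch
```

With `ZPOOL=echo` the function only prints the zpool subcommands it would run, which is a cheap way to check the sequence before pointing it at a real pool.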
Side-changes that are relevant to this commit but not explained
elsewhere:
* renames the tree members of "struct metaslab" to be shorter without
losing meaning
* space_map_{alloc,truncate}() accept a block size as a
parameter. The reason is that in the current state all space
maps that we allocate through the DMU use a global tunable
(space_map_blksz) which defaults to 4KB. This is ok for metaslab
space maps in terms of bandwidth since they are scattered all
over the disk. But for other space maps this default is probably
not what we want. Examples are device removal's vdev_obsolete_sm
or vdev_checkpoint_sm from this review. Both of these have a
1:1 relationship with each vdev and could benefit from a bigger
block size.
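To make the block-size argument concrete, here is a rough arithmetic sketch of how many blocks a single per-vdev space map occupies at two block sizes. The 8 MiB space map length and the 128 KiB alternative are made-up illustration values; the commit message does not say which block size these space maps end up using, and `sm_blocks` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: blocks needed to hold a space map of sm_bytes
# bytes when it is stored in blocks of blksz bytes (rounded up).
sm_blocks() {
    sm_bytes=$1
    blksz=$2
    echo $(( (sm_bytes + blksz - 1) / blksz ))
}

# Illustrative 8 MiB space map: 4 KiB default vs. an assumed 128 KiB size.
sm_blocks $((8 * 1024 * 1024)) 4096      # prints 2048
sm_blocks $((8 * 1024 * 1024)) 131072    # prints 64
```

Fewer, larger blocks mean fewer separate reads when one long space map has to be loaded for a vdev.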
Porting notes:
* The part of dsl_scan_sync() which handles async destroys has
been moved into the new dsl_process_async_destroys() function.
* Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write
to block device backed pools.
* ZTS:
* Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg".
* Don't use large dd block sizes on /dev/urandom under Linux in
checkpoint_capacity.
* Adopt Delphix-OS's setting of 4 (spa_asize_inflation =
SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed
its attempts to fill the pool.
* Create the base and nested pools with sync=disabled to speed up
the "setup" phase.
* Clear labels in test pool between checkpoint tests to avoid
duplicate pool issues.
* The import_rewind_device_replaced test has been marked as "known
to fail" for the reasons listed in its DISCLAIMER.
* New module parameters:
zfs_spa_discard_memory_limit
zfs_remove_max_bytes_pause (not documented - debugging only)
vdev_max_ms_count (formerly metaslabs_per_vdev)
vdev_min_ms_count
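On Linux, module parameters are normally exposed as files under /sys/module/<module>/parameters. Assuming the ZFS module is loaded and uses that location, the new tunables could be inspected with something like the sketch below; `show_params` is a hypothetical helper, and parameters missing on a given build are reported instead of causing an error.

```shell
#!/bin/sh
# Hypothetical helper: print "name=value" for each parameter file found
# in a directory, or "name=(not present)" when the file is missing.
show_params() {
    dir=$1; shift
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        else
            echo "$name=(not present)"
        fi
    done
}

# Assumed sysfs location of the ZFS module parameters.
show_params /sys/module/zfs/parameters \
    zfs_spa_discard_memory_limit vdev_max_ms_count vdev_min_ms_count
```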
Authored by: Serapheim Dimitropoulos <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: John Kennedy <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported-by: Tim Chase <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
OpenZFS-issue: https://illumos.org/issues/9166
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8
Closes #7570
Diffstat (limited to 'man')
-rw-r--r-- | man/man5/zfs-module-parameters.5 | 25 |
-rw-r--r-- | man/man5/zpool-features.5 | 20 |
-rw-r--r-- | man/man8/zdb.8 | 5 |
-rw-r--r-- | man/man8/zpool.8 | 93 |
4 files changed, 140 insertions, 3 deletions
diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5
index fe58e6e59..9125b5b1f 100644
--- a/man/man5/zfs-module-parameters.5
+++ b/man/man5/zfs-module-parameters.5
@@ -293,7 +293,7 @@ Use \fB1\fR for yes (default) and \fB0\fR for no.
 .sp
 .ne 2
 .na
-\fBmetaslabs_per_vdev\fR (int)
+\fBvdev_max_ms_count\fR (int)
 .ad
 .RS 12n
 When a vdev is added, it will be divided into approximately (but no more than) this number of metaslabs.
@@ -304,6 +304,17 @@ Default value: \fB200\fR.
 .sp
 .ne 2
 .na
+\fBvdev_min_ms_count\fR (int)
+.ad
+.RS 12n
+Minimum number of metaslabs to create in a top-level vdev.
+.sp
+Default value: \fB16\fR.
+.RE
+
+.sp
+.ne 2
+.na
 \fBmetaslab_preload_enabled\fR (int)
 .ad
 .RS 12n
@@ -2103,6 +2114,18 @@ Default value: \fB2\fR.
 .sp
 .ne 2
 .na
+\fBzfs_spa_discard_memory_limit\fR (int)
+.ad
+.RS 12n
+Maximum memory used for prefetching a checkpoint's space map on each
+vdev while discarding the checkpoint.
+.sp
+Default value: \fB16,777,216\fR.
+.RE
+
+.sp
+.ne 2
+.na
 \fBzfs_sync_pass_dont_compress\fR (int)
 .ad
 .RS 12n
diff --git a/man/man5/zpool-features.5 b/man/man5/zpool-features.5
index ce34a05a2..7cad9a27b 100644
--- a/man/man5/zpool-features.5
+++ b/man/man5/zpool-features.5
@@ -1,5 +1,5 @@
 '\" te
-.\" Copyright (c) 2013, 2016 by Delphix. All rights reserved.
+.\" Copyright (c) 2013, 2017 by Delphix. All rights reserved.
 .\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
 .\" Copyright (c) 2014, Joyent, Inc. All rights reserved.
 .\" The contents of this file are subject to the terms of the Common Development
@@ -484,6 +484,24 @@ used on a top-level vdev, and will never return to being \fBenabled\fR.
 .sp
 .ne 2
 .na
+\fB\fBzpool_checkpoint\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:zpool_checkpoint
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES none
+.TE
+
+This feature enables the "zpool checkpoint" subcommand that can
+checkpoint the state of the pool at the time it was issued and later
+rewind back to it or discard it.
+
+This feature becomes \fBactive\fR when the "zpool checkpoint" command
+is used to checkpoint the pool.
+The feature will only return back to being \fBenabled\fR when the pool
+is rewound or the checkpoint has been discarded.
 \fB\fBlarge_blocks\fR\fR
 .ad
 .RS 4n
diff --git a/man/man8/zdb.8 b/man/man8/zdb.8
index 0ce4d852d..f00d9c0cf 100644
--- a/man/man8/zdb.8
+++ b/man/man8/zdb.8
@@ -23,7 +23,7 @@
 .Nd display zpool debugging and consistency information
 .Sh SYNOPSIS
 .Nm
-.Op Fl AbcdDFGhiLMPsvX
+.Op Fl AbcdDFGhikLMPsvX
 .Op Fl e Oo Fl V Oc Op Fl p Ar path ...
 .Op Fl I Ar inflight I/Os
 .Oo Fl o Ar var Ns = Ns Ar value Oc Ns ...
@@ -172,6 +172,9 @@ Display information about intent log
 .Pq ZIL
 entries relating to each dataset.
 If specified multiple times, display counts of each intent log transaction type.
+.It Fl k
+Examine the checkpointed state of the pool.
+Note, the on disk format of the pool is not reverted to the checkpointed state.
 .It Fl l Ar device
 Read the vdev labels from the specified device.
 .Nm Fl l
diff --git a/man/man8/zpool.8 b/man/man8/zpool.8
index 228051809..ddcba2e72 100644
--- a/man/man8/zpool.8
+++ b/man/man8/zpool.8
@@ -47,6 +47,10 @@
 .Oo Fl o Ar property Ns = Ns Ar value Oc
 .Ar pool device new_device
 .Nm
+.Cm checkpoint
+.Op Fl d, -discard
+.Ar pool
+.Nm
 .Cm clear
 .Ar pool
 .Op Ar device
@@ -93,6 +97,7 @@
 .Fl a
 .Op Fl DflmN
 .Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
+.Op Fl -rewind-to-checkpoint
 .Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
 .Op Fl o Ar mntopts
 .Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
@@ -101,6 +106,7 @@
 .Cm import
 .Op Fl Dflm
 .Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
+.Op Fl -rewind-to-checkpoint
 .Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
 .Op Fl o Ar mntopts
 .Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
@@ -470,6 +476,50 @@ configuration.
 .Pp
 The content of the cache devices is considered volatile, as is the case with
 other system caches.
+.Ss Pool checkpoint
+Before starting critical procedures that include destructive actions (e.g
+.Nm zfs Cm destroy
+), an administrator can checkpoint the pool's state and in the case of a
+mistake or failure, rewind the entire pool back to the checkpoint.
+Otherwise, the checkpoint can be discarded when the procedure has completed
+successfully.
+.Pp
+A pool checkpoint can be thought of as a pool-wide snapshot and should be used
+with care as it contains every part of the pool's state, from properties to vdev
+configuration.
+Thus, while a pool has a checkpoint certain operations are not allowed.
+Specifically, vdev removal/attach/detach, mirror splitting, and
+changing the pool's guid.
+Adding a new vdev is supported but in the case of a rewind it will have to be
+added again.
+Finally, users of this feature should keep in mind that scrubs in a pool that
+has a checkpoint do not repair checkpointed data.
+.Pp
+To create a checkpoint for a pool:
+.Bd -literal
+# zpool checkpoint pool
+.Ed
+.Pp
+To later rewind to its checkpointed state, you need to first export it and
+then rewind it during import:
+.Bd -literal
+# zpool export pool
+# zpool import --rewind-to-checkpoint pool
+.Ed
+.Pp
+To discard the checkpoint from a pool:
+.Bd -literal
+# zpool checkpoint -d pool
+.Ed
+.Pp
+Dataset reservations (controlled by the
+.Nm reservation
+or
+.Nm refreservation
+zfs properties) may be unenforceable while a checkpoint exists, because the
+checkpoint is allowed to consume the dataset's reservation.
+Finally, data that is part of the checkpoint but has been freed in the
+current state of the pool won't be scanned during a scrub.
 .Ss Properties
 Each pool has several properties associated with it.
 Some properties are read-only statistics while others are configurable and
@@ -856,6 +906,39 @@ supported at the moment is ashift.
 .El
 .It Xo
 .Nm
+.Cm checkpoint
+.Op Fl d, -discard
+.Ar pool
+.Xc
+Checkpoints the current state of
+.Ar pool
+, which can be later restored by
+.Nm zpool Cm import --rewind-to-checkpoint .
+The existence of a checkpoint in a pool prohibits the following
+.Nm zpool
+commands:
+.Cm remove ,
+.Cm attach ,
+.Cm detach ,
+.Cm split ,
+and
+.Cm reguid .
+In addition, it may break reservation boundaries if the pool lacks free
+space.
+The
+.Nm zpool Cm status
+command indicates the existence of a checkpoint or the progress of discarding a
+checkpoint from a pool.
+The
+.Nm zpool Cm list
+command reports how much space the checkpoint takes from the pool.
+.Bl -tag -width Ds
+.It Fl d, -discard
+Discards an existing checkpoint from
+.Ar pool .
+.El
+.It Xo
+.Nm
 .Cm clear
 .Ar pool
 .Op Ar device
@@ -1290,6 +1373,16 @@ and the
 .Sy altroot
 property to
 .Ar root .
+.It Fl -rewind-to-checkpoint
+Rewinds pool to the checkpointed state.
+Once the pool is imported with this flag there is no way to undo the rewind.
+All changes and data that were written after the checkpoint are lost!
+The only exception is when the
+.Sy readonly
+mounting option is enabled.
+In this case, the checkpointed state of the pool is opened and an
+administrator can see how the pool would look like if they were
+to fully rewind.
 .It Fl s
 Scan using the default search path, the libblkid cache will not be consulted.
 A custom search path may be specified by setting the