author | Brian Behlendorf <[email protected]> | 2018-12-04 09:37:37 -0800 |
---|---|---|
committer | GitHub <[email protected]> | 2018-12-04 09:37:37 -0800 |
commit | 7c9a42921e60dbad0e3003bd571591f073860233 (patch) | |
tree | 7dcdfdf535f286a9c3d4dc5f4996ed0e59c501c2 /man/man5/zfs-module-parameters.5 | |
parent | c40a1124e1d1010b665909ad31d2904630018f6f (diff) |
Detect IO errors during device removal
While device removal cannot verify the checksums of individual
blocks as it copies them, it can reasonably detect hard IO
errors from the leaf vdevs. Failure to perform this error
checking can result in device removal completing successfully
without moving any data, which will permanently corrupt the pool.
Situation 1: faulted/degraded vdevs
In the configuration shown below, the removal of mirror-0 will
permanently corrupt the pool. Device removal will preferentially
copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4', which
in this case will result in nothing being copied since one vdev
in each of those groups is unavailable. However, device removal
will complete successfully since all IO errors are ignored.
    tank                DEGRADED     0     0     0
      mirror-0          DEGRADED     0     0     0
        /var/tmp/vdev1  FAULTED      0     0     0  external fault
        /var/tmp/vdev2  ONLINE       0     0     0
      mirror-1          DEGRADED     0     0     0
        /var/tmp/vdev3  ONLINE       0     0     0
        /var/tmp/vdev4  FAULTED      0     0     0  external fault
This issue is resolved by updating the source child selection
logic to exclude unreadable leaf vdevs. Additionally, writes to
unwritable destination child vdevs, which can never succeed, are
skipped to prevent generating a large number of write IO errors.
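The selection rule described above can be illustrated with a small,
self-contained sketch. It is not the code from this commit: the
sketch_leaf_t type, its readable/writeable flags, and both helper
functions are hypothetical simplifications, and the "try the preferred
child first, then fall back" order is an assumption made for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a leaf vdev (not the real vdev_t). */
typedef struct sketch_leaf {
	const char *path;
	bool readable;		/* false for, e.g., a FAULTED device */
	bool writeable;
} sketch_leaf_t;

/*
 * Return the index of the child to copy from.  The preferred child is
 * used when it is readable; otherwise any other readable child is used.
 * Returns -1 when no child can be read.
 */
static int
sketch_pick_source(const sketch_leaf_t *children, int nchildren,
    int preferred)
{
	assert(children != NULL && nchildren > 0);
	assert(preferred >= 0 && preferred < nchildren);

	if (children[preferred].readable)
		return (preferred);
	for (int c = 0; c < nchildren; c++) {
		if (children[c].readable)
			return (c);
	}
	return (-1);
}

/* Count the destination children that are worth issuing writes to. */
static int
sketch_count_writeable(const sketch_leaf_t *children, int nchildren)
{
	int n = 0;

	assert(children != NULL && nchildren >= 0);
	for (int c = 0; c < nchildren; c++) {
		if (children[c].writeable)
			n++;
	}
	return (n);
}
```

Applied to the configuration above, asking for vdev1 as the source of
mirror-0 falls back to vdev2, and only vdev3 counts as a writeable
destination in mirror-1.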
Situation 2: individual hard IO errors
During removal, if an unexpected hard IO error is encountered when
either reading or writing a child vdev, the entire removal
operation is cancelled. While it may be possible to reconstruct
the data after removal, that cannot be guaranteed. The only
strictly safe thing to do is to cancel the removal.
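The policy can be summarized in a few lines of C. This is a sketch
under stated assumptions, not the implementation: the sketch_action_t
enum and sketch_removal_io_done() are hypothetical names, and the
ignore_errors argument merely models the zfs_removal_ignore_errors
module parameter documented in the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of handling one completed copy IO. */
typedef enum sketch_action {
	SKETCH_CONTINUE,	/* keep copying */
	SKETCH_CANCEL		/* abort the whole removal */
} sketch_action_t;

/*
 * Decide what the removal should do after a read or write completes.
 * io_error is 0 on success or an errno value such as EIO on a hard
 * failure.  ignore_errors models the zfs_removal_ignore_errors tunable
 * (default 0): when set, the removal proceeds despite the error.
 */
static sketch_action_t
sketch_removal_io_done(int io_error, bool ignore_errors)
{
	assert(io_error >= 0);

	if (io_error == 0 || ignore_errors)
		return (SKETCH_CONTINUE);
	return (SKETCH_CANCEL);
}
```

With the default setting, a single EIO yields SKETCH_CANCEL. Enabling
the override trades that safety for progress, which is why the man page
text describes it as a last resort.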
As a future improvement we may want to instead suspend the removal
process and allow the damaged region to be retried. That work
is left for another time, since hard IO errors during the removal
process are expected to be exceptionally rare.
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #6900
Closes #8161
Diffstat (limited to 'man/man5/zfs-module-parameters.5')
-rw-r--r-- | man/man5/zfs-module-parameters.5 | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5
index 72a850f4f..837a7d1c3 100644
--- a/man/man5/zfs-module-parameters.5
+++ b/man/man5/zfs-module-parameters.5
@@ -1985,6 +1985,22 @@ Use \fB1\fR for yes and \fB0\fR for no (default).
 .sp
 .ne 2
 .na
+\fBzfs_removal_ignore_errors\fR (int)
+.ad
+.RS 12n
+.sp
+Ignore hard IO errors during device removal. When set, if a device encounters
+a hard IO error during the removal process the removal will not be cancelled.
+This can result in a normally recoverable block becoming permanently damaged
+and is not recommended. This should only be used as a last resort when the
+pool cannot be returned to a healthy state prior to removing the device.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
 \fBzfs_resilver_min_time_ms\fR (int)
 .ad
 .RS 12n