Fix dRAID sequential resilver silent damage handling

This change addresses two distinct scenarios which are possible when performing a sequential resilver to a dRAID pool with vdevs that contain silent unknown damage. Which in this circumstance took the form of the devices being intentionally overwritten with zeros. However, it could also result from a device returning incorrect data while a sequential resilver was in progress. Scenario 1) A sequential resilver is performed while all of the dRAID vdevs are ONLINE and there is silent damage present on the vdev being resilvered. In this case, nothing will be repaired by vdev_raidz_io_done_reconstruct_known_missing() because rc->rc_error isn't set on any of the raid columns. To address this vdev_draid_io_start_read() has been updated to always mark the resilvering column as ESTALE for sequential resilver IO. Scenario 2) Multiple columns contain silent damage for the same block and a sequential resilver is performed. In this case it's impossible to generate the correct data from parity unless all of the damaged columns are being sequentially resilvered (and thus only good data is used to generate parity). This is as expected and there's nothing which can be done about it. However, we need to be careful not to make to situation worse. Since we can't verify the data is actually good without a checksum, we must only repair the devices which are being sequentially resilvered. Otherwise, an incorrect repair to a device which previously contained good data could effectively lock in the damage and make reconstruction impossible. A check for this was added to vdev_raidz_io_done_verified() along with a new test case. Lastly, this change updates the redundancy_draid_spare1 and redundancy_draid_spare3 test cases to be more representative of normal dRAID replacement operation. Specifically, what we care about is that the scrub run after a sequential resilver does not find additional blocks which need repair. This would indicate the sequential resilver failed to rebuild a section of one of the devices. Note also the tests were switched to using the verify_pool() function which still checks for checksum errors. Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12061
author: Brian Behlendorf <[email protected]> 2021-05-20 15:05:26 -0700
committer: Brian Behlendorf <[email protected]> 2021-05-27 22:31:56 -0700
commit: 33a06f27e65d0976ed227ea4af417d71d1c4a964 (patch)
tree: a4a44da55d9ef8b1836c5156e1e4eb66b20ffc16 /tests
parent: 7ad8fc54079ba9b0870a67f23d91ebf9973c1cc6 (diff)
5 files changed, 190 insertions, 49 deletions
diff --git a/tests/runfiles/common.run b/tests/runfiles/common.run
index 7d8c2d322..703407a7d 100644
--- a/tests/runfiles/common.run
+++ b/tests/runfiles/common.run
@@ -742,10 +742,10 @@ tags = ['functional', 'raidz']
 
 [tests/functional/redundancy]
 tests = ['redundancy_draid', 'redundancy_draid1', 'redundancy_draid2',
-    'redundancy_draid3', 'redundancy_draid_spare1', 'redundancy_draid_spare2',
-    'redundancy_draid_spare3', 'redundancy_mirror', 'redundancy_raidz',
-    'redundancy_raidz1', 'redundancy_raidz2', 'redundancy_raidz3',
-    'redundancy_stripe']
+    'redundancy_draid3', 'redundancy_draid_damaged', 'redundancy_draid_spare1',
+    'redundancy_draid_spare2', 'redundancy_draid_spare3', 'redundancy_mirror',
+    'redundancy_raidz', 'redundancy_raidz1', 'redundancy_raidz2',
+    'redundancy_raidz3', 'redundancy_stripe']
 tags = ['functional', 'redundancy']
 timeout = 1200
 
diff --git a/tests/zfs-tests/tests/functional/redundancy/Makefile.am b/tests/zfs-tests/tests/functional/redundancy/Makefile.am
index ac323c893..42c11c4aa 100644
--- a/tests/zfs-tests/tests/functional/redundancy/Makefile.am
+++ b/tests/zfs-tests/tests/functional/redundancy/Makefile.am
@@ -6,6 +6,7 @@ dist_pkgdata_SCRIPTS = \
 	redundancy_draid1.ksh \
 	redundancy_draid2.ksh \
 	redundancy_draid3.ksh \
+	redundancy_draid_damaged.ksh \
 	redundancy_draid_spare1.ksh \
 	redundancy_draid_spare2.ksh \
 	redundancy_draid_spare3.ksh \
diff --git a/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_damaged.ksh b/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_damaged.ksh
new file mode 100755
index 000000000..6796cc78a
--- /dev/null
+++ b/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_damaged.ksh
@@ -0,0 +1,153 @@
+#!/bin/ksh -p
+#
+# CDDL HEADER START
+#
+# The contents of this file are subject to the terms of the
+# Common Development and Distribution License (the "License").
+# You may not use this file except in compliance with the License.
+#
+# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+# or http://www.opensolaris.org/os/licensing.
+# See the License for the specific language governing permissions
+# and limitations under the License.
+#
+# When distributing Covered Code, include this CDDL HEADER in each
+# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+# If applicable, add the following below this CDDL HEADER, with the
+# fields enclosed by brackets "[]" replaced with your own identifying
+# information: Portions Copyright [yyyy] [name of copyright owner]
+#
+# CDDL HEADER END
+#
+
+#
+# Copyright (c) 2021 by Lawrence Livermore National Security, LLC.
+#
+
+. $STF_SUITE/include/libtest.shlib
+. $STF_SUITE/tests/functional/redundancy/redundancy.kshlib
+
+#
+# DESCRIPTION:
+#	When sequentially resilvering a dRAID pool with multiple vdevs
+#	that contain silent damage a sequential resilver should never
+#	introduce additional unrecoverable damage.
+#
+# STRATEGY:
+#	1. Create block device files for the test draid pool
+#	2. For each parity value [1..3]
+#	    - create draid pool
+#	    - fill it with some directories/files
+#	    - overwrite the maximum number of repairable devices
+#	    - sequentially resilver each overwritten device one at a time;
+#	      the device will not be correctly repaired because the silent
+#	      damage on the other vdevs will cause the parity calculations
+#	      to generate incorrect data for the resilvering vdev.
+#	    - verify that only the resilvering devices had invalid data
+#	      written and that a scrub is still able to repair the pool
+#	    - destroy the draid pool
+#
+
+typeset -r devs=7
+typeset -r dev_size_mb=512
+
+typeset -a disks
+
+prefetch_disable=$(get_tunable PREFETCH_DISABLE)
+rebuild_scrub_enabled=$(get_tunable REBUILD_SCRUB_ENABLED)
+
+function cleanup
+{
+	poolexists "$TESTPOOL" && destroy_pool "$TESTPOOL"
+
+	for i in {0..$devs}; do
+		rm -f "$TEST_BASE_DIR/dev-$i"
+	done
+
+	set_tunable32 PREFETCH_DISABLE $prefetch_disable
+	set_tunable32 REBUILD_SCRUB_ENABLED $rebuild_scrub_enabled
+}
+
+function test_sequential_resilver # <pool> <parity> <dir>
+{
+	typeset pool=$1
+	typeset nparity=$2
+	typeset dir=$3
+
+	log_must zpool export $pool
+
+	for (( i=0; i<$nparity; i=i+1 )); do
+		log_must dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
+		    bs=1M seek=4 count=$(($dev_size_mb-4))
+	done
+
+	log_must zpool import -o cachefile=none -d $dir $pool
+
+	for (( i=0; i<$nparity; i=i+1 )); do
+		spare=draid${nparity}-0-$i
+		log_must zpool replace -fsw $pool $dir/dev-$i $spare
+	done
+
+	log_must zpool scrub -w $pool
+
+	# When only a single child was overwritten the sequential resilver
+	# can fully repair the damange from parity and the scrub will have
+	# nothing to repair. When multiple children are silently damaged
+	# the sequential resilver will calculate the wrong data since only
+	# the parity information is used and it cannot be verified with
+	# the checksum. However, since only the resilvering devices are
+	# written to with the bad data a subsequent scrub will be able to
+	# fully repair the pool.
+	#
+	if [[ $nparity == 1 ]]; then
+		log_must check_pool_status $pool "scan" "repaired 0B"
+	else
+		log_mustnot check_pool_status $pool "scan" "repaired 0B"
+	fi
+
+	log_must check_pool_status $pool "errors" "No known data errors"
+	log_must check_pool_status $pool "scan" "with 0 errors"
+}
+
+log_onexit cleanup
+
+log_must set_tunable32 PREFETCH_DISABLE 1
+log_must set_tunable32 REBUILD_SCRUB_ENABLED 0
+
+# Disk files which will be used by pool
+for i in {0..$(($devs - 1))}; do
+	device=$TEST_BASE_DIR/dev-$i
+	log_must truncate -s ${dev_size_mb}M $device
+	disks[${#disks[*]}+1]=$device
+done
+
+# Disk file which will be attached
+log_must truncate -s 512M $TEST_BASE_DIR/dev-$devs
+
+for nparity in 1 2 3; do
+	raid=draid${nparity}:${nparity}s
+	dir=$TEST_BASE_DIR
+
+	log_must zpool create -f -o cachefile=none $TESTPOOL $raid ${disks[@]}
+	log_must zfs set primarycache=metadata $TESTPOOL
+
+	log_must zfs create $TESTPOOL/fs
+	log_must fill_fs /$TESTPOOL/fs 1 512 100 1024 R
+
+	log_must zfs create -o compress=on $TESTPOOL/fs2
+	log_must fill_fs /$TESTPOOL/fs2 1 512 100 1024 R
+
+	log_must zfs create -o compress=on -o recordsize=8k $TESTPOOL/fs3
+	log_must fill_fs /$TESTPOOL/fs3 1 512 100 1024 R
+
+	log_must zpool export $TESTPOOL
+	log_must zpool import -o cachefile=none -d $dir $TESTPOOL
+
+	log_must check_pool_status $TESTPOOL "errors" "No known data errors"
+
+	test_sequential_resilver $TESTPOOL $nparity $dir
+
+	log_must zpool destroy "$TESTPOOL"
+done
+
+log_pass "draid damaged device(s) test succeeded."
diff --git a/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare1.ksh b/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare1.ksh
index 3b7951596..8acee1567 100755
--- a/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare1.ksh
+++ b/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare1.ksh
@@ -40,7 +40,15 @@
 
 log_assert "Verify resilver to dRAID distributed spares"
 
-log_onexit cleanup
+function cleanup_tunable
+{
+	log_must set_tunable32 REBUILD_SCRUB_ENABLED 1
+	cleanup
+}
+
+log_onexit cleanup_tunable
+
+log_must set_tunable32 REBUILD_SCRUB_ENABLED 0
 
 for replace_mode in "healing" "sequential"; do
 
@@ -74,32 +82,15 @@ for replace_mode in "healing" "sequential"; do
 		log_must check_vdev_state $spare_vdev "ONLINE"
 		log_must check_hotspare_state $TESTPOOL $spare_vdev "INUSE"
 		log_must zpool detach $TESTPOOL $fault_vdev
-
-		resilver_cksum=$(cksum_pool $TESTPOOL)
-		if [[ $resilver_cksum != 0 ]]; then
-			log_must zpool status -v $TESTPOOL
-			log_fail "$replace_mode resilver "
-			    "cksum errors: $resilver_cksum"
-		fi
-
-		if [[ "$replace_mode" = "healing" ]]; then
-			log_must zpool scrub $TESTPOOL
-		fi
-
-		log_must wait_scrubbed $TESTPOOL
+		log_must verify_pool $TESTPOOL
 		log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
 		log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
-		scrub_cksum=$(cksum_pool $TESTPOOL)
-		if [[ $scrub_cksum != 0 ]]; then
-			log_must zpool status -v $TESTPOOL
-			log_fail "scrub cksum errors: $scrub_cksum"
-		fi
-
 		(( i += 1 ))
 	done
 
 	log_must is_data_valid $TESTPOOL
+	log_must check_pool_status $TESTPOOL "errors" "No known data errors"
 
 	cleanup
 done
diff --git a/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare3.ksh b/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare3.ksh
index 587a1be0a..28e8e3c6d 100755
--- a/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare3.ksh
+++ b/tests/zfs-tests/tests/functional/redundancy/redundancy_draid_spare3.ksh
@@ -114,6 +114,9 @@ for replace_mode in "healing" "sequential"; do
 	log_must zpool detach $TESTPOOL $BASEDIR/vdev7
 	log_must check_vdev_state $TESTPOOL draid1-0-0 "ONLINE"
 	log_must check_hotspare_state $TESTPOOL draid1-0-0 "INUSE"
+	log_must verify_pool $TESTPOOL
+	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
+	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
 	# Distributed spare in mirror with original device faulted
 	log_must zpool offline -f $TESTPOOL $BASEDIR/vdev8
@@ -122,6 +125,9 @@ for replace_mode in "healing" "sequential"; do
 	log_must check_vdev_state $TESTPOOL spare-8 "DEGRADED"
 	log_must check_vdev_state $TESTPOOL draid1-0-1 "ONLINE"
 	log_must check_hotspare_state $TESTPOOL draid1-0-1 "INUSE"
+	log_must verify_pool $TESTPOOL
+	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
+	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
 	# Distributed spare in mirror with original device still online
 	log_must check_vdev_state $TESTPOOL $BASEDIR/vdev9 "ONLINE"
@@ -129,6 +135,9 @@ for replace_mode in "healing" "sequential"; do
 	log_must check_vdev_state $TESTPOOL spare-9 "ONLINE"
 	log_must check_vdev_state $TESTPOOL draid1-0-2 "ONLINE"
 	log_must check_hotspare_state $TESTPOOL draid1-0-2 "INUSE"
+	log_must verify_pool $TESTPOOL
+	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
+	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
 	# Normal faulted device replacement
 	new_vdev0="$BASEDIR/new_vdev0"
@@ -137,6 +146,9 @@ for replace_mode in "healing" "sequential"; do
 	log_must check_vdev_state $TESTPOOL $BASEDIR/vdev0 "FAULTED"
 	log_must zpool replace -w $flags $TESTPOOL $BASEDIR/vdev0 $new_vdev0
 	log_must check_vdev_state $TESTPOOL $new_vdev0 "ONLINE"
+	log_must verify_pool $TESTPOOL
+	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
+	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
 	# Distributed spare faulted device replacement
 	log_must zpool offline -f $TESTPOOL $BASEDIR/vdev2
@@ -145,6 +157,9 @@ for replace_mode in "healing" "sequential"; do
 	log_must check_vdev_state $TESTPOOL spare-2 "DEGRADED"
 	log_must check_vdev_state $TESTPOOL draid1-0-3 "ONLINE"
 	log_must check_hotspare_state $TESTPOOL draid1-0-3 "INUSE"
+	log_must verify_pool $TESTPOOL
+	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
+	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
 	# Normal online device replacement
 	new_vdev1="$BASEDIR/new_vdev1"
@@ -152,6 +167,9 @@ for replace_mode in "healing" "sequential"; do
 	log_must check_vdev_state $TESTPOOL $BASEDIR/vdev1 "ONLINE"
 	log_must zpool replace -w $flags $TESTPOOL $BASEDIR/vdev1 $new_vdev1
 	log_must check_vdev_state $TESTPOOL $new_vdev1 "ONLINE"
+	log_must verify_pool $TESTPOOL
+	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
+	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
 	# Distributed spare online device replacement (then fault)
 	log_must zpool replace -w $flags $TESTPOOL $BASEDIR/vdev3 draid1-0-4
@@ -161,35 +179,13 @@ for replace_mode in "healing" "sequential"; do
 	log_must zpool offline -f $TESTPOOL $BASEDIR/vdev3
 	log_must check_vdev_state $TESTPOOL $BASEDIR/vdev3 "FAULTED"
 	log_must check_vdev_state $TESTPOOL spare-3 "DEGRADED"
-
-	resilver_cksum=$(cksum_pool $TESTPOOL)
-	if [[ $resilver_cksum != 0 ]]; then
-		log_must zpool status -v $TESTPOOL
-		log_fail "$replace_mode resilver cksum errors: $resilver_cksum"
-	fi
-
-	if [[ "$replace_mode" = "healing" ]]; then
-		log_must zpool scrub -w $TESTPOOL
-	else
-		if [[ $(get_tunable REBUILD_SCRUB_ENABLED) -eq 0 ]]; then
-			log_must zpool scrub -w $TESTPOOL
-		else
-			log_must wait_scrubbed $TESTPOOL
-		fi
-	fi
-
-	log_must is_pool_scrubbed $TESTPOOL
-
-	scrub_cksum=$(cksum_pool $TESTPOOL)
-	if [[ $scrub_cksum != 0 ]]; then
-		log_must zpool status -v $TESTPOOL
-		log_fail "scrub cksum errors: $scrub_cksum"
-	fi
-
+	log_must verify_pool $TESTPOOL
 	log_must check_pool_status $TESTPOOL "scan" "repaired 0B"
 	log_must check_pool_status $TESTPOOL "scan" "with 0 errors"
 
+	# Verify the original data is valid
 	log_must is_data_valid $TESTPOOL
+	log_must check_pool_status $TESTPOOL "errors" "No known data errors"
 
 	cleanup
 done
author	Brian Behlendorf <[email protected]>	2021-05-20 15:05:26 -0700
committer	Brian Behlendorf <[email protected]>	2021-05-27 22:31:56 -0700
commit	33a06f27e65d0976ed227ea4af417d71d1c4a964 (patch)
tree	a4a44da55d9ef8b1836c5156e1e4eb66b20ffc16 /tests
parent	7ad8fc54079ba9b0870a67f23d91ebf9973c1cc6 (diff)