summaryrefslogtreecommitdiffstats
path: root/module/zfs
diff options
context:
space:
mode:
authorChristian Schwarz <[email protected]>2021-03-07 18:49:58 +0100
committerGitHub <[email protected]>2021-03-07 09:49:58 -0800
commit93e3658035030dedc4a25c25e8b410a549bafa74 (patch)
tree2496f58fac68821401cb34196ad0a393c199961f /module/zfs
parentb30cd705993be426d36cdf254ef216b28508fd8c (diff)
zvol: call zil_replaying() during replay
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an entry has been replayed in the (still open) tx. The ZIL uses that information to record the replay progress in the ZIL header when that tx's txg syncs. ZPL log entries are not idempotent and logically dependent and thus calling zil_replaying() is necessary for correctness. For ZVOLs the question of correctness is more nuanced: ZVOL logs only TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical dependencies between two records exist only if the write or discard request had sync semantics or if the ranges affected by the records overlap. Thus, at a first glance, it would be correct to restart replay from the beginning if we crash before replay completes. But this does not address the following scenario: Assume one log record per LWB. The chain on disk is HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z") where N:W(O, C) represents log entry number N which is a TX_WRITE of C to offset A. We replay 1, 2 and 3 in one txg, sync that txg, then crash. Bit flips corrupt 2, 3, and 4. We come up again and restart replay from the beginning because we did not call zil_replaying() during replay. We replay 1 again, then interpret 2's invalid checksum as the end of the ZIL chain and call replay done. The replayed zvol content is "AX". If we had called zil_replaying() the HDR would have pointed to 3 and our resumed replay would not have replayed anything because 3 was corrupted, resulting in zvol content "BX". If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's contents. This patch adds the zil_replaying() calls to the replay functions. Since the callbacks in the replay function need the zilog_t* pointer so that they can call zil_replaying() we open the ZIL while replaying in zvol_create_minor(). We also verify that replay has been done when on-demand-opening the ZIL on the first modifying bio. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #11667
Diffstat (limited to 'module/zfs')
-rw-r--r--module/zfs/zvol.c15
1 files changed, 14 insertions, 1 deletions
diff --git a/module/zfs/zvol.c b/module/zfs/zvol.c
index 7c6dae865..44f9832ce 100644
--- a/module/zfs/zvol.c
+++ b/module/zfs/zvol.c
@@ -473,7 +473,19 @@ zvol_replay_truncate(void *arg1, void *arg2, boolean_t byteswap)
offset = lr->lr_offset;
length = lr->lr_length;
- return (dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, offset, length));
+ dmu_tx_t *tx = dmu_tx_create(zv->zv_objset);
+ dmu_tx_mark_netfree(tx);
+ int error = dmu_tx_assign(tx, TXG_WAIT);
+ if (error != 0) {
+ dmu_tx_abort(tx);
+ } else {
+ zil_replaying(zv->zv_zilog, tx);
+ dmu_tx_commit(tx);
+ error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, offset,
+ length);
+ }
+
+ return (error);
}
/*
@@ -513,6 +525,7 @@ zvol_replay_write(void *arg1, void *arg2, boolean_t byteswap)
dmu_tx_abort(tx);
} else {
dmu_write(os, ZVOL_OBJ, offset, length, data, tx);
+ zil_replaying(zv->zv_zilog, tx);
dmu_tx_commit(tx);
}