author | Christian Schwarz <[email protected]> | 2021-03-07 18:49:58 +0100 |
---|---|---|
committer | GitHub <[email protected]> | 2021-03-07 09:49:58 -0800 |
commit | 93e3658035030dedc4a25c25e8b410a549bafa74 (patch) | |
tree | 2496f58fac68821401cb34196ad0a393c199961f /module/zfs | |
parent | b30cd705993be426d36cdf254ef216b28508fd8c (diff) |
zvol: call zil_replaying() during replay
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx. The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.
ZPL log entries are not idempotent and are logically dependent on one
another, so calling zil_replaying() is necessary for correctness.
For ZVOLs the question of correctness is more nuanced: a ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.
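The idempotency and overlap claims above can be illustrated with a small
toy model. This is only a sketch: toy_write(), toy_truncate() and
toy_overlaps() are hypothetical helpers, not OpenZFS functions, and the
truncate is assumed to behave like a discard that leaves zeros behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	TOY_SIZE	8	/* size of the toy volume in bytes */

/* Hypothetical stand-in for replaying a TX_WRITE record. */
static void
toy_write(char *vol, size_t off, const char *src, size_t len)
{
	assert(off + len <= TOY_SIZE);
	memcpy(vol + off, src, len);
}

/*
 * Hypothetical stand-in for replaying a TX_TRUNCATE record.
 * Assumption: a discarded range reads back as zeros.
 */
static void
toy_truncate(char *vol, size_t off, size_t len)
{
	assert(off + len <= TOY_SIZE);
	memset(vol + off, 0, len);
}

/* Return nonzero if [off1, off1+len1) and [off2, off2+len2) overlap. */
static int
toy_overlaps(size_t off1, size_t len1, size_t off2, size_t len2)
{
	return (off1 < off2 + len2 && off2 < off1 + len1);
}
```

Applying the same toy_write() or toy_truncate() twice leaves the volume
in the same state as applying it once. Two writes whose ranges overlap
according to toy_overlaps() can produce different contents depending on
the order in which they are applied.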
Thus, at first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is
HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")
where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset O.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".
If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".
If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.
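The scenario above can be walked through with a toy simulation. This is a
sketch under stated assumptions, not the ZIL implementation: all toy_*
names are hypothetical, replay progress is modelled as a simple count of
replayed records kept alongside the volume, a corrupt record is treated
as the end of the chain, and the untouched third byte of the toy volume
is shown as "_".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define	TOY_NRECORDS	4
#define	TOY_VOLSIZE	3

struct toy_record {
	int offset;	/* 1-based offset O, as in W(O, C) */
	char content;	/* the byte C written by the record */
	bool corrupt;	/* true once a bit flip breaks the checksum */
};

struct toy_zvol {
	char data[TOY_VOLSIZE + 1];	/* NUL-terminated volume content */
	int replayed;	/* toy header state: records recorded as replayed */
};

/*
 * Replay records up to (not including) index `end`. With `track` set,
 * progress is recorded after each record (the role zil_replaying()
 * plays in the commit message) and a later replay resumes from there.
 * Without it, every replay starts from the beginning of the chain.
 */
static void
toy_replay(struct toy_zvol *zv, const struct toy_record *log, int end,
    bool track)
{
	int start = track ? zv->replayed : 0;

	for (int i = start; i < end; i++) {
		if (log[i].corrupt)
			break;	/* invalid checksum: end of chain */
		assert(log[i].offset >= 1 && log[i].offset <= TOY_VOLSIZE);
		zv->data[log[i].offset - 1] = log[i].content;
		if (track)
			zv->replayed = i + 1;
	}
}

/* Run the scenario and copy the final volume content into `out`. */
static void
toy_scenario(bool track, char out[TOY_VOLSIZE + 1])
{
	struct toy_record log[TOY_NRECORDS] = {
		{ 1, 'A', false }, { 1, 'B', false },
		{ 2, 'X', false }, { 3, 'Z', false },
	};
	struct toy_zvol zv = { "___", 0 };

	/* Replay 1, 2 and 3 in one txg, sync, then crash. */
	toy_replay(&zv, log, 3, track);
	/* Bit flips corrupt 2, 3 and 4. */
	log[1].corrupt = log[2].corrupt = log[3].corrupt = true;
	/* Come up again and replay. */
	toy_replay(&zv, log, TOY_NRECORDS, track);
	memcpy(out, zv.data, sizeof (zv.data));
}
```

Calling toy_scenario(false, out) models replay without progress tracking
and leaves "AX_" in out. Calling toy_scenario(true, out) models replay
with progress tracking and leaves "BX_".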
This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #11667
Diffstat (limited to 'module/zfs')
-rw-r--r-- | module/zfs/zvol.c | 15 |
1 files changed, 14 insertions, 1 deletions
diff --git a/module/zfs/zvol.c b/module/zfs/zvol.c
index 7c6dae865..44f9832ce 100644
--- a/module/zfs/zvol.c
+++ b/module/zfs/zvol.c
@@ -473,7 +473,19 @@ zvol_replay_truncate(void *arg1, void *arg2, boolean_t byteswap)
 	offset = lr->lr_offset;
 	length = lr->lr_length;
 
-	return (dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, offset, length));
+	dmu_tx_t *tx = dmu_tx_create(zv->zv_objset);
+	dmu_tx_mark_netfree(tx);
+	int error = dmu_tx_assign(tx, TXG_WAIT);
+	if (error != 0) {
+		dmu_tx_abort(tx);
+	} else {
+		zil_replaying(zv->zv_zilog, tx);
+		dmu_tx_commit(tx);
+		error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, offset,
+		    length);
+	}
+
+	return (error);
 }
 
 /*
@@ -513,6 +525,7 @@ zvol_replay_write(void *arg1, void *arg2, boolean_t byteswap)
 		dmu_tx_abort(tx);
 	} else {
 		dmu_write(os, ZVOL_OBJ, offset, length, data, tx);
+		zil_replaying(zv->zv_zilog, tx);
 		dmu_tx_commit(tx);
 	}
 