author Rob N <[email protected]> 2023-11-29 04:53:04 +1100
committer Brian Behlendorf <[email protected]> 2023-11-28 12:59:00 -0800
commit 2a953e0ac928563166ad7caa89e48b8d257724d4
tree dbf9588adc24bca4713bba620e8a17236047b417 /module/zfs
parent e4985bf5a1581e77d1e9e07df98ce6468345ae51
dmu_buf_will_clone: fix race in transition back to NOFILL
Previously, dmu_buf_will_clone() would roll back any dirty record, but would not clean out the modified data or reset the state before releasing the lock. That left the last-written data in db_data, but the dbuf in the wrong state.

This was eventually corrected when the dbuf state was made NOFILL and dbuf_noread() was called (which clears out the old data), but by then it was too late, because the lock had already been dropped with that invalid state. Any caller acquiring the lock before the call into dmu_buf_will_not_fill() could find what appeared to be a clean, readable buffer, and would take the wrong state from it: it should be getting the data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not cleaned out until dbuf_noread(), which left another gap in which a caller could take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state and then setting the new state before dropping the lock. The DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the lock is taken and again just before it is dropped.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pawel Jakub Dawidek <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #15566
Closes #15526
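The ordering problem described in the commit message can be sketched with a toy model. Everything here is hypothetical: `t_buf_t`, `t_lock_gap()` and the two `t_will_clone_*()` functions are illustrative stand-ins, not the real dbuf code. The lock is only marked in comments, because the point is which state is visible at each place where the real code drops `db_mtx`. Each call to `t_lock_gap()` records whether a thread taking the lock at that moment could still see the pre-clone data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model; NOT the real dmu_buf_impl_t. */
typedef enum { T_UNCACHED, T_CACHED, T_NOFILL } t_state_t;

typedef struct {
	t_state_t state;
	char *data;       /* last-written (now unwanted) contents, or NULL */
	int stale_views;  /* number of lock-drop gaps that exposed that data */
} t_buf_t;

/* Called wherever the real code would drop the lock: any thread that
 * takes the lock at this point sees whatever is currently in *b. */
static void t_lock_gap(t_buf_t *b)
{
	if (b->data != NULL)
		b->stale_views++;
}

/* Build a buffer that holds earlier, unwritten dirty data. */
static t_buf_t t_make(const char *contents)
{
	t_buf_t b = { T_CACHED, NULL, 0 };
	size_t n = strlen(contents) + 1;

	b.data = malloc(n);
	if (b.data != NULL)
		memcpy(b.data, contents, n);
	return b;
}

/* Ordering before the fix: the lock is dropped between rolling back the
 * dirty record, switching to NOFILL, and discarding the old data. */
static void t_will_clone_old(t_buf_t *b)
{
	/* [locked] dirty record rolled back; state and data left as they were */
	t_lock_gap(b);      /* gap 1: looks like a clean, readable CACHED buffer */
	/* [locked] switch to NOFILL */
	b->state = T_NOFILL;
	t_lock_gap(b);      /* gap 2: NOFILL, but the old data is still there */
	/* [locked] old data finally discarded */
	free(b->data);
	b->data = NULL;
	t_lock_gap(b);      /* nothing stale left to see */
}

/* Ordering after the fix: clean up and set NOFILL before the lock drops. */
static void t_will_clone_fixed(t_buf_t *b)
{
	/* [locked] roll back, discard the old data, set the new state */
	free(b->data);
	b->data = NULL;
	b->state = T_NOFILL;
	t_lock_gap(b);      /* the only gap: nothing stale to see */
}
```

Under these assumptions the old ordering exposes the stale data at two gaps (before the switch to NOFILL, and between that switch and the data being discarded), while the fixed ordering exposes it at none. The model deliberately leaves out the dbuf_noread() and dbuf_dirty() calls that the patched function still makes after releasing the lock.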
Diffstat (limited to 'module/zfs')
-rw-r--r-- module/zfs/dbuf.c 10
1 files changed, 9 insertions, 1 deletions
diff --git a/module/zfs/dbuf.c b/module/zfs/dbuf.c
index f2831a0e8..5a7fe42b6 100644
--- a/module/zfs/dbuf.c
+++ b/module/zfs/dbuf.c
@@ -2700,15 +2700,23 @@ dmu_buf_will_clone(dmu_buf_t *db_fake, dmu_tx_t *tx)
* writes and clones into this block.
*/
mutex_enter(&db->db_mtx);
+ DBUF_VERIFY(db);
VERIFY(!dbuf_undirty(db, tx));
ASSERT3P(dbuf_find_dirty_eq(db, tx->tx_txg), ==, NULL);
if (db->db_buf != NULL) {
arc_buf_destroy(db->db_buf, db);
db->db_buf = NULL;
+ dbuf_clear_data(db);
}
+
+ db->db_state = DB_NOFILL;
+ DTRACE_SET_STATE(db, "allocating NOFILL buffer for clone");
+
+ DBUF_VERIFY(db);
mutex_exit(&db->db_mtx);
- dmu_buf_will_not_fill(db_fake, tx);
+ dbuf_noread(db);
+ (void) dbuf_dirty(db, tx);
}
void
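The two DBUF_VERIFY() calls placed right after mutex_enter() and right before mutex_exit() act as an invariant check on whatever a lock holder can observe. A minimal sketch of the same idea follows, using a hypothetical `m_buf_t` whose `backing` field stands in for `db_buf` and whose `data` field stands in for `db.db_data`. The invariant in `m_verify()` is an assumption made for this toy model and is far simpler than what the real DBUF_VERIFY() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; NOT the real dbuf or DBUF_VERIFY(). */
typedef enum { M_UNCACHED, M_CACHED, M_NOFILL } m_state_t;

typedef struct {
	m_state_t state;
	void *backing;   /* stands in for db_buf (the ARC buffer) */
	void *data;      /* stands in for db.db_data, which points into backing */
} m_buf_t;

/* Toy invariant that should hold whenever the lock is taken or dropped. */
static bool m_verify(const m_buf_t *b)
{
	/* data may only point at a live backing buffer */
	if (b->data != NULL && b->data != b->backing)
		return false;
	switch (b->state) {
	case M_CACHED:
		return b->backing != NULL && b->data != NULL;
	case M_UNCACHED:
	case M_NOFILL:
		return b->backing == NULL && b->data == NULL;
	}
	return false;
}

/* Old ordering: the backing buffer is destroyed, but the data pointer and
 * the state are left behind when the lock is dropped. */
static void m_clone_old_before_unlock(m_buf_t *b)
{
	b->backing = NULL;   /* like arc_buf_destroy() + db_buf = NULL */
}

/* Fixed ordering: everything is settled before the lock is dropped. */
static void m_clone_fixed_before_unlock(m_buf_t *b)
{
	b->backing = NULL;   /* like arc_buf_destroy() + db_buf = NULL */
	b->data = NULL;      /* like dbuf_clear_data() */
	b->state = M_NOFILL; /* new state set while still holding the lock */
}
```

In this model, destroying the backing buffer without clearing `data` (the state the old code released the lock with) fails the check, while the patched sequence of clearing the data and setting NOFILL passes it.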