ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()

`lseek(SEEK_DATA | SEEK_HOLE)` are only accurate when the on-disk blocks reflect all writes, i.e. when there are no dirty data blocks. To ensure this, if the target dnode is dirty, they wait for the open txg to be synced, so we can call them "stabilizing operations". If they cause txg_wait_synced often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE are performed on them. In this case, the first SEEK does a txg_wait_synced(), and subsequent SEEKs don't need to wait, so performance is good.

However, if a workload involves an interleaved metadata modification, the subsequent SEEK may do a txg_wait_synced() unnecessarily. For example, if we do a `read()` syscall to each file before we do its SEEK. This applies even with `relatime=on`, when the `read()` is the first read after the last write. The txg_wait_synced() is unnecessary because the SEEK operations only care that the structure of the tree of indirect and data blocks is up to date on disk. They don't care about metadata like the contents of the bonus or spill blocks. (They also don't care if an existing data block is modified, but this would be more involved to filter out.)

This commit changes the behavior of SEEK_DATA/HOLE operations such that they do not call txg_wait_synced() if there is only a pending change to the bonus or spill block.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #13368
Issue #14594
Issue #14512
Issue #14009
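
The interleaved pattern described in the commit message can be sketched from userspace roughly as follows. This is only an illustration, not code from the commit: `write_read_seek()` and its 4 KiB buffer are hypothetical, and whether the `read()` actually dirties the bonus buffer depends on the dataset's atime settings.

```c
#define _GNU_SOURCE	/* exposes SEEK_DATA on glibc */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3	/* fallback; Linux and FreeBSD both use 3 */
#endif

/*
 * Hypothetical helper: write one buffer to `path`, read it back once
 * (the first read after a write may update atime, which is metadata),
 * then ask for the first data offset with SEEK_DATA.
 * Returns the offset reported by lseek(), or -1 on any failure.
 */
off_t
write_read_seek(const char *path)
{
	char buf[4096];
	off_t off = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return (-1);

	memset(buf, 'x', sizeof (buf));
	if (write(fd, buf, sizeof (buf)) == (ssize_t)sizeof (buf) &&
	    lseek(fd, 0, SEEK_SET) == 0 &&
	    read(fd, buf, sizeof (buf)) >= 0) {
		/* The "stabilizing operation" the commit message refers to. */
		off = lseek(fd, 0, SEEK_DATA);
	}

	(void) close(fd);
	return (off);
}
```

Calling this helper on each file in turn reproduces the write, read, SEEK ordering. Before this change, every such SEEK on ZFS could wait for a txg sync. With it, a dnode whose only pending change is to the bonus or spill block no longer triggers the wait.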
author: Matthew Ahrens <[email protected]> 2023-03-14 14:30:29 -0700
committer: GitHub <[email protected]> 2023-03-14 14:30:29 -0700
commit: 519851122b1703b8445ec17bc89b347cea965bb9 (patch)
tree: d6a23896d587a3d1d4e80475e38088773ee02f48 /module/zfs
parent: 78289b84589e632d87504df6a9c63b5ac694d2f9 (diff)
1 files changed, 16 insertions, 5 deletions
diff --git a/module/zfs/dnode.c b/module/zfs/dnode.c
index 0fc49018c..d334386e0 100644
--- a/module/zfs/dnode.c
+++ b/module/zfs/dnode.c
@@ -1764,20 +1764,29 @@ dnode_try_claim(objset_t *os, uint64_t object, int slots)
 }
 
 /*
- * Checks if the dnode contains any uncommitted dirty records.
+ * Checks if the dnode might contain any uncommitted changes to data blocks.
+ * Dirty metadata (e.g. bonus buffer) does not count.
  */
 boolean_t
 dnode_is_dirty(dnode_t *dn)
 {
 	mutex_enter(&dn->dn_mtx);
-
 	for (int i = 0; i < TXG_SIZE; i++) {
-		if (multilist_link_active(&dn->dn_dirty_link[i])) {
+		list_t *list = &dn->dn_dirty_records[i];
+		for (dbuf_dirty_record_t *dr = list_head(list);
+		    dr != NULL; dr = list_next(list, dr)) {
+			if (dr->dr_dbuf == NULL ||
+			    (dr->dr_dbuf->db_blkid != DMU_BONUS_BLKID &&
+			    dr->dr_dbuf->db_blkid != DMU_SPILL_BLKID)) {
+				mutex_exit(&dn->dn_mtx);
+				return (B_TRUE);
+			}
+		}
+		if (dn->dn_free_ranges[i] != NULL) {
 			mutex_exit(&dn->dn_mtx);
 			return (B_TRUE);
 		}
 	}
-
 	mutex_exit(&dn->dn_mtx);
 
 	return (B_FALSE);
@@ -2640,7 +2649,9 @@ dnode_next_offset(dnode_t *dn, int flags, uint64_t *offset,
 		rw_enter(&dn->dn_struct_rwlock, RW_READER);
 
 	if (dn->dn_phys->dn_nlevels == 0) {
-		error = SET_ERROR(ESRCH);
+		if (!(flags & DNODE_FIND_HOLE)) {
+			error = SET_ERROR(ESRCH);
+		}
 		goto out;
 	}
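
To make the new filtering rule in `dnode_is_dirty()` easier to follow outside the kernel, here is a minimal standalone model of it. Everything prefixed `model_` or `MODEL_` is a stand-in invented for this sketch (plain linked lists instead of `list_t`, booleans instead of range trees, no `dn_mtx` locking), so it illustrates the decision logic only, not the real data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TXG_SIZE		4			/* stand-in for TXG_SIZE */
#define MODEL_BONUS_BLKID	(UINT64_MAX)		/* stand-in for DMU_BONUS_BLKID */
#define MODEL_SPILL_BLKID	(UINT64_MAX - 1)	/* stand-in for DMU_SPILL_BLKID */

/* Simplified dirty record: just the fields the check looks at. */
typedef struct model_dirty_record {
	bool has_dbuf;		/* false models dr->dr_dbuf == NULL */
	uint64_t blkid;		/* models dr->dr_dbuf->db_blkid */
	struct model_dirty_record *next;
} model_dirty_record_t;

/* Simplified dnode: per-txg dirty records and pending frees. */
typedef struct model_dnode {
	model_dirty_record_t *dirty_records[MODEL_TXG_SIZE];
	bool has_free_ranges[MODEL_TXG_SIZE];	/* models dn_free_ranges[i] != NULL */
} model_dnode_t;

/*
 * Returns true if any txg slot holds a dirty record that is not for
 * the bonus or spill block, or has pending free ranges. Dirty bonus
 * or spill records alone do not count.
 */
bool
model_dnode_is_dirty(const model_dnode_t *dn)
{
	for (int i = 0; i < MODEL_TXG_SIZE; i++) {
		for (const model_dirty_record_t *dr = dn->dirty_records[i];
		    dr != NULL; dr = dr->next) {
			if (!dr->has_dbuf ||
			    (dr->blkid != MODEL_BONUS_BLKID &&
			    dr->blkid != MODEL_SPILL_BLKID)) {
				return (true);
			}
		}
		if (dn->has_free_ranges[i])
			return (true);
	}
	return (false);
}
```

In this model, a dnode whose only dirty record is for the bonus block reports clean, while adding a record for block 0 or marking a pending free range makes it report dirty. That matches the behaviour the patched comment describes.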