File incorrectly zeroed when receiving incremental stream that toggles -L

Background:

By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks.  By default, a send stream of such a
filesystem does not contain large WRITE records, instead it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the `large_blocks` feature.  A send stream
generated by `zfs send -L` (or `--large-block`) preserves the large
block size on the receiving system, by using large WRITE records.

When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out.  The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used `-L`, but this incremental
does not use `-L` (-L to no-L); or that the previous send did not use
`-L`, but this incremental does use `-L` (no-L to -L).

Changes:

This commit addresses the problem with several changes to the semantics
of zfs send/receive:

1. "-L to no-L" incrementals are rejected.  If the previous send used
`-L`, but this incremental does not use `-L`, the `zfs receive` will
fail with this error message:

    incremental send stream requires -L (--large-block), to match
    previous receive.

2. "no-L to -L" incrementals are handled correctly, preserving the
smaller (128KB) block size of any already-received files that used large
blocks on the sending system but were split by `zfs send` without the
`-L` flag.

3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`.
This feature indicates that we can correctly handle "no-L to -L"
incrementals.  This flag is currently not set on any send streams.  In
the future, we intend for incremental send streams of snapshots that
have large blocks to use `-L` by default, and these streams will also
have the `SWITCH_TO_LARGE_BLOCKS` feature set.  This ensures that streams
from the default use of `zfs send` won't encounter the bug mentioned
above, because they can't be received by software with the bug.

Implementation notes:

To facilitate accessing the ZPL's generation number,
`zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and
restructured to fill in a struct with ZPL-specific info including owner
and generation.

In the "no-L to -L" case, if this is a compressed send stream (from
`zfs send -cL`), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks.  The zio pipeline will recompress
each smaller block individually.

A new test case, `send-L_toggle`, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #6224
Closes #10383
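
The receive-side rules in "Changes" items 1 and 2 can be sketched as a small
decision function.  This is only an illustration of the semantics described
in the commit message, not the code in the receive path: `plan_write()`,
`recv_plan_t`, and the `PLAN_*` names are hypothetical, and the real
implementation also has to decompress compressed records before splitting
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of receiving one WRITE record. */
typedef enum {
	PLAN_REJECT,		/* "-L to no-L": the receive fails */
	PLAN_WRITE_WHOLE,	/* record fits in one block of the file */
	PLAN_WRITE_SPLIT	/* "no-L to -L": split across small blocks */
} recv_plan_t;

/*
 * Hypothetical helper.  prev_used_L and stream_uses_L say whether the
 * previous send and this incremental used -L.  file_blksz is the block
 * size of the already-received file; record_len is the WRITE record's
 * logical length.  *npieces receives the number of blocks to write.
 */
static recv_plan_t
plan_write(bool prev_used_L, bool stream_uses_L, uint32_t file_blksz,
    uint32_t record_len, uint32_t *npieces)
{
	assert(file_blksz != 0);

	*npieces = 0;
	if (prev_used_L && !stream_uses_L)
		return (PLAN_REJECT);

	if (record_len > file_blksz) {
		/* Keep the file's smaller block size; round up. */
		*npieces = (record_len + file_blksz - 1) / file_blksz;
		return (PLAN_WRITE_SPLIT);
	}

	*npieces = 1;
	return (PLAN_WRITE_WHOLE);
}
```

Under these assumptions, a 1MB WRITE record arriving for a file that was
previously received with 128KB blocks is planned as eight 128KB writes, and
a stream without `-L` that follows a `-L` receive is rejected before any
data is written.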
author: Matthew Ahrens <[email protected]> 2020-06-09 10:41:01 -0700
committer: GitHub <[email protected]> 2020-06-09 10:41:01 -0700
commit: 7bcb7f0840d1857370dd1f9ee0ad48f9b7939dfd (patch)
tree: 5582990412f2058fe8b796dbe240205bba027dd0 /include
parent: 6722be2823b5ef39d647e440541806c72b3dbf9b (diff)
5 files changed, 44 insertions, 19 deletions
diff --git a/include/sys/dmu.h b/include/sys/dmu.h
index 139f3cbdf..5174bdc45 100644
--- a/include/sys/dmu.h
+++ b/include/sys/dmu.h
@@ -1013,10 +1013,17 @@ extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
 extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
     uint64_t *idp, uint64_t *offp);
 
-typedef int objset_used_cb_t(dmu_object_type_t bonustype,
-    void *bonus, uint64_t *userp, uint64_t *groupp, uint64_t *projectp);
+typedef struct zfs_file_info {
+	uint64_t zfi_user;
+	uint64_t zfi_group;
+	uint64_t zfi_project;
+	uint64_t zfi_generation;
+} zfs_file_info_t;
+
+typedef int file_info_cb_t(dmu_object_type_t bonustype, const void *data,
+    struct zfs_file_info *zoi);
 extern void dmu_objset_register_type(dmu_objset_type_t ost,
-    objset_used_cb_t *cb);
+    file_info_cb_t *cb);
 extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
 extern void *dmu_objset_get_user(objset_t *os);
 
diff --git a/include/sys/dmu_objset.h b/include/sys/dmu_objset.h
index 9b6614e98..a77131ef1 100644
--- a/include/sys/dmu_objset.h
+++ b/include/sys/dmu_objset.h
@@ -254,6 +254,8 @@ boolean_t dmu_objset_projectquota_enabled(objset_t *os);
 boolean_t dmu_objset_projectquota_present(objset_t *os);
 boolean_t dmu_objset_projectquota_upgradable(objset_t *os);
 void dmu_objset_id_quota_upgrade(objset_t *os);
+int dmu_get_file_info(objset_t *os, dmu_object_type_t bonustype,
+    const void *data, zfs_file_info_t *zfi);
 
 int dmu_fsname(const char *snapname, char *buf);
 
diff --git a/include/sys/fs/zfs.h b/include/sys/fs/zfs.h
index ecdfd42d0..575a4af51 100644
--- a/include/sys/fs/zfs.h
+++ b/include/sys/fs/zfs.h
@@ -1336,6 +1336,7 @@ typedef enum {
 	ZFS_ERR_EXPORT_IN_PROGRESS,
 	ZFS_ERR_BOOKMARK_SOURCE_NOT_ANCESTOR,
 	ZFS_ERR_STREAM_TRUNCATED,
+	ZFS_ERR_STREAM_LARGE_BLOCK_MISMATCH,
 } zfs_errno_t;
 
 /*
diff --git a/include/sys/zfs_ioctl.h b/include/sys/zfs_ioctl.h
index d4ffe70bb..78d33deda 100644
--- a/include/sys/zfs_ioctl.h
+++ b/include/sys/zfs_ioctl.h
@@ -107,6 +107,22 @@ typedef enum drr_headertype {
 #define	DMU_BACKUP_FEATURE_RAW			(1 << 24)
 /* flag #25 is reserved for the ZSTD compression feature */
 #define	DMU_BACKUP_FEATURE_HOLDS		(1 << 26)
+/*
+ * The SWITCH_TO_LARGE_BLOCKS feature indicates that we can receive
+ * incremental LARGE_BLOCKS streams (those with WRITE records of >128KB) even
+ * if the previous send did not use LARGE_BLOCKS, and thus its large blocks
+ * were split into multiple 128KB WRITE records.  (See
+ * flush_write_batch_impl() and receive_object()).  Older software that does
+ * not support this flag may encounter a bug when switching to large blocks,
+ * which causes files to incorrectly be zeroed.
+ *
+ * This flag is currently not set on any send streams.  In the future, we
+ * intend for incremental send streams of snapshots that have large blocks to
+ * use LARGE_BLOCKS by default, and these streams will also have the
+ * SWITCH_TO_LARGE_BLOCKS feature set. This ensures that streams from the
+ * default use of "zfs send" won't encounter the bug mentioned above.
+ */
+#define	DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS (1 << 27)
 
 /*
  * Mask of all supported backup features
@@ -116,7 +132,7 @@ typedef enum drr_headertype {
     DMU_BACKUP_FEATURE_RESUMING | DMU_BACKUP_FEATURE_LARGE_BLOCKS | \
     DMU_BACKUP_FEATURE_COMPRESSED | DMU_BACKUP_FEATURE_LARGE_DNODE | \
     DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS | \
-	DMU_BACKUP_FEATURE_REDACTED)
+    DMU_BACKUP_FEATURE_REDACTED | DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS)
 
 /* Are all features in the given flag word currently supported? */
 #define	DMU_STREAM_SUPPORTED(x)	(!((x) & ~DMU_BACKUP_FEATURE_MASK))
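
The effect of adding bit 27 to the feature mask can be illustrated with a
stand-alone sketch.  The three flag values are taken from the hunk above;
`OLD_MASK`, `NEW_MASK`, and `stream_supported()` are hypothetical names, and
the masks are deliberately reduced to a few bits because the full
`DMU_BACKUP_FEATURE_MASK` is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in include/sys/zfs_ioctl.h above. */
#define	DMU_BACKUP_FEATURE_RAW				(1 << 24)
#define	DMU_BACKUP_FEATURE_HOLDS			(1 << 26)
#define	DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS	(1 << 27)

/* Reduced, hypothetical masks: a receiver before and after this commit. */
#define	OLD_MASK \
	((uint64_t)(DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS))
#define	NEW_MASK \
	(OLD_MASK | (uint64_t)DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS)

/* Same test as DMU_STREAM_SUPPORTED(), with the mask as a parameter. */
static bool
stream_supported(uint64_t features, uint64_t mask)
{
	return ((features & ~mask) == 0);
}
```

With these definitions, a stream whose feature word includes
`DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS` fails the check against
`OLD_MASK` and passes it against `NEW_MASK`, which is the mechanism the
commit message relies on to keep such streams away from receivers that still
have the bug.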
diff --git a/include/sys/zfs_quota.h b/include/sys/zfs_quota.h
index ec4dc8f16..b215b8dd0 100644
--- a/include/sys/zfs_quota.h
+++ b/include/sys/zfs_quota.h
@@ -24,23 +24,22 @@
 
 #include <sys/dmu.h>
 #include <sys/fs/zfs.h>
-#include <sys/zfs_vfsops.h>
 
-extern int zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
-    uint64_t *userp, uint64_t *groupp, uint64_t *projectp);
+struct zfsvfs;
+struct zfs_file_info_t;
 
-extern int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
-    const char *domain, uint64_t rid, uint64_t *valuep);
-extern int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
-    uint64_t *cookiep, void *vbuf, uint64_t *bufsizep);
-extern int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
-    const char *domain, uint64_t rid, uint64_t quota);
+extern int zpl_get_file_info(dmu_object_type_t,
+    const void *, struct zfs_file_info *);
 
-extern boolean_t zfs_id_overobjquota(zfsvfs_t *zfsvfs, uint64_t usedobj,
-    uint64_t id);
-extern boolean_t zfs_id_overblockquota(zfsvfs_t *zfsvfs, uint64_t usedobj,
-    uint64_t id);
-extern boolean_t zfs_id_overquota(zfsvfs_t *zfsvfs, uint64_t usedobj,
-    uint64_t id);
+extern int zfs_userspace_one(struct zfsvfs *, zfs_userquota_prop_t,
+    const char *, uint64_t, uint64_t *);
+extern int zfs_userspace_many(struct zfsvfs *, zfs_userquota_prop_t,
+    uint64_t *, void *, uint64_t *);
+extern int zfs_set_userquota(struct zfsvfs *, zfs_userquota_prop_t,
+    const char *, uint64_t, uint64_t);
+
+extern boolean_t zfs_id_overobjquota(struct zfsvfs *, uint64_t, uint64_t);
+extern boolean_t zfs_id_overblockquota(struct zfsvfs *, uint64_t, uint64_t);
+extern boolean_t zfs_id_overquota(struct zfsvfs *, uint64_t, uint64_t);
 
 #endif
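
The callback change in include/sys/dmu.h replaces three output pointers with
a single struct that also carries the generation number.  A minimal sketch
of a callback with the new shape follows.  Only the `zfs_file_info` fields
come from the diff; `demo_object_type_t`, `demo_bonus_t`, and
`demo_get_file_info()` are hypothetical stand-ins, and the bonus-buffer
layout and the `ENOENT` return value are assumptions made for illustration,
not the ZPL's actual behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for dmu_object_type_t; the values are hypothetical. */
typedef enum { DEMO_OT_NONE, DEMO_OT_FILE_BONUS } demo_object_type_t;

/* Same fields as the zfs_file_info_t added in include/sys/dmu.h. */
typedef struct zfs_file_info {
	uint64_t zfi_user;
	uint64_t zfi_group;
	uint64_t zfi_project;
	uint64_t zfi_generation;
} zfs_file_info_t;

/* Hypothetical bonus-buffer layout, for illustration only. */
typedef struct demo_bonus {
	uint64_t db_uid;
	uint64_t db_gid;
	uint64_t db_projid;
	uint64_t db_gen;
} demo_bonus_t;

/*
 * A callback in the style of file_info_cb_t: read-only input buffer,
 * one struct filled in on success, nonzero return if the bonus type
 * carries no file info.
 */
static int
demo_get_file_info(demo_object_type_t bonustype, const void *data,
    struct zfs_file_info *zfi)
{
	const demo_bonus_t *db = data;

	if (bonustype != DEMO_OT_FILE_BONUS)
		return (ENOENT);

	zfi->zfi_user = db->db_uid;
	zfi->zfi_group = db->db_gid;
	zfi->zfi_project = db->db_projid;
	zfi->zfi_generation = db->db_gen;
	return (0);
}
```

Returning everything in one struct means a caller that only needs the
generation number can use the same callback as the quota accounting code,
which is the motivation given in the implementation notes.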