author | Matthew Ahrens <[email protected]> | 2020-04-10 10:39:55 -0700 |
---|---|---|
committer | GitHub <[email protected]> | 2020-04-10 10:39:55 -0700 |
commit | c618f87cd2e96438468a391246d63ba1803f35c8 (patch) | |
tree | bdd9beb37d34e04c17543d99e10e21980f0f760a /cmd/zstream/zstream.c | |
parent | 77f6826b83b7e27f0996f6d192202c36f65e41fd (diff) |
Add `zstream redup` command to convert deduplicated send streams
Deduplicated send and receive is deprecated. To ease migration to the
new dedup-send-less world, this commit adds a `zstream redup` utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.
The new `zstream` command also replaces the functionality of
`zstreamdump`, by way of the `zstream dump` subcommand. The
`zstreamdump` command is replaced by a shell script which invokes
`zstream dump`.
Under the hood, as `zstream redup` reads the send stream, it builds up
a hash table which maps from `<GUID, object, offset> -> <file_offset>`.
Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is `drr_toguid, drr_object, drr_offset`.)
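The mapping described above can be sketched as a small chained hash table. This is an illustration only, not the code added by this commit: the names `redup_entry_t`, `redup_table_t`, `redup_hash`, `redup_insert`, and `redup_lookup`, the fixed bucket count, and the mixing function are all assumptions. On a typical LP64 platform this entry layout occupies 40 bytes, which is consistent with the per-record figure quoted later in this message.

```c
#include <stdint.h>
#include <stdlib.h>

#define	REDUP_BUCKETS	1024	/* assumed size; must be a power of two */

/* Hypothetical entry: chain pointer, three-part key, and the value. */
typedef struct redup_entry {
	struct redup_entry *re_next;
	uint64_t re_guid;		/* drr_toguid of the WRITE record */
	uint64_t re_object;		/* drr_object */
	uint64_t re_offset;		/* drr_offset */
	uint64_t re_stream_offset;	/* where the WRITE starts in the file */
} redup_entry_t;

typedef struct redup_table {
	redup_entry_t *rt_buckets[REDUP_BUCKETS];
} redup_table_t;

/* Illustrative mixing function; not the hash used by ZFS. */
uint64_t
redup_hash(uint64_t guid, uint64_t object, uint64_t offset)
{
	uint64_t h = guid;

	h = h * 0x9E3779B97F4A7C15ULL + object;
	h = h * 0x9E3779B97F4A7C15ULL + offset;
	return ((h ^ (h >> 32)) & (REDUP_BUCKETS - 1));
}

/* Called for each WRITE record: remember where it lives in the file. */
int
redup_insert(redup_table_t *t, uint64_t guid, uint64_t object,
    uint64_t offset, uint64_t stream_offset)
{
	redup_entry_t *e = malloc(sizeof (*e));

	if (e == NULL)
		return (-1);
	uint64_t b = redup_hash(guid, object, offset);
	e->re_guid = guid;
	e->re_object = object;
	e->re_offset = offset;
	e->re_stream_offset = stream_offset;
	e->re_next = t->rt_buckets[b];
	t->rt_buckets[b] = e;
	return (0);
}

/* Return the matching entry, or NULL if this block was never written. */
const redup_entry_t *
redup_lookup(const redup_table_t *t, uint64_t guid, uint64_t object,
    uint64_t offset)
{
	const redup_entry_t *e = t->rt_buckets[redup_hash(guid, object, offset)];

	for (; e != NULL; e = e->re_next) {
		if (e->re_guid == guid && e->re_object == object &&
		    e->re_offset == offset)
			return (e);
	}
	return (NULL);
}
```

A real implementation would size the table to the stream rather than use a fixed bucket count; the fixed size here only keeps the sketch short.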
For records other than WRITE_BYREF, we pass them through unchanged
(except for the running checksum, which is recalculated).
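To illustrate what "running checksum" means here: ZFS send streams carry a Fletcher-4 checksum accumulated over the bytes emitted so far, so changing any record changes the checksum of everything after it. The sketch below is a minimal incremental Fletcher-4 in native byte order. The names `sketch_cksum_t` and `sketch_fletcher4_update` are hypothetical, and the sketch assumes buffer lengths are multiples of 4 bytes; it is not the routine used by this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accumulator: the four 64-bit Fletcher-4 sums. */
typedef struct sketch_cksum {
	uint64_t word[4];
} sketch_cksum_t;

/*
 * Fold `len` bytes of `buf` into the running checksum.  Calling this
 * once per output record yields the same result as one call over the
 * concatenation of all records.
 */
void
sketch_fletcher4_update(sketch_cksum_t *ck, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t a = ck->word[0];
	uint64_t b = ck->word[1];
	uint64_t c = ck->word[2];
	uint64_t d = ck->word[3];

	for (size_t i = 0; i + sizeof (uint32_t) <= len;
	    i += sizeof (uint32_t)) {
		uint32_t w;

		/* memcpy avoids unaligned loads on strict platforms. */
		(void) memcpy(&w, p + i, sizeof (w));
		a += w;
		b += a;
		c += b;
		d += c;
	}

	ck->word[0] = a;
	ck->word[1] = b;
	ck->word[2] = c;
	ck->word[3] = d;
}
```

Because the sums only depend on the bytes seen so far, a pass-through record can be copied verbatim and then folded into the accumulator before the next record's checksum is written.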
For WRITE_BYREF records, we change them to WRITE records. We find the
referenced WRITE record by looking in the hash table (for the record
with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading
the record header and payload from the specified offset in the stream
file. This is why the stream cannot be a pipe. The found WRITE record
replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`,
and `drr_offset` fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).
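The WRITE_BYREF conversion can be sketched as follows. This is a simplified illustration under stated assumptions: `sketch_write_t` is a hypothetical, cut-down record header (the real stream record has more fields), and `sketch_redup_byref` is an invented name. The sketch uses `pread()`, which fails on a pipe, to show why the input must be a seekable file.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified WRITE header followed by its payload. */
typedef struct sketch_write {
	uint64_t sw_toguid;
	uint64_t sw_object;
	uint64_t sw_offset;
	uint64_t sw_payload_len;
} sketch_write_t;

/*
 * Rebuild a WRITE record for a WRITE_BYREF.  `ref_file_offset` is the
 * value found in the hash table for the referenced block.  The header
 * and payload are re-read from that offset, then the identity fields
 * are overwritten with those of the WRITE_BYREF being replaced.
 * Returns 0 on success, -1 on read failure or if the buffer is too small.
 */
int
sketch_redup_byref(int fd, off_t ref_file_offset, uint64_t toguid,
    uint64_t object, uint64_t offset, sketch_write_t *out,
    void *payload, size_t payload_cap)
{
	if (pread(fd, out, sizeof (*out), ref_file_offset) !=
	    (ssize_t)sizeof (*out))
		return (-1);
	if (out->sw_payload_len > payload_cap)
		return (-1);
	if (pread(fd, payload, (size_t)out->sw_payload_len,
	    ref_file_offset + (off_t)sizeof (*out)) !=
	    (ssize_t)out->sw_payload_len)
		return (-1);

	/* Same data, but written to the WRITE_BYREF's logical block. */
	out->sw_toguid = toguid;
	out->sw_object = object;
	out->sw_offset = offset;
	return (0);
}
```

The caller would then emit `out` and `payload` in place of the WRITE_BYREF record and continue reading the stream from where it left off.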
This algorithm requires memory proportional to the number of WRITE
records (same as `zfs send -D`), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream
with 8KB blocks (`recordsize=8k`) contains about 134 million WRITE
records, so it would use around 5GB of RAM to "redup".
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #10124
Closes #10156
Diffstat (limited to 'cmd/zstream/zstream.c')
-rw-r--r-- | cmd/zstream/zstream.c | 61 |
1 files changed, 61 insertions, 0 deletions
diff --git a/cmd/zstream/zstream.c b/cmd/zstream/zstream.c
new file mode 100644
index 000000000..95578c97c
--- /dev/null
+++ b/cmd/zstream/zstream.c
@@ -0,0 +1,61 @@
+/*
+ * CDDL HEADER START
+ *
+ * This file and its contents are supplied under the terms of the
+ * Common Development and Distribution License ("CDDL"), version 1.0.
+ * You may only use this file in accordance with the terms of version
+ * 1.0 of the CDDL.
+ *
+ * A full copy of the text of the CDDL should have accompanied this
+ * source. A copy of the CDDL is also available via the Internet at
+ * http://www.illumos.org/license/CDDL.
+ *
+ * CDDL HEADER END
+ */
+
+/*
+ * Copyright (c) 2020 by Delphix. All rights reserved.
+ */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <ctype.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <strings.h>
+#include <unistd.h>
+#include <libintl.h>
+#include <stddef.h>
+#include <libzfs.h>
+#include "zstream.h"
+
+void
+zstream_usage(void)
+{
+	(void) fprintf(stderr,
+	    "usage: zstream command args ...\n"
+	    "Available commands are:\n"
+	    "\n"
+	    "\tzstream dump [-vCd] FILE\n"
+	    "\t... | zstream dump [-vCd]\n"
+	    "\n"
+	    "\tzstream redup [-v] FILE | ...\n");
+	exit(1);
+}
+
+int
+main(int argc, char *argv[])
+{
+	if (argc < 2)
+		zstream_usage();
+
+	char *subcommand = argv[1];
+
+	if (strcmp(subcommand, "dump") == 0) {
+		return (zstream_do_dump(argc - 1, argv + 1));
+	} else if (strcmp(subcommand, "redup") == 0) {
+		return (zstream_do_redup(argc - 1, argv + 1));
+	} else {
+		zstream_usage();
+	}
+}