aboutsummaryrefslogtreecommitdiffstats
path: root/man
Commit message (Collapse)AuthorAgeFilesLines
* Correct man page datesRichard Laager2019-05-085-5/+5
| | | | | | | | | | | Various changes (many by me) have been made to the man pages without bumping their dates. I have now corrected them based on the last commit to each file. I also added the script I used to make these changes. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8710
* Make zfs_special_class_metadata_reserve_pct into a parameterDeHackEd2019-05-071-0/+14
| | | | | | | | Exported and documented a new module parameter. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #8706
* Fix send/recv lost spill blockBrian Behlendorf2019-05-071-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When receiving a DRR_OBJECT record the receive_object() function needs to determine how to handle a spill block associated with the object. It may need to be removed or kept depending on how the object was modified at the source. This determination is currently accomplished using a heuristic which takes in to account the DRR_OBJECT record and the existing object properties. This is a problem because there isn't quite enough information available to do the right thing under all circumstances. For example, when only the block size changes the spill block is removed when it should be kept. What's needed to resolve this is an additional flag in the DRR_OBJECT which indicates if the object being received references a spill block. The DRR_OBJECT_SPILL flag was added for this purpose. When set then the object references a spill block and it must be kept. Either it is update to date, or it will be replaced by a subsequent DRR_SPILL record. Conversely, if the object being received doesn't reference a spill block then any existing spill block should always be removed. Since previous versions of ZFS do not understand this new flag additional DRR_SPILL records will be inserted in to the stream. This has the advantage of being fully backward compatible. Existing ZFS systems receiving this stream will recreate the spill block if it was incorrectly removed. Updated ZFS versions will correctly ignore the additional spill blocks which can be identified by checking for the DRR_SPILL_UNMODIFIED flag. The small downside to this approach is that is may increase the size of the stream and of the received snapshot on previous versions of ZFS. Additionally, when receiving streams generated by previous unpatched versions of ZFS spill blocks may still be lost. OpenZFS-issue: https://www.illumos.org/issues/9952 FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233277 Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8668
* Cleanup special/dedup languageRichard Laager2019-05-071-6/+6
| | | | | | | | | | | | | This standardizes the language on "deduplication tables" rather than "dedup data" (which might be read as the data blocks rather than the DDT). Likewise, it standardizes on "small file blocks". It also standardizes on "normal" rather than using both "normal" and "general" in the same paragraph. I also replaced "non-specified" with the more explicit "non-dedup/special". Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8713
* OpenZFS 10473 - zfs(1M) missing cross-reference to zfs-program(1M)Richard Laager2019-05-061-1/+2
| | | | | | | | | | | | | | | Authored by: Jason King <[email protected]> Reviewed by: Toomas Soome <[email protected]> Reviewed by: Andy Fiddaman <[email protected]> Reviewed by: Peter Tribble <[email protected]> Reviewed by: Gergő Mihály Doma <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Richard Laager <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/10473 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/736e67003 Closes #8711
* Add feature check for 'zpool resilver' commandTom Caputi2019-05-021-1/+3
| | | | | | | | | | | | The 'zpool resilver' command requires that the resilver_defer feature is active on the pool. Unfortunately, the check for this was left out of the original patch. This commit simply corrects this so that the command properly returns an error in this case. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8700
* Fix estimated scrub completion timeTom Caputi2019-05-011-0/+4
| | | | | | | | | | | | | | | | | | | Currently, it is possible for the 'zpool scrub' command to progress slightly beyond 100% due to concurrent changes happening on the live pool. This behavior is expected, but the userspace code for 'zpool status' would subtract the expected amount of data from the amount of data already scrubbed, resulting in a negative integer being casted to a large positive one. This number was then used to calculate the estimated completion time, resulting in wildly wrong results. This code changes the behavior so that 'zpool status' does not attempt to report an estimate during this period. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8611 Closes #8687
* Clarify that deduped data is encryptedRichard Laager2019-04-301-1/+2
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8691
* Clarify and improve encryption documentationRichard Laager2019-04-241-13/+14
| | | | | | | | | | | | | | | | | - Remove the language that "all user data" is encrypted. This is to avoid misunderstandings or arguments about what is "user data", especially in light of "user properties". - Document that properties are unencrypted. - Document that snapshot names are unencrypted. - For consistency with the rest of the zfs.8 man page, use "ZFS" as the generic noun, not (bolded) "zfs". The latter refers to the command. Likewise, use "ZFS" instead of "the kernel module". - Give "a passphrase" as an example of a "user's key". Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8652
* Duplicate the encryption copies=3 limitationRichard Laager2019-04-241-0/+5
| | | | | | | | | This adds the encryption copies=3 limitation language into the copies property section. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8651
* Deprecate dedupdittoRichard Laager2019-04-241-0/+3
| | | | | | | | | | This documents, in zpool.8, that dedupditto is deprecated and will be made to have no effect in a future release. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8650
* Eliminate useless double-bolding in man pagesRichard Laager2019-04-242-33/+33
| | | | | | | | | | As far as I know and can tell from testing, \fB\fB...\fR\fR is exactly equivalent to \fB...\fR. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Alphabetize zpool-features.5 by short nameRichard Laager2019-04-241-286/+287
| | | | | | | | | | The features are sorted in the en_US locale, not the C locale. Specifically, that means that bookmark_v2 comes _after_ bookmarks. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Standardize .RE placement in zpool-features.5Richard Laager2019-04-241-19/+6
| | | | | | | | | | This command is being used to unindent, so it should be at the end of each block. This is consistent with the other man pages. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Add missing formatting to sha512 in zpool-features.5Richard Laager2019-04-241-0/+3
| | | | | | | Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Correct GUID of large_blocks in zpool-features.5Richard Laager2019-04-241-1/+1
| | | | | | | | | It is org.open-zfs:large_blocks (plural). Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Change wording in zfs-module-parameters.5Tom Caputi2019-04-241-2/+2
| | | | | | | Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Document that hole_birth is effectively uselessRichard Laager2019-04-241-0/+8
| | | | | | | | | | | | | | The first sentence of this commit comes from the wiki, and was originally written by: Rich Ercolani <[email protected]> with changes by: Tom Caputi <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641 Closes #8642
* Document send_holes_without_birth_timeRichard Laager2019-04-241-5/+14
| | | | | | | Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Correct bookmark_v2 dependenciesRichard Laager2019-04-241-2/+2
| | | | | | | | | | | encryption depends on bookmark_v2. bookmark_v2 depends on bookmarks. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Fix formatting for multi_vdev_crash_dump in zpool-features.5Richard Laager2019-04-241-3/+3
| | | | | | | | | | This needs to use tabs instead of spaces to display correctly (i.e. with things lined up). Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Fix incorrect use of .Nm directive for ZPOOL_VDEV_NAME_GUID in zpool(8)Tomohiro Kusumi2019-04-231-2/+2
| | | | | | | | | It should only affect "zpool". Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8644
* Code improvement and bug fixes for QAT supportcfzhu2019-04-161-13/+39
| | | | | | | | | | | | | | | | | | | | 1. Support QAT when ZFS is root file-system: When ZFS module is loaded before QAT started, the QAT can be started again in post-process, e.g.: echo 0 > /sys/module/zfs/parameters/zfs_qat_compress_disable echo 0 > /sys/module/zfs/parameters/zfs_qat_encrypt_disable echo 0 > /sys/module/zfs/parameters/zfs_qat_checksum_disable 2. Verify alder checksum of the de-compress result 3. Allocate Digest, IV and AAD buffer in physical contiguous memory by QAT_PHYS_CONTIG_ALLOC. 4. Update the documentation for zfs_qat_compress_disable, zfs_qat_checksum_disable, zfs_qat_encrypt_disable. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Weigang Li <[email protected]> Signed-off-by: Chengfeix Zhu <[email protected]> Closes #8323 Closes #8610
* Add option [-V|--version] to emit version stringTerraTech2019-04-162-2/+34
| | | | | | | | | | | | | | | | | | | | | Add the 'zfs version' and 'zpool version' subcommands to display the version of the user space utilities and loaded zfs kernel module. For example: $ zfs version zfs-0.8.0-rc3_169_g67e0366b88 zfs-kmod-0.8.0-rc3_169_g67e0366b88 The '-V' and '--version' aliases were added to support the common convention of using 'zfs --version` to obtain the version information. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: TerraTech <[email protected]> Closes #2501 Closes #8567
* zfs allow refreservation needed for zfs create -VRichard Laager2019-04-161-1/+3
| | | | | | | | | | | | When creating a non-sparse volume, zfs create sets a refreservation. Accordingly, one needs the "refreservation" ability in addition to the "create" ability in order to create a non-sparse volume. Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: github.com/homerlinux Reported-by: Matthew Ahrens <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8531 Closes #8624
* Clarify GRUB's lack of support for sha512, skein, edonrRichard Laager2019-04-162-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | zfs.8 correctly said that GRUB did not support them, but zpool-features.5 said that "Booting off pools...is supported." Now, zpool-features.5 discusses GRUB specifically and indicates its lack of support for these features. Also, I have clarified the wording in both places to indicate that the pool feature cannot be used. It's not a filesystem dataset thing, but pool-wide. I described this as "cannot be used". I think technically the feature can be enabled, just not active. However, the effect is essentially the same: you cannot enable those checksum algorithms on any dataset in the pool, so you might as well not enable the feature (which is just pointing a loaded gun at your foot). In the past, an argument could be made that having all the features enabled was useful for simplicity, as long as you didn't activate the GRUB-incompatible features, but that's getting less and less realistic over time. A user can still do that, but we should not encourage that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626 Closes #8446
* Reword the dedup limitation for Edon-RRichard Laager2019-04-161-2/+2
| | | | | | | | | | The old wording was effectively "You can not use this (except you can)", which just seems confusing. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Consistently captialize GUID for featuresRichard Laager2019-04-161-5/+5
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Fix the spelling of deferredRichard Laager2019-04-161-2/+2
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Update zdb.8's fsck referenceRichard Laager2019-04-161-4/+2
| | | | | | | | | | On Linux, this is in man section 8, not 1M. Also, there is no fsdb on Linux, so I removed that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Refer to commands consistently in zpool-features.5Richard Laager2019-04-161-19/+19
| | | | | | | | | | | | This had a mix of command vs subcommand, quoted vs not quoted, and bolded vs. not bolded command names. Also, fix man page sections from 1M (Solaris) to 8 (Linux). Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Eliminate most mentions of "special"Richard Laager2019-04-162-13/+13
| | | | | | | | | | | | | | Previously, the "spare" vdev type was described as "A special pseudo-vdev which...". I wanted to eliminate the word "special" from that, now that the allocation_classes feature exists and there is such a thing as a "special vdev". I ended up eliminating almost all instances of the word "special" that are not referencing the allocation_classes feature. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Hint about zpool free vs zfs availableJosh Soref2019-04-041-1/+25
| | | | | | | | | Also describe free/allocated/fragmentation Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Josh Soref <[email protected]> Closes #7565 Closes #8483
* Fix man(1) warningsJosh Soref2019-04-021-3/+3
| | | | | | | | | | | The macOS man app strenuously objects to blank lines in man files. mdoc warning: Empty input line #xyz Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Josh Soref <[email protected]> Closes #8559
* Update raw send documentationTom Caputi2019-03-291-4/+14
| | | | | | | | | | This patch simply clarifies some of the limitations related to raw sends in the man page. No functional changes. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jason Cohen <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8503 Closes #8544
* Add TRIM supportBrian Behlendorf2019-03-292-12/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Contributions-by: Saso Kiselkov <[email protected]> Contributions-by: Tim Chase <[email protected]> Contributions-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8419 Closes #598
* Correct a very minor grammar issueEvan Allrich2019-03-261-2/+2
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Evan Allrich <[email protected]> Closes #8535
* MMP interval and fail_intervals in uberblockOlaf Faaland2019-03-211-24/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Multihost is enabled, and a pool is imported, uberblock writes include ub_mmp_delay to allow an importing node to calculate the duration of an activity test. This value, is not enough information. If zfs_multihost_fail_intervals > 0 on the node with the pool imported, the safe minimum duration of the activity test is well defined, but does not depend on ub_mmp_delay: zfs_multihost_fail_intervals * zfs_multihost_interval and if zfs_multihost_fail_intervals == 0 on that node, there is no such well defined safe duration, but the importing host cannot tell whether mmp_delay is high due to I/O delays, or due to a very large zfs_multihost_interval setting on the host which last imported the pool. As a result, it may use a far longer period for the activity test than is necessary. This patch renames ub_mmp_sequence to ub_mmp_config and uses it to record the zfs_multihost_interval and zfs_multihost_fail_intervals values, as well as the mmp sequence. This allows a shorter activity test duration to be calculated by the importing host in most situations. These values are also added to the multihost_history kstat records. It calculates the activity test duration differently depending on whether the new fields are present or not; for importing pools with only ub_mmp_delay, it uses (zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals Which results in an activity test duration less sensitive to the leaf count. In addition, it makes a few other improvements: * It updates the "sequence" part of ub_mmp_config when MMP writes in between syncs occur. This allows an importing host to detect MMP on the remote host sooner, when the pool is idle, as it is not limited to the granularity of ub_timestamp (1 second). * It issues writes immediately when zfs_multihost_interval is changed so remote hosts see the updated value as soon as possible. * It fixes a bug where setting zfs_multihost_fail_intervals = 1 results in immediate pool suspension. * Update tests to verify activity check duration is based on recorded tunable values, not tunable values on importing host. * Update tests to verify the expected number of uberblocks have valid MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number), that sequence number is incrementing, and that uberblock values match tunable settings. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7842
* Multiple DVA Scrubbing FixTom Caputi2019-03-151-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is an issue in the sequential scrub code which prevents self healing from working in some cases. The scrub code will split up all DVA copies of a bp and issue each of them separately. The problem is that, since each of the DVAs is no longer associated with the others, the self healing code doesn't have the opportunity to repair problems that show up in one of the DVAs with the data from the others. This patch fixes this issue by ensuring that all IOs issued by the sequential scrub code include all DVAs. Initially, only the first DVA of each is attempted. If an issue arises, the IO is retried with all available copies, giving the self healing code a chance to correct the issue. To test this change, this patch also adds the ability for zinject to specify individual DVAs to inject read errors into. We then add a new test case that utilizes this functionality to ensure scrubs and self-healing reads can handle and transparently fix issues with individual copies of blocks. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8453
* Add separate aggregation limit for non-rotating mediaAlexander Motin2019-03-131-0/+11
| | | | | | | | | | | | | | | Before sequential scrub patches ZFS never aggregated I/Os above 128KB. Sequential scrub bumped that to 1MB, supposedly to reduce number of head seeks for spinning disks. But for SSDs it makes little to no sense, especially on FreeBSD, where due to MAXPHYS limitation device will likely still see bunch of 128KB I/Os instead of one large. Having more strict aggregation limit for SSDs allows to avoid allocation of large memory buffer and copy to/from it, that is a serious problem when throughput reaches gigabytes per second. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes #8494
* Detect and prevent mixed raw and non-raw sendsTom Caputi2019-03-131-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is an issue in the raw receive code where raw receives are allowed to happen on top of previously non-raw received datasets. This is a problem because the source-side dataset doesn't know about how the blocks on the destination were encrypted. As a result, any MAC in the objset's checksum-of-MACs tree that is a parent of both blocks encrypted on the source and blocks encrypted by the destination will be incorrect. This will result in authentication errors when we decrypt the dataset. This patch fixes this issue by adding a new check to the raw receive code. The code now maintains an "IVset guid", which acts as an identifier for the set of IVs used to encrypt a given snapshot. When a snapshot is raw received, the destination snapshot will take this value from the DRR_BEGIN payload. Non-raw receives and normal "zfs snap" operations will cause ZFS to generate a new IVset guid. When a raw incremental stream is received, ZFS will check that the "from" IVset guid in the stream matches that of the "from" destination snapshot. If they do not match, the code will error out the receive, preventing the problem. This patch requires an on-disk format change to add the IVset guids to snapshots and bookmarks. As a result, this patch has errata handling and a tunable to help affected users resolve the issue with as little interruption as possible. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8308
* Add bookmark v2 on-disk featureTom Caputi2019-03-131-0/+21
| | | | | | | | | | | | | | This patch adds the bookmark v2 feature to the on-disk format. This feature will be needed for the upcoming redacted sends and for an upcoming fix that for raw receives. The feature is not currently used by any code and thus this change is a no-op, aside from the fact that the user can now enable the feature. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Issue #8308
* Increase default zfs_multihost_fail_intervals and import_intervalsOlaf Faaland2019-03-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, when multihost is enabled for a pool, the pool is suspended if (zfs_multihost_fail_intervals*zfs_multihost_interval) ms pass without a successful MMP write. This is the recommended configuration. The default value for zfs_multihost_fail_intervals has been 5, and the default value for zfs_multihost_interval has been 1000, so pool suspension occurred at 5 seconds. There have been multiple cases where a single misbehaving device in a pool triggered a SCSI reset, and all I/O paused for 5-6 seconds. This in turn caused MMP to suspend the pool. In the cases observed, the rest of the devices were healthy and the pool was otherwise correctly performing I/O. The reset was handled correctly by ZFS, and by suspending the pool MMP made replacing the device more difficult as well as forcing the host to be rebooted. Increase the default value of zfs_multihost_fail_intervals to 10, so that MMP tolerates up to 10 seconds of failed MMP writes before suspending the pool. Increase the default value of zfs_multihost_import_intervals to 20, to maintain the 2:1 safety factor. This results in a force import taking approximately 20 seconds when MMP is enabled, with default values. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7709 Closes #8495
* Do not resume a pool if multihost is enabledOlaf Faaland2019-02-281-0/+3
| | | | | | | | | | | When multihost is enabled, and a pool is suspended, return EINVAL in response to "zpool clear <pool>". The pool may have been imported on another host while I/O was suspended. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6933 Closes #8460
* Warn user about accidentally sharing devicesOlaf Faaland2019-02-281-5/+30
| | | | | | | | | | | | | | Improve the man page text to warn the user about the risk of adding the same device to multiple pools via simultaneous "zpool create", "zpool add", "zpool replace", etc. State that MMP/multihost does not protect against these scenarios. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6473 Closes #8457
* abd_alloc should use scatter for >1K allocationsMatthew Ahrens2019-02-281-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | abd_alloc() normally does scatter allocations, thus solving the problem that ABD originally set out to: the bulk of ZFS's allocations are single pages, which are faster to allocate and free, and don't suffer from internal fragmentation (and the inability to reclaim memory because some buffers in the slab are still allocated). However, the current code does linear allocations for 4KB and smaller allocations, defeating the purpose of ABD. Scatter ABD's use at least one page each, so sub-page allocations waste some space when allocated as scatter (e.g. 2KB scatter allocation wastes half of each page). Using linear ABD's for small allocations means that they will be put on slabs which contain many allocations. This can improve memory efficiency, but it also makes it much harder for ARC evictions to actually free pages, because all the buffers on one slab need to be freed in order for the slab (and underlying pages) to be freed. Typically, 512B and 1KB kmem caches have 16 buffers per slab, so it's possible for them to actually waste more memory than scatter (one page per buf = wasting 3/4 or 7/8th; one buf per slab = wasting 15/16th). Spill blocks are typically 512B and are heavily used on systems running selinux with the default dnode size and the `xattr=sa` property set. By default we will use linear allocations for 512B and 1KB, and scatter allocations for larger (1.5KB and up). Reviewed-by: George Melikov <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8455
* zfs.8 has wrong description of "zfs program -t"Matthew Ahrens2019-02-262-12/+13
| | | | | | | | | | | | | | The "-t" argument to "zfs program" specifies a limit on the number of LUA instructions that can be executed. The zfs.8 manpage has the wrong description. It should be updated to match what's in zfs-program.8 Also fix the formatting of the zfs help message. Reviewed by: Allan Jude <[email protected]> Reviewed-by: loli10K <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8410
* zfs(8): improve document of compression behavioursDeHackEd2019-02-251-0/+13
| | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed by: Allan Jude <[email protected]> Reviewed-by: bunder2015 <[email protected]> Signed-off-by: DHE <[email protected]> Closes #4660 Closes #8423
* Clarify zpool iostat statistics reportingkpande2019-02-211-4/+9
| | | | | | | | | | | Document expected behavior for zpool iostat statistics reporting. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Allan Jude <[email protected]> Signed-off-by: Kash Pande <[email protected]> Closes #2888 Closes #8417
* Fix '-T u|d' descriptions in zpool(8)Anatoly Borodin2019-02-211-4/+4
| | | | | | | | | | | | | | | | | | | In -T u|d Display a time stamp. Specify -u for a printed representation of the internal representation of time. See time(2). Specify -d for standard date format. See date(1). 'Specify u' and 'Specify d' should be used instead. `zpool list -T -u` does not work. Bring the descriptions in `zpool list` and `zpool status` in sync with `zpool iostat`. Reviewed by: Allan Jude <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Anatoly Borodin <[email protected]> Closes #8438