summaryrefslogtreecommitdiffstats
path: root/man
Commit message (Collapse)AuthorAgeFilesLines
* Fix typo in zpool-features.5, section bookmark_writtenMatthew Ahrens2019-07-031-1/+1
| | | | | | | | | | The full property name includes "delphix", not "delphxi". Reviewed-by: Richard Laager <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8985
* Fix zfs(8) mandoc errorsloli10K2019-07-021-1/+0
| | | | | | | | | | | | | | This was accidentally introduced in 765d1f06: mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Ar filesystem Ns | Ns Ar mountpoint mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Xo mandoc: ./man/man8/zfs.8: ERROR: skipping end of block that is not open: Xc mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Xo mandoc: ./man/man8/zfs.8: ERROR: skipping end of block that is not open: Xc Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #8980
* Add 'zfs umount -u' for encrypted datasetsTom Caputi2019-06-281-5/+8
| | | | | | | | | | | This patch adds the ability for the user to unload keys for datasets as they are being unmounted. This is analogous to 'zfs mount -l'. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes: #8917 Closes: #8952
* Remove dedupditto functionalityMatthew Ahrens2019-06-191-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If dedup is in use, the `dedupditto` property can be set, causing ZFS to keep an extra copy of data that is referenced many times (>100x). The idea was that this data is more important than other data and thus we want to be really sure that it is not lost if the disk experiences a small amount of random corruption. ZFS (and system administrators) rely on the pool-level redundancy to protect their data (e.g. mirroring or RAIDZ). Since the user/sysadmin doesn't have control over what data will be offered extra redundancy by dedupditto, this extra redundancy is not very useful. The bulk of the data is still vulnerable to loss based on the pool-level redundancy. For example, if particle strikes corrupt 0.1% of blocks, you will either be saved by mirror/raidz, or you will be sad. This is true even if dedupditto saved another 0.01% of blocks from being corrupted. Therefore, the dedupditto functionality is rarely enabled (i.e. the property is rarely set), and it fulfills its promise of increased redundancy even more rarely. Additionally, this feature does not work as advertised (on existing releases), because scrub/resilver did not repair the extra (dedupditto) copy (see https://github.com/zfsonlinux/zfs/pull/8270). In summary, this seldom-used feature doesn't work, and even if it did it wouldn't provide useful data protection. It has a non-trivial maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270). We should remove the dedupditto functionality. For backwards compatibility with the existing CLI, "zpool set dedupditto" will still "succeed" (exit code zero), but won't have any effect. For backwards compatibility with existing pools that had dedupditto enabled at some point, the code will still be able to understand dedupditto blocks and free them when appropriate. However, ZFS won't write any new dedupditto blocks. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Issue #8270 Closes #8310
* Implement Redacted Send/ReceivePaul Dagnelie2019-06-193-8/+340
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redacted send/receive allows users to send subsets of their data to a target system. One possible use case for this feature is to not transmit sensitive information to a data warehousing, test/dev, or analytics environment. Another is to save space by not replicating unimportant data within a given dataset, for example in backup tools like zrepl. Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots). Second, the new zfs redact command is used to create a redaction bookmark. The redaction bookmark stores the list of blocks in a snapshot that were modified by the redaction snapshot(s). Finally, the redaction bookmark is passed as a parameter to zfs send. When sending to the snapshot that was redacted, the redaction bookmark is used to filter out blocks that contain sensitive or unwanted information, and those blocks are not included in the send stream. When sending from the redaction bookmark, the blocks it contains are considered as candidate blocks in addition to those blocks in the destination snapshot that were modified since the creation_txg of the redaction bookmark. This step is necessary to allow the target to rehydrate data in the case where some blocks are accidentally or unnecessarily modified in the redaction snapshot. The changes to bookmarks to enable fast space estimation involve adding deadlists to bookmarks. There is also logic to manage the life cycles of these deadlists. The new size estimation process operates in cases where previously an accurate estimate could not be provided. In those cases, a send is performed where no data blocks are read, reducing the runtime significantly and providing a byte-accurate size estimate. Reviewed-by: Dan Kimmel <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Prashanth Sreenivasa <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Chris Williamson <[email protected]> Reviewed-by: Pavel Zhakarov <[email protected]> Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #7958
* panic in removal_remap test on 4K devicesMatthew Ahrens2019-06-131-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the zfs_remove_max_segment tunable is changed to be not a multiple of the sector size, then the device removal code will malfunction and try to create mappings that are smaller than one sector, leading to a panic. On debug bits this assertion will fail in spa_vdev_copy_segment(): ASSERT3U(DVA_GET_ASIZE(&dst), ==, size); On nondebug, the system panics with a stack like: metaslab_free_concrete() metaslab_free_impl() metaslab_free_impl_cb() vdev_indirect_remap() free_from_removing_vdev() metaslab_free_impl() metaslab_free_dva() metaslab_free() Fortunately, the default for zfs_remove_max_segment is 1MB, so this can't occur by default. We hit it during this test because removal_remap.ksh changes zfs_remove_max_segment to 1KB. When testing on 4KB-sector disks, we hit the bug. This change makes the zfs_remove_max_segment tunable more robust, automatically rounding it up to a multiple of the sector size. We also turn some key assertions into VERIFY's so that similar bugs would be caught before they are encoded on disk (and thus avoid a panic-reboot-loop). Reviewed-by: Sean Eric Fagan <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: Sebastien Roy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-61342 Closes #8893
* compress metadata in later sync passesMatthew Ahrens2019-06-131-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting in sync pass 5 (zfs_sync_pass_dont_compress), we disable compression (including of metadata). Ostensibly this helps the sync passes to converge (i.e. for a sync pass to not need to allocate anything because it is 100% overwrites). However, in practice it increases the average number of sync passes, because when we turn compression off, a lot of block's size will change and thus we have to re-allocate (not overwrite) them. It also increases the number of 128KB allocations (e.g. for indirect blocks and spacemaps) because these will not be compressed. The 128K allocations are especially detrimental to performance on highly fragmented systems, which may have very few free segments of this size, and may need to load new metaslabs to satisfy 128K allocations. We should increase zfs_sync_pass_dont_compress. In practice on a highly fragmented system we see a few 5-pass txg's, a tiny number of 6-pass txg's, and no txg's with more than 6 passes. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-63431 Closes #8892
* looping in metaslab_block_picker impacts performance on fragmented poolsMatthew Ahrens2019-06-131-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On fragmented pools with high-performance storage, the looping in metaslab_block_picker() can become the performance-limiting bottleneck. When looking for a larger block (e.g. a 128K block for the ZIL), we may search through many free segments (up to hundreds of thousands) to find one that is large enough to satisfy the allocation. This can take a long time (up to dozens of ms), and is done while holding the ms_lock, which other threads may spin waiting for. When this performance problem is encountered, profiling will show high CPU time in metaslab_block_picker, as well as in mutex_enter from various callers. The problem is very evident on a test system with a sync write workload with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage, 84% full and 88% fragmented. It has also been observed on production systems with 90TB of storage, 76% full and 87% fragmented. The fix is to change metaslab_df_alloc() to search only up to 16MB from the previous allocation (of this alignment). After that, we will pick a segment that is of the exact size requested (or larger). This reduces the number of iterations to a few hundred on fragmented pools (a ~100x improvement). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-62324 Closes #8877
* fat zap should prefetch when iteratingMatthew Ahrens2019-06-121-0/+25
| | | | | | | | | | | | | | | | | When iterating over a ZAP object, we're almost always certain to iterate over the entire object. If there are multiple leaf blocks, we can realize a performance win by issuing reads for all the leaf blocks in parallel when the iteration begins. For example, if we have 10,000 snapshots, "zfs destroy -nv pool/fs@1%9999" can take 30 minutes when the cache is cold. This change provides a >3x performance improvement, by issuing the reads for all ~64 blocks of each ZAP object in parallel. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-58347 Closes #8862
* make zil max block size tunableMatthew Ahrens2019-06-101-0/+12
| | | | | | | | | | | | | | | | | | | | | | | We've observed that on some highly fragmented pools, most metaslab allocations are small (~2-8KB), but there are some large, 128K allocations. The large allocations are for ZIL blocks. If there is a lot of fragmentation, the large allocations can be hard to satisfy. The most common impact of this is that we need to check (and thus load) lots of metaslabs from the ZIL allocation code path, causing sync writes to wait for metaslabs to load, which can take a second or more. In the worst case, we may not be able to satisfy the allocation, in which case the ZIL will resort to txg_wait_synced() to ensure the change is on disk. To provide a workaround for this, this change adds a tunable that can reduce the size of ZIL blocks. External-issue: DLPX-61719 Reviewed-by: George Wilson <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8865
* Reduced IOPS when all vdevs are in the zfs_mg_fragmentation_thresholdSerapheim Dimitropoulos2019-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Historically while doing performance testing we've noticed that IOPS can be significantly reduced when all vdevs in the pool are hitting the zfs_mg_fragmentation_threshold percentage. Specifically in a hypothetical pool with two vdevs, what can happen is the following: Vdev A would go above that threshold and only vdev B would be used. Then vdev B would pass that threshold but vdev A would go below it (we've been freeing from A to allocate to B). The allocations would go back and forth utilizing one vdev at a time with IOPS taking a hit. Empirically, we've seen that our vdev selection for allocations is good enough that fragmentation increases uniformly across all vdevs the majority of the time. Thus we set the threshold percentage high enough to avoid hitting the speed bump on pools that are being pushed to the edge. We effectively disable its effect in the majority of the cases but we don't remove (at least for now) just in case we hit any weird behavior in the future. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes #8859
* Fixed a small typo in man/man1/raidz_test.1Peter Wirdemo2019-06-051-1/+1
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Chris Dunlop <[email protected]> Signed-off-by: Peter Wirdemo <[email protected]> Closes #8855
* Correct man page datesRichard Laager2019-05-085-5/+5
| | | | | | | | | | | Various changes (many by me) have been made to the man pages without bumping their dates. I have now corrected them based on the last commit to each file. I also added the script I used to make these changes. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8710
* Make zfs_special_class_metadata_reserve_pct into a parameterDeHackEd2019-05-071-0/+14
| | | | | | | | Exported and documented a new module parameter. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #8706
* Fix send/recv lost spill blockBrian Behlendorf2019-05-071-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When receiving a DRR_OBJECT record the receive_object() function needs to determine how to handle a spill block associated with the object. It may need to be removed or kept depending on how the object was modified at the source. This determination is currently accomplished using a heuristic which takes in to account the DRR_OBJECT record and the existing object properties. This is a problem because there isn't quite enough information available to do the right thing under all circumstances. For example, when only the block size changes the spill block is removed when it should be kept. What's needed to resolve this is an additional flag in the DRR_OBJECT which indicates if the object being received references a spill block. The DRR_OBJECT_SPILL flag was added for this purpose. When set then the object references a spill block and it must be kept. Either it is update to date, or it will be replaced by a subsequent DRR_SPILL record. Conversely, if the object being received doesn't reference a spill block then any existing spill block should always be removed. Since previous versions of ZFS do not understand this new flag additional DRR_SPILL records will be inserted in to the stream. This has the advantage of being fully backward compatible. Existing ZFS systems receiving this stream will recreate the spill block if it was incorrectly removed. Updated ZFS versions will correctly ignore the additional spill blocks which can be identified by checking for the DRR_SPILL_UNMODIFIED flag. The small downside to this approach is that is may increase the size of the stream and of the received snapshot on previous versions of ZFS. Additionally, when receiving streams generated by previous unpatched versions of ZFS spill blocks may still be lost. OpenZFS-issue: https://www.illumos.org/issues/9952 FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233277 Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8668
* Cleanup special/dedup languageRichard Laager2019-05-071-6/+6
| | | | | | | | | | | | | This standardizes the language on "deduplication tables" rather than "dedup data" (which might be read as the data blocks rather than the DDT). Likewise, it standardizes on "small file blocks". It also standardizes on "normal" rather than using both "normal" and "general" in the same paragraph. I also replaced "non-specified" with the more explicit "non-dedup/special". Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8713
* OpenZFS 10473 - zfs(1M) missing cross-reference to zfs-program(1M)Richard Laager2019-05-061-1/+2
| | | | | | | | | | | | | | | Authored by: Jason King <[email protected]> Reviewed by: Toomas Soome <[email protected]> Reviewed by: Andy Fiddaman <[email protected]> Reviewed by: Peter Tribble <[email protected]> Reviewed by: Gergő Mihály Doma <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Richard Laager <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/10473 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/736e67003 Closes #8711
* Add feature check for 'zpool resilver' commandTom Caputi2019-05-021-1/+3
| | | | | | | | | | | | The 'zpool resilver' command requires that the resilver_defer feature is active on the pool. Unfortunately, the check for this was left out of the original patch. This commit simply corrects this so that the command properly returns an error in this case. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8700
* Fix estimated scrub completion timeTom Caputi2019-05-011-0/+4
| | | | | | | | | | | | | | | | | | | Currently, it is possible for the 'zpool scrub' command to progress slightly beyond 100% due to concurrent changes happening on the live pool. This behavior is expected, but the userspace code for 'zpool status' would subtract the expected amount of data from the amount of data already scrubbed, resulting in a negative integer being casted to a large positive one. This number was then used to calculate the estimated completion time, resulting in wildly wrong results. This code changes the behavior so that 'zpool status' does not attempt to report an estimate during this period. Reviewed by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8611 Closes #8687
* Clarify that deduped data is encryptedRichard Laager2019-04-301-1/+2
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8691
* Clarify and improve encryption documentationRichard Laager2019-04-241-13/+14
| | | | | | | | | | | | | | | | | - Remove the language that "all user data" is encrypted. This is to avoid misunderstandings or arguments about what is "user data", especially in light of "user properties". - Document that properties are unencrypted. - Document that snapshot names are unencrypted. - For consistency with the rest of the zfs.8 man page, use "ZFS" as the generic noun, not (bolded) "zfs". The latter refers to the command. Likewise, use "ZFS" instead of "the kernel module". - Give "a passphrase" as an example of a "user's key". Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8652
* Duplicate the encryption copies=3 limitationRichard Laager2019-04-241-0/+5
| | | | | | | | | This adds the encryption copies=3 limitation language into the copies property section. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8651
* Deprecate dedupdittoRichard Laager2019-04-241-0/+3
| | | | | | | | | | This documents, in zpool.8, that dedupditto is deprecated and will be made to have no effect in a future release. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8650
* Eliminate useless double-bolding in man pagesRichard Laager2019-04-242-33/+33
| | | | | | | | | | As far as I know and can tell from testing, \fB\fB...\fR\fR is exactly equivalent to \fB...\fR. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Alphabetize zpool-features.5 by short nameRichard Laager2019-04-241-286/+287
| | | | | | | | | | The features are sorted in the en_US locale, not the C locale. Specifically, that means that bookmark_v2 comes _after_ bookmarks. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Standardize .RE placement in zpool-features.5Richard Laager2019-04-241-19/+6
| | | | | | | | | | This command is being used to unindent, so it should be at the end of each block. This is consistent with the other man pages. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Add missing formatting to sha512 in zpool-features.5Richard Laager2019-04-241-0/+3
| | | | | | | Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Correct GUID of large_blocks in zpool-features.5Richard Laager2019-04-241-1/+1
| | | | | | | | | It is org.open-zfs:large_blocks (plural). Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Change wording in zfs-module-parameters.5Tom Caputi2019-04-241-2/+2
| | | | | | | Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Document that hole_birth is effectively uselessRichard Laager2019-04-241-0/+8
| | | | | | | | | | | | | | The first sentence of this commit comes from the wiki, and was originally written by: Rich Ercolani <[email protected]> with changes by: Tom Caputi <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641 Closes #8642
* Document send_holes_without_birth_timeRichard Laager2019-04-241-5/+14
| | | | | | | Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Correct bookmark_v2 dependenciesRichard Laager2019-04-241-2/+2
| | | | | | | | | | | encryption depends on bookmark_v2. bookmark_v2 depends on bookmarks. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Fix formatting for multi_vdev_crash_dump in zpool-features.5Richard Laager2019-04-241-3/+3
| | | | | | | | | | This needs to use tabs instead of spaces to display correctly (i.e. with things lined up). Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8641
* Fix incorrect use of .Nm directive for ZPOOL_VDEV_NAME_GUID in zpool(8)Tomohiro Kusumi2019-04-231-2/+2
| | | | | | | | | It should only affect "zpool". Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #8644
* Code improvement and bug fixes for QAT supportcfzhu2019-04-161-13/+39
| | | | | | | | | | | | | | | | | | | | 1. Support QAT when ZFS is root file-system: When ZFS module is loaded before QAT started, the QAT can be started again in post-process, e.g.: echo 0 > /sys/module/zfs/parameters/zfs_qat_compress_disable echo 0 > /sys/module/zfs/parameters/zfs_qat_encrypt_disable echo 0 > /sys/module/zfs/parameters/zfs_qat_checksum_disable 2. Verify alder checksum of the de-compress result 3. Allocate Digest, IV and AAD buffer in physical contiguous memory by QAT_PHYS_CONTIG_ALLOC. 4. Update the documentation for zfs_qat_compress_disable, zfs_qat_checksum_disable, zfs_qat_encrypt_disable. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Weigang Li <[email protected]> Signed-off-by: Chengfeix Zhu <[email protected]> Closes #8323 Closes #8610
* Add option [-V|--version] to emit version stringTerraTech2019-04-162-2/+34
| | | | | | | | | | | | | | | | | | | | | Add the 'zfs version' and 'zpool version' subcommands to display the version of the user space utilities and loaded zfs kernel module. For example: $ zfs version zfs-0.8.0-rc3_169_g67e0366b88 zfs-kmod-0.8.0-rc3_169_g67e0366b88 The '-V' and '--version' aliases were added to support the common convention of using 'zfs --version` to obtain the version information. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: TerraTech <[email protected]> Closes #2501 Closes #8567
* zfs allow refreservation needed for zfs create -VRichard Laager2019-04-161-1/+3
| | | | | | | | | | | | When creating a non-sparse volume, zfs create sets a refreservation. Accordingly, one needs the "refreservation" ability in addition to the "create" ability in order to create a non-sparse volume. Reviewed-by: Brian Behlendorf <[email protected]> Reported-by: github.com/homerlinux Reported-by: Matthew Ahrens <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8531 Closes #8624
* Clarify GRUB's lack of support for sha512, skein, edonrRichard Laager2019-04-162-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | zfs.8 correctly said that GRUB did not support them, but zpool-features.5 said that "Booting off pools...is supported." Now, zpool-features.5 discusses GRUB specifically and indicates its lack of support for these features. Also, I have clarified the wording in both places to indicate that the pool feature cannot be used. It's not a filesystem dataset thing, but pool-wide. I described this as "cannot be used". I think technically the feature can be enabled, just not active. However, the effect is essentially the same: you cannot enable those checksum algorithms on any dataset in the pool, so you might as well not enable the feature (which is just pointing a loaded gun at your foot). In the past, an argument could be made that having all the features enabled was useful for simplicity, as long as you didn't activate the GRUB-incompatible features, but that's getting less and less realistic over time. A user can still do that, but we should not encourage that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626 Closes #8446
* Reword the dedup limitation for Edon-RRichard Laager2019-04-161-2/+2
| | | | | | | | | | The old wording was effectively "You can not use this (except you can)", which just seems confusing. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Consistently captialize GUID for featuresRichard Laager2019-04-161-5/+5
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Fix the spelling of deferredRichard Laager2019-04-161-2/+2
| | | | | | | Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Update zdb.8's fsck referenceRichard Laager2019-04-161-4/+2
| | | | | | | | | | On Linux, this is in man section 8, not 1M. Also, there is no fsdb on Linux, so I removed that. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Refer to commands consistently in zpool-features.5Richard Laager2019-04-161-19/+19
| | | | | | | | | | | | This had a mix of command vs subcommand, quoted vs not quoted, and bolded vs. not bolded command names. Also, fix man page sections from 1M (Solaris) to 8 (Linux). Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Eliminate most mentions of "special"Richard Laager2019-04-162-13/+13
| | | | | | | | | | | | | | Previously, the "spare" vdev type was described as "A special pseudo-vdev which...". I wanted to eliminate the word "special" from that, now that the allocation_classes feature exists and there is such a thing as a "special vdev". I ended up eliminating almost all instances of the word "special" that are not referencing the allocation_classes feature. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Laager <[email protected]> Closes #8626
* Hint about zpool free vs zfs availableJosh Soref2019-04-041-1/+25
| | | | | | | | | Also describe free/allocated/fragmentation Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Signed-off-by: Josh Soref <[email protected]> Closes #7565 Closes #8483
* Fix man(1) warningsJosh Soref2019-04-021-3/+3
| | | | | | | | | | | The macOS man app strenuously objects to blank lines in man files. mdoc warning: Empty input line #xyz Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: bunder2015 <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Josh Soref <[email protected]> Closes #8559
* Update raw send documentationTom Caputi2019-03-291-4/+14
| | | | | | | | | | This patch simply clarifies some of the limitations related to raw sends in the man page. No functional changes. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jason Cohen <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #8503 Closes #8544
* Add TRIM supportBrian Behlendorf2019-03-292-12/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matt Ahrens <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Contributions-by: Saso Kiselkov <[email protected]> Contributions-by: Tim Chase <[email protected]> Contributions-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #8419 Closes #598
* Correct a very minor grammar issueEvan Allrich2019-03-261-2/+2
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Evan Allrich <[email protected]> Closes #8535
* MMP interval and fail_intervals in uberblockOlaf Faaland2019-03-211-24/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Multihost is enabled, and a pool is imported, uberblock writes include ub_mmp_delay to allow an importing node to calculate the duration of an activity test. This value, is not enough information. If zfs_multihost_fail_intervals > 0 on the node with the pool imported, the safe minimum duration of the activity test is well defined, but does not depend on ub_mmp_delay: zfs_multihost_fail_intervals * zfs_multihost_interval and if zfs_multihost_fail_intervals == 0 on that node, there is no such well defined safe duration, but the importing host cannot tell whether mmp_delay is high due to I/O delays, or due to a very large zfs_multihost_interval setting on the host which last imported the pool. As a result, it may use a far longer period for the activity test than is necessary. This patch renames ub_mmp_sequence to ub_mmp_config and uses it to record the zfs_multihost_interval and zfs_multihost_fail_intervals values, as well as the mmp sequence. This allows a shorter activity test duration to be calculated by the importing host in most situations. These values are also added to the multihost_history kstat records. It calculates the activity test duration differently depending on whether the new fields are present or not; for importing pools with only ub_mmp_delay, it uses (zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals Which results in an activity test duration less sensitive to the leaf count. In addition, it makes a few other improvements: * It updates the "sequence" part of ub_mmp_config when MMP writes in between syncs occur. This allows an importing host to detect MMP on the remote host sooner, when the pool is idle, as it is not limited to the granularity of ub_timestamp (1 second). * It issues writes immediately when zfs_multihost_interval is changed so remote hosts see the updated value as soon as possible. * It fixes a bug where setting zfs_multihost_fail_intervals = 1 results in immediate pool suspension. * Update tests to verify activity check duration is based on recorded tunable values, not tunable values on importing host. * Update tests to verify the expected number of uberblocks have valid MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number), that sequence number is incrementing, and that uberblock values match tunable settings. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7842