path: root/tests/runfiles
Commit message | Author | Age | Files | Lines
...
* ZTS: test reported checksum errors for ZED | Rob Wing | 2022-12-02 | 1 | -1/+2

Test checksum error reporting to ZED via the call paths vdev_raidz_io_done_unrecoverable() and zio_checksum_verify().

Sponsored-by: Seagate Technology LLC
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Wing <[email protected]>
Closes #14190
* Python3: replace `distutils` with `sysconfig` | Damian Szuberski | 2022-11-28 | 1 | -0/+4

- The `distutils` module has long been deprecated and is already deleted from the CPython mainline.
- To remain compatible with Debian/Ubuntu Python3 packaging style, try `distutils.sysconfig.get_python_path(0,0)` first with fallback on `sysconfig.get_path('purelib')`
- pyzfs_unittest suite is run unconditionally as a part of ZTS.
- Add pyzfs_unittest suite to sanity tests.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: szubersk <[email protected]>
Closes #12833
Closes #13280
Closes #14177
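The fallback order described in this commit can be sketched roughly as follows. This is an illustration only, not the code from the commit. It assumes the `distutils` call meant is `distutils.sysconfig.get_python_lib(0, 0)`, which is the function known to exist in `distutils`; the helper name `python_purelib_dir` is hypothetical.

```python
import sysconfig


def python_purelib_dir():
    """Return the install directory for pure-Python packages."""
    try:
        # distutils is gone from newer CPython releases, so the import may fail.
        from distutils import sysconfig as du_sysconfig

        # Assumption: get_python_lib(plat_specific=0, standard_lib=0) is the
        # distutils lookup the commit message refers to.
        return du_sysconfig.get_python_lib(0, 0)
    except (ImportError, AttributeError):
        # Fallback named in the commit message.
        return sysconfig.get_path("purelib")


print(python_purelib_dir())
```

Trying `distutils` first keeps Debian/Ubuntu-style paths where that module is still shipped, and the `sysconfig` branch covers interpreters where it is not.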
* Fix setting the large_block feature after receiving a snapshot | George Amanakis | 2022-11-18 | 1 | -1/+1

We are not allowed to dirty a filesystem when done receiving a snapshot. In this case the flag SPA_FEATURE_LARGE_BLOCKS will not be set on that filesystem since the filesystem is not on dp_dirty_datasets, and a subsequent encrypted raw send will fail. Fix this by checking in dsl_dataset_snapshot_sync_impl() if the feature needs to be activated and do so if appropriate.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #13699
Closes #13782
* Handle and detect #13709's unlock regression (#14161) | Rich Ercolani | 2022-11-15 | 1 | -1/+1

In #13709, as in #11294 before it, it turns out that 63a26454 still had the same failure mode as when it was first landed as d1d47691, and fails to unlock certain datasets that formerly worked.

Rather than reverting it again, let's add handling to just throw out the accounting metadata that failed to unlock when that happens, as well as a test with a pre-broken pool image to ensure that we never get bitten by this again.

Fixes: #13709
Signed-off-by: Rich Ercolani <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
* Add ability to recompress send streams with new compression algorithm | Paul Dagnelie | 2022-11-10 | 1 | -6/+6

As new compression algorithms are added to ZFS, it could be useful for people to recompress data with new algorithms. There is currently no mechanism to do this aside from copying the data manually into a new filesystem with the new algorithm enabled. This tool allows the transformation to happen through zfs send, allowing it to be done efficiently to remote systems and in an incremental fashion.

A new zstream command is added that decompresses WRITE records and then recompresses them with a provided algorithm, and then re-emits the modified send stream. It may also be possible to re-compress embedded block pointers, but that was not attempted for the initial version.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #14106
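The decompress-then-recompress pass over WRITE records can be pictured with a toy model. The sketch below is not the zstream implementation: real send streams are binary records, whereas here each record is a plain dict, and `zlib`/`lzma` merely stand in for the old and new compression algorithms. The function name `recompress_write_records` is hypothetical.

```python
import lzma
import zlib


def recompress_write_records(records, decompress, compress):
    """Recompress the payload of WRITE records; pass other records through."""
    out = []
    for rec in records:
        if rec["type"] == "WRITE":
            raw = decompress(rec["payload"])
            # Build a new record so the input stream is left untouched.
            rec = {**rec, "payload": compress(raw)}
        out.append(rec)
    return out


# Toy stream: only the WRITE record carries compressed data.
stream = [
    {"type": "BEGIN", "payload": b""},
    {"type": "WRITE", "payload": zlib.compress(b"hello " * 100)},
    {"type": "END", "payload": b""},
]
converted = recompress_write_records(stream, zlib.decompress, lzma.compress)
```

Because the transformation is applied record by record, it works on a stream as it passes through, which matches the commit's point about remote and incremental use.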
* Support idmapped mount in user namespace | youzhongyang | 2022-11-08 | 1 | -1/+1

Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping". Update OpenZFS accordingly to improve the idmapped mount support.

This pull request contains the following changes:

- xattr setter functions are fixed to take mnt_ns argument. Without this, cp -p would fail for an idmapped mount in a user namespace.
- idmap_util is enhanced/fixed for its use in a user ns context.
- One test case added to test idmapped mount in a user ns.

Reviewed-by: Christian Brauner <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Youzhong Yang <[email protected]>
Closes #14097
* zfs_rename: support RENAME_* flags | Aleksa Sarai | 2022-10-28 | 1 | -0/+4

Implement support for Linux's RENAME_* flags (for renameat2). Aside from being quite useful for userspace (providing race-free ways to exchange paths and implement mv --no-clobber), they are used by overlayfs and are thus required in order to use overlayfs-on-ZFS.

In order for us to represent the new renameat2(2) flags in the ZIL, we create two new transaction types for the two flags which need transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT). RENAME_NOREPLACE does not need any ZIL support because we know that if the operation succeeded before creating the ZIL entry, there was no file to be clobbered and thus it can be treated as a regular TX_RENAME.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
Closes #12209
Closes #14070
* Support idmapped mount | youzhongyang | 2022-10-19 | 1 | -0/+4

Adds support for idmapped mounts. Supported as of Linux 5.12, this functionality allows user and group IDs to be remapped without changing their state on disk. This can be useful for portable home directories and a variety of container related use cases.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Youzhong Yang <[email protected]>
Closes #12923
Closes #13671
* Enforce "-F" flag on resuming recv of full/newfs on existing dataset | Jitendra Patidar | 2022-09-27 | 1 | -3/+3

When receiving a full/newfs stream on an existing dataset, the receive should be done with the "-F" flag. This is enforced for the initial receive by checks done in the zfs_receive_one function of libzfs. Similarly, resuming a full/newfs recv on an existing dataset should be done with the "-F" flag.

When the dataset doesn't exist, the full/newfs recv is done on a newly created dataset and it's marked INCONSISTENT. But when receiving on an existing dataset, the recv is first done on %recv and it's marked INCONSISTENT. The existing dataset is not marked INCONSISTENT. Resume of a full/newfs receive with the dataset not INCONSISTENT indicates that it's resuming newfs on an existing dataset. So, enforce the "-F" flag in this case.

Also return an error from dmu_recv_resume_begin_check() in the ZFS kernel code when it's resuming a full/newfs recv without force.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Jitendra Patidar <[email protected]>
Closes #13856
Closes #13857
* Add Linux posix_fadvise support | Finix1979 | 2022-09-08 | 1 | -0/+4

The purpose of this PR is to accept the fadvise ioctl from userland to do read-ahead on demand. It could dramatically improve sequential read performance, especially when primarycache is set to metadata or zfs_prefetch_disable is 1.

If the file is mmapped, generic_fadvise is also called for page cache read-ahead besides dmu_prefetch.

Only POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL are supported in this PR currently.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Finix Yan <[email protected]>
Closes #13694
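From userland, the read-ahead hint is issued through posix_fadvise(2). A minimal sketch using Python's `os.posix_fadvise` wrapper is shown below; the helper name `read_with_willneed` and the throwaway temp file are illustrative assumptions, and nothing here depends on the file actually living on ZFS.

```python
import os
import tempfile


def read_with_willneed(path, length):
    """Hint that the first `length` bytes will be needed, then read them."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # os.posix_fadvise only exists on some Unix platforms, so guard it.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, length, os.POSIX_FADV_WILLNEED)
        return os.read(fd, length)
    finally:
        os.close(fd)


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    demo_path = tmp.name
try:
    demo_data = read_with_willneed(demo_path, 4096)
finally:
    os.unlink(demo_path)
print(len(demo_data))
```

The hint is advisory: the read returns the same data with or without it, and only the timing of the underlying I/O may change.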
* Add zilstat script to report zil kstats in a user friendly manner | Ameer Hamza | 2022-09-02 | 2 | -2/+4

Added a python script to process both global and per dataset zil kstats and report them in a user friendly manner similar to arcstat and dbufstat.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: Ameer Hamza <[email protected]>
Closes #13704
* Fix zpool status in case of unloaded keys | George Amanakis | 2022-08-22 | 1 | -1/+1

When scrubbing an encrypted filesystem with an unloaded key, still report an error in zpool status.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #13675
Closes #13717
* Add snapshots_changed as property | Umer Saleem | 2022-08-02 | 2 | -2/+3

Make the dd_snap_cmtime property persistent across mount and unmount operations by storing it in ZAP, and restore the value from ZAP on hold into dd_snap_cmtime instead of updating it.

Expose dd_snap_cmtime as the 'snapshots_changed' property, which provides a mechanism to quickly determine whether the snapshot list for a dataset has changed without having to mount the dataset or iterate the snapshot list.

It specifies the time at which a snapshot for a dataset was last created or deleted. This allows us to be more efficient about how often we query snapshots.

Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #13635
* Implement a new type of zfs receive: corrective receive (-c) | Alek P | 2022-07-28 | 1 | -1/+2

This type of recv is used to heal corrupted data when a replica of the data already exists (in the form of a send file for example). With the provided send stream, corrective receive will read from disk blocks described by the WRITE records. When any of the reads come back with ECKSUM we use the data from the corresponding WRITE record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Zuchowski <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Closes #9372
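The healing loop described here can be modelled in a few lines. The following is a toy sketch under loud assumptions: the "disk" is a dict of offset to bytes, the expected checksums are SHA-256 digests kept in a second dict, and a checksum mismatch stands in for a read returning ECKSUM. None of these names come from the ZFS code.

```python
import hashlib


def checksum(data):
    """Stand-in checksum; ZFS uses its own configured checksum functions."""
    return hashlib.sha256(data).digest()


def corrective_receive(disk, expected, write_records):
    """Rewrite blocks whose on-disk data fails verification.

    disk: dict mapping offset -> bytes currently on disk
    expected: dict mapping offset -> expected checksum
    write_records: iterable of (offset, data) pairs from the send stream
    Returns the list of offsets that were healed.
    """
    healed = []
    for offset, data in write_records:
        if checksum(disk[offset]) != expected[offset]:
            # The read "failed with ECKSUM": use the stream's copy instead.
            disk[offset] = data
            healed.append(offset)
    return healed


good = {0: b"alpha", 8: b"bravo"}
expected = {off: checksum(buf) for off, buf in good.items()}
disk = {0: b"alpha", 8: b"br@vo"}  # block at offset 8 is corrupted
healed = corrective_receive(disk, expected, sorted(good.items()))
print(healed)
```

Blocks that verify correctly are left alone, so only damaged blocks are rewritten.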
* Scrub mirror children without BPs | Brian Behlendorf | 2022-06-23 | 1 | -1/+2

When scrubbing a raidz/draid pool, which contains a replacing or sparing mirror with multiple online children, only one child will be read. This is not normally a serious concern because the DTL records are used to determine where a good copy of the data is. As long as the data can be read from one child the mirror vdev will use it to repair gaps in any of its children. Furthermore, even if the data which was read is corrupt the raidz code will detect this and issue its own repair I/O to correct the damage in the mirror vdev.

However, in the scenario where the DTL is wrong due to silent data corruption (say due to overwriting one child) and the scrub happens to read from a child with good data, then the other damaged mirror child will not be detected nor repaired.

While this is possible for both raidz and draid vdevs, it's most pronounced when using draid. This is because by default the zed will sequentially rebuild a draid pool to a distributed spare, and the distributed spare half of the mirror is always preferred since it delivers better performance. This means the damaged half of the mirror will go undetected even after scrubbing.

For system administrators this behavior is non-intuitive and in a worst case scenario could result in the only good copy of the data being unknowingly detached from the mirror.

This change resolves the issue by reading all replacing/sparing mirror children when scrubbing. When the BP isn't available for verification, then compare the data buffers from each child. They must all be identical; if not, there's silent damage and an error is returned to prompt the top-level vdev to issue a repair I/O to rewrite the data on all of the mirror children. Since we can't tell which child was wrong, a checksum error is logged against the replacing or sparing mirror vdev.

Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13555
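The "compare the data buffers from each child" rule reduces to an all-equal check. A minimal sketch, assuming each child's data is available as a bytes object; the function name and the `MISMATCH` placeholder are hypothetical and do not correspond to identifiers in the vdev mirror code.

```python
OK = 0
MISMATCH = 1  # placeholder for the checksum-style error returned to the parent


def compare_mirror_children(child_buffers):
    """Return OK when every child holds identical data, else MISMATCH.

    Without a block pointer there is no checksum to say which copy is
    right, so any disagreement is reported against the mirror as a whole.
    """
    if not child_buffers:
        return OK
    first = child_buffers[0]
    if all(buf == first for buf in child_buffers[1:]):
        return OK
    return MISMATCH


print(compare_mirror_children([b"data", b"data"]))
print(compare_mirror_children([b"data", b"dat4"]))
```

On a mismatch the caller, here the raidz or draid parent, would be expected to reconstruct the data and rewrite all children, as the commit message describes.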
* Add Linux namespace delegation support | Will Andrews | 2022-06-10 | 1 | -1/+2

This allows ZFS datasets to be delegated to a user/mount namespace.
Within that namespace, only the delegated datasets are visible.
Works very similarly to Zones/Jails on other ZFS OSes.

As a user:
```
$ unshare -Um
$ zfs list
no datasets available
$ echo $$
1234
```

As root:
```
# zfs list
NAME                            ZONED  MOUNTPOINT
containers                      off    /containers
containers/host                 off    /containers/host
containers/host/child           off    /containers/host/child
containers/host/child/gchild    off    /containers/host/child/gchild
containers/unpriv               on     /unpriv
containers/unpriv/child         on     /unpriv/child
containers/unpriv/child/gchild  on     /unpriv/child/gchild

# zfs zone /proc/1234/ns/user containers/unpriv
```

Back to the user namespace:
```
$ zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
containers               129M  47.8G    24K  /containers
containers/unpriv        128M  47.8G    24K  /unpriv
containers/unpriv/child  128M  47.8G   128M  /unpriv/child
```

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Will Andrews <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
Co-authored-by: Allan Jude <[email protected]>
Co-authored-by: Mateusz Piotrowski <[email protected]>
Sponsored-by: Buddy <https://buddy.works>
Closes #12263
* zvol: Support blk-mq for better performance | Tony Hutter | 2022-06-09 | 2 | -1/+10

Add support for the kernel's block multiqueue (blk-mq) interface in the zvol block driver. blk-mq creates multiple request queues on different CPUs rather than having a single request queue. This can improve zvol performance with multithreaded reads/writes.

This implementation uses the blk-mq interfaces on 4.13 or newer kernels. Building against older kernels will fall back to the older BIO interfaces.

Note that you must set the `zvol_use_blk_mq` module param to enable the blk-mq API. It is disabled by default.

In addition, this commit lets the zvol blk-mq layer process whole `struct request` IOs at a time, rather than breaking them down into their individual BIOs. This reduces dbuf lock contention and overhead versus the legacy zvol submit_bio() codepath.

sequential dd to one zvol, 8k volblocksize, no O_DIRECT:

    legacy submit_bio()    292MB/s write    453MB/s read
    this commit            453MB/s write    885MB/s read

It also introduces a new `zvol_blk_mq_chunks_per_thread` module parameter. This parameter represents how many volblocksize'd chunks to process per each zvol thread. It can be used to tune your zvols for better read vs write performance (higher values favor write, lower favor read).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #13148
Issue #12483
* Introduce BLAKE3 checksums as an OpenZFS feature | Tino Reichardt | 2022-06-08 | 1 | -2/+2

This commit adds BLAKE3 checksums to OpenZFS. It has similar performance to Edon-R, but without the caveats around the latter.

Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3
Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3

Short description from Wikipedia:

BLAKE3 is a cryptographic hash function based on Bao and BLAKE2, created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real World Crypto. BLAKE3 is a single algorithm with many desirable features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. BLAKE3 has a binary tree structure, so it supports a practically unlimited degree of parallelism (both SIMD and multithreading) given enough input. The official Rust and C implementations are dual-licensed as public domain (CC0) and the Apache License.

Along with adding the BLAKE3 hash into the OpenZFS infrastructure a new benchmarking file called chksum_bench was introduced. When read it reports the speed of the available checksum functions.

On Linux: cat /proc/spl/kstat/zfs/chksum_bench
On FreeBSD: sysctl kstat.zfs.misc.chksum_bench

This is an example output of an i3-1005G1 test system with Debian 11:

    implementation      1k      4k     16k     64k    256k      1m      4m
    edonr-generic     1196    1602    1761    1749    1762    1759    1751
    skein-generic      546     591     608     615     619     612     616
    sha256-generic     240     300     316     314     304     285     276
    sha512-generic     353     441     467     476     472     467     426
    blake3-generic     308     313     313     313     312     313     312
    blake3-sse2        402    1289    1423    1446    1432    1458    1413
    blake3-sse41       427    1470    1625    1704    1679    1607    1629
    blake3-avx2        428    1920    3095    3343    3356    3318    3204
    blake3-avx512      473    2687    4905    5836    5844    5643    5374

Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X)

    implementation      1k      4k     16k     64k    256k      1m      4m
    edonr-generic     1840    2458    2665    2719    2711    2723    2693
    skein-generic      870     966     996     992    1003    1005    1009
    sha256-generic     415     442     453     455     457     457     457
    sha512-generic     608     690     711     718     719     720     721
    blake3-generic     301     313     311     309     309     310     310
    blake3-sse2        343    1865    2124    2188    2180    2181    2186
    blake3-sse41       364    2091    2396    2509    2463    2482    2488
    blake3-avx2        365    2590    4399    4971    4915    4802    4764

Output on Debian 5.10.0-9-powerpc64le system: (POWER 9)

    implementation      1k      4k     16k     64k    256k      1m      4m
    edonr-generic     1213    1703    1889    1918    1957    1902    1907
    skein-generic      434     492     520     522     511     525     525
    sha256-generic     167     183     187     188     188     187     188
    sha512-generic     186     216     222     221     225     224     224
    blake3-generic     153     152     154     153     151     153     153
    blake3-sse2        391    1170    1366    1406    1428    1426    1414
    blake3-sse41       352    1049    1212    1174    1262    1258    1259

Output on Debian 5.10.0-11-arm64 system: (Pi400)

    implementation      1k      4k     16k     64k    256k      1m      4m
    edonr-generic      487     603     629     639     643     641     641
    skein-generic      271     299     303     308     309     309     307
    sha256-generic     117     127     128     130     130     129     130
    sha512-generic     145     165     170     172     173     174     175
    blake3-generic      81      29      71      89      89      89      89
    blake3-sse2        112     323     368     379     380     371     374
    blake3-sse41       101     315     357     368     369     364     360

Structurally, the new code is mainly split into these parts:

- 1x cross platform generic c variant: blake3_generic.c
- 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512)
- 2x assembly for ARMv8 (NEON converted from SSE2)
- 2x assembly for PPC64-LE (POWER8 converted from SSE2)
- one file for switching between the implementations

Note the PPC64 assembly requires the VSX instruction set and the kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly.

Reviewed-by: Felix Dörre <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Co-authored-by: Rich Ercolani <[email protected]>
Closes #10058
Closes #12918
* tests: add zfs_unshare_008_pos checking whitespace escaping | наб | 2022-05-12 | 1 | -1/+1

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13165
* Adding ZTS test for O_APPEND | Brian Atkinson | 2022-05-11 | 2 | -5/+5

Commit 63b18e4 fixed an issue in zpl_aio_write() to make sure that kiocb->ki_pos was updated correctly when opening a file with O_APPEND. Adding a test to verify O_APPEND functionality with lseek can make sure that all other distros/kernel versions also have the correct behavior.

Also moved the threadappends_001_pos test into this append test directory in functional ZTS directory. This way the two append tests are together for organization purposes.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes #13424
* autoconf: use include directives instead of recursing down tests (mostly) | наб | 2022-05-10 | 1 | -9/+0

Only down to tests/zfs-tests/tests, but pull out C programs into the main Makefile; this means we get correct dependency tracking for all programs (and parallelise across them)

dist diff:

    -zfs-2.1.99/tests/zfs-tests/tests/stress/
    -zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.am
    -zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.in

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13316
* Speed up WB_SYNC_NONE when a WB_SYNC_ALL occurs simultaneously | Shaan Nobee | 2022-05-03 | 1 | -1/+1

Page writebacks with WB_SYNC_NONE can take several seconds to complete since they wait for the transaction group to close before being committed. This is usually not a problem since the caller does not need to wait. However, if we're simultaneously doing a writeback with WB_SYNC_ALL (e.g. via msync), the latter can block for several seconds (up to zfs_txg_timeout) due to the active WB_SYNC_NONE writeback since it needs to wait for the transaction to complete and the PG_writeback bit to be cleared.

This commit deals with 2 cases:

- No page writeback is active. A WB_SYNC_ALL page writeback starts and even completes. But when it's about to check if the PG_writeback bit has been cleared, another writeback with WB_SYNC_NONE starts. The sync page writeback ends up waiting for the non-sync page writeback to complete.
- A page writeback with WB_SYNC_NONE is already active when a WB_SYNC_ALL writeback starts. The WB_SYNC_ALL writeback ends up waiting for the WB_SYNC_NONE writeback.

The fix works by carefully keeping track of active sync/non-sync writebacks and committing when beneficial.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Shaan Nobee <[email protected]>
Closes #12662
Closes #12790
* Improve zpool status output, list all affected datasets | George Amanakis | 2022-04-25 | 1 | -0/+1

Currently, determining which datasets are affected by corruption is a manual process.

The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot in which the error originally occurred may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot in which the error was detected to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID; the birth time of the block in which the error occurred, as well as some information about the error itself, are also stored.

Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows:

      pool: test
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              sdb       ONLINE       0     0 1.58K

    errors: Permanent errors have been detected in the following files:

            test@1:/test.0.0
            /test/test.0.0
            /test/1clone/test.0.0

A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of #9175. Rebase required fixing the tests, updating the ABI of libzfs, updating the man pages, fixing bugs, fixing the error returns, and updating the old on-disk error logs to the new format when activating the feature.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Co-authored-by: TulsiJain <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #9175
Closes #12812
* Corrected oversight in ZERO_RANGE behavior | Rich Ercolani | 2022-04-20 | 1 | -1/+1

It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do have differing semantics in some ways - in particular, one requires KEEP_SIZE, and the other does not.

Also added a zero-range test to catch this, corrected a flaw that made the punch-hole test succeed vacuously, and a typo in file_write.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #13329
Closes #13338
* tests: move C test helpers into test cmd | наб | 2022-04-01 | 2 | -2/+6

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #13259
* Allow zfs send to exclude datasets | Sean Eric Fagan | 2022-03-18 | 1 | -1/+2

Add support for an --exclude/-X option to `zfs send` to allow dataset hierarchies to be excluded.

Snapshots can be excluded using a channel program; however, this can result in failures with 'zfs send -R'; this option allows them to be excluded.

Fortunately, this required a change only to cmd/zfs/zfs_main.c, using the already-existing callback argument to zfs_send() that is currently unused.

Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Christian Schwarz <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Sean Eric Fagan <[email protected]>
Signed-off-by: Sean Eric Fagan <[email protected]>
Closes #13158
* tests: validate getsubopt(3) expulsion | наб | 2022-03-15 | 2 | -3/+4

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12996
* Fix ENOSPC when unlinking multiple files from full pool | Brian Behlendorf | 2022-03-08 | 1 | -1/+1

When unlinking multiple files from a pool at 100% capacity, it was possible for ENOSPC to be returned after the first unlink. e.g.

    rm -f /mnt/fs/test1.0.0 /mnt/fs/test1.1.0 /mnt/fs/test1.2.0
    rm: cannot remove '/mnt/fs/test1.1.0': No space left on device
    rm: cannot remove '/mnt/fs/test1.2.0': No space left on device

After waiting for the pending deferred frees from the first unlink to be processed the remaining files can then be unlinked. This is caused by the quota limit in dsl_dir_tempreserve_impl() being temporarily decreased to the allocatable pool capacity less any deferred free space.

This is resolved using the existing mechanism of returning ERESTART when over quota as long as we know enough space will shortly be available after processing the pending deferred frees.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13172
* Expose additional file level attributes | Umer Saleem | 2022-03-07 | 3 | -5/+5

ZFS allows additional file level attributes to be updated and retrieved on FreeBSD. This commit allows additional file level attributes to be updated and retrieved on Linux as well. These include the flags stored in the upper half of z_pflags only.

Two new IOCTLs have been added for this purpose. ZFS_IOC_GETDOSFLAGS can be used to retrieve the attributes, while ZFS_IOC_SETDOSFLAGS can be used to update the attributes.

Attributes that are allowed to be updated include ZFS_IMMUTABLE, ZFS_APPENDONLY, ZFS_NOUNLINK, ZFS_ARCHIVE, ZFS_NODUMP, ZFS_SYSTEM, ZFS_HIDDEN, ZFS_READONLY, ZFS_REPARSE, ZFS_OFFLINE and ZFS_SPARSE. Flags can be or'd together while calling ZFS_IOC_SETDOSFLAGS.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #13118
* ZTS: Move largest_pool_001_pos.ksh to Linux runfile | Brian Behlendorf | 2022-03-01 | 2 | -6/+6

On FreeBSD pools are not allowed to be created using vdevs which are backed by ZFS volumes. This configuration is not recommended for any supported platform, nevertheless the largest_pool_001_pos.ksh test case makes use of it as a convenience. This causes the test case to fail reliably on FreeBSD. The layout is still tolerated on Linux so only perform this test on Linux.

Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #13166
* log xattr=sa create/remove/update to ZIL | Jitendra Patidar | 2022-02-22 | 1 | -1/+1

As such, there are no specific synchronous semantics defined for the xattrs. But for xattr=on, it does log to ZIL and zil_commit() is done, if sync=always is set on dataset. This provides sync semantics for xattr=on with sync=always set on dataset.

For the xattr=sa implementation, it doesn't log to ZIL, so, even with sync=always, xattrs are not guaranteed to be synced before xattr call returns to caller. So, xattr can be lost if system crash happens, before txg carrying xattr transaction is synced.

This change adds xattr=sa logging to ZIL on xattr create/remove/update and xattrs are synced to ZIL (zil_commit() done) for sync=always. This makes xattr=sa behavior similar to xattr=on.

Implementation notes:

The actual logging is fairly straight-forward and does not warrant additional explanation. However, it has been 14 years since we last added new TX types to the ZIL [1], hence this is the first time we do it after the introduction of zpool features. Therefore, here is an overview of the feature activation and deactivation workflow:

1. The feature must be enabled. Otherwise, we don't log the new record type. This ensures compatibility with older software.
2. The feature is activated per-dataset, since the ZIL is per-dataset.
3. If the feature is enabled and dataset is not for zvol, any append to the ZIL chain will activate the feature for the dataset. Likewise for starting a new ZIL chain.
4. A dataset that doesn't have a ZIL chain has the feature deactivated.

We ensure (3) by activating on the first zil_commit() after the feature was enabled. Since activating the features requires waiting for txg sync, the first zil_commit() after enabling the feature will be slower than usual. The downside is that this is really a conservative approximation: even if we never append a 'TX_SETSAXATTR' to the ZIL chain, we pay the penalty for feature activation. The upside is that the user is in control of when we pay the penalty, i.e., upon enabling the feature.

We ensure (4) by hooking into zil_sync(), where ZIL destroy actually happens.

One more piece on feature activation, since it's spread across multiple functions:

    zil_commit()
      zil_process_commit_list()
        if lwb == NULL // first zil_commit since zil_open
          zil_create()
            if no log block pointer in ZIL header:
              if feature enabled and not active:
                // CASE 1
                enable, COALESCE txg wait with dmu_tx that allocated the
                log block
            else // log block was allocated earlier than this zil_open
              if feature enabled and not active:
                // CASE 2
                enable, EXPLICIT txg wait
        else
          // already have an in-DRAM LWB
          if feature enabled and not active:
            // this happens when we enable the feature after zil_create
            // CASE 3
            enable, EXPLICIT txg wait

[1] https://github.com/illumos/illumos-gate/commit/da6c28aaf62fa55f0fdb8004aa40f88f23bf53f0

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Christian Schwarz <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jitendra Patidar <[email protected]>
Closes #8768
Closes #9078
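The three activation cases in the outline above can be restated as a small decision function. This is only a reading aid under stated assumptions: the boolean parameters are hypothetical summaries of the conditions named in the commit message (`lwb == NULL`, "no log block pointer in ZIL header", "feature enabled and not active"), and the returned strings are labels, not kernel behaviour.

```python
def activation_case(has_lwb, header_has_log_block, enabled, active):
    """Classify a zil_commit() call into CASE 1, 2 or 3 from the outline.

    Returns None when no activation is needed, otherwise a tuple of
    (case number, how the txg wait is performed).
    """
    if not enabled or active:
        return None
    if not has_lwb:
        # First zil_commit since zil_open: zil_create() runs.
        if not header_has_log_block:
            # CASE 1: wait is coalesced with the tx that allocates the block.
            return (1, "coalesced txg wait")
        # CASE 2: log block predates this zil_open.
        return (2, "explicit txg wait")
    # CASE 3: feature was enabled after zil_create.
    return (3, "explicit txg wait")


print(activation_case(False, False, True, False))
print(activation_case(False, True, True, False))
print(activation_case(True, True, True, False))
```

Only case 1 avoids a separate wait, which is consistent with the commit's remark that the first zil_commit() after enabling the feature is slower than usual.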
* Enable encrypted raw sending to pools with greater ashiftGeorge Amanakis2022-02-161-1/+1
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with ashift=12 results in failure when mounting pool2/encrypted (Input/Output error). Notably, the opposite, raw sending from a greater ashift to a lower one, does not fail.

This happens because zio_compress_write() incorrectly checks only ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT, which is also set in encrypted raw send streams. In this case it rounds up the psize and, if the result is not equal to the zio->io_size, it modifies the block by zeroing out the extra bytes. Because this happens in a SA attr. registration object (type=46), the decryption fails upon mounting the filesystem, and zpool status falsely reports an error.

Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT before deciding whether to zero-pad a block.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #13067
Closes #13074
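The rounding arithmetic explains why only the ashift=9 to ashift=12 direction fails. The sketch below is a Python model, not the kernel code: the flag values and the helper `pad_bytes` are hypothetical, and it assumes padding applies to raw-compressed writes that are not also raw-encrypted.

```python
# Hypothetical flag values, chosen only for this illustration.
RAW_COMPRESS = 1 << 0
RAW_ENCRYPT = 1 << 1


def round_up(size: int, ashift: int) -> int:
    """Round size up to the next multiple of 2**ashift."""
    align = 1 << ashift
    return (size + align - 1) & ~(align - 1)


def pad_bytes(flags: int, psize: int, ashift: int) -> int:
    """Number of zero bytes appended to a block (0 means left untouched).

    Assumption: raw-compressed blocks are padded, but never when they
    are also raw-encrypted, since modifying them breaks decryption.
    """
    if not flags & RAW_COMPRESS or flags & RAW_ENCRYPT:
        return 0
    return round_up(psize, ashift) - psize
```

For a 1536-byte block, `pad_bytes(RAW_COMPRESS, 1536, 12)` gives 2560 while `pad_bytes(RAW_COMPRESS, 1536, 9)` gives 0, so a block written on an ashift=9 pool is only modified when it lands on an ashift=12 pool. With `RAW_ENCRYPT` also set, the result is 0 in both directions.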
* Cross-platform xattr user namespace compatibilityRyan Moeller2022-02-152-2/+2
ZFS on Linux originally implemented xattr namespaces in a way that is incompatible with other operating systems. On illumos, xattrs do not have namespaces. Every xattr name is visible. FreeBSD has two universally defined namespaces: EXTATTR_NAMESPACE_USER and EXTATTR_NAMESPACE_SYSTEM. The system namespace is used for protected FreeBSD-specific attributes such as MAC labels and pnfs state. These attributes have the namespace string "freebsd:system:" prefixed to the name in the encoding scheme used by ZFS. The user namespace is used for general purpose user attributes and obeys normal access control mechanisms. These attributes have no namespace string prefixed, so xattrs written on illumos are accessible in the user namespace on FreeBSD, and xattrs written to the user namespace on FreeBSD are accessible by the same name on illumos.

Linux has several xattr namespaces. On Linux, ZFS encodes the namespace in the xattr name for every namespace, including the user namespace. As a consequence, an xattr in the user namespace with the name "foo" is stored by ZFS with the name "user.foo" and therefore appears on FreeBSD and illumos to have the name "user.foo" rather than "foo". Conversely, none of the xattrs written on FreeBSD or illumos are accessible on Linux unless the name happens to be prefixed with one of the Linux xattr namespaces, in which case the namespace is stripped from the name. This makes xattrs entirely incompatible between Linux and other platforms.

We want to make the encoding of user namespace xattrs compatible across platforms. A critical requirement of this compatibility is for xattrs from existing pools from FreeBSD and illumos to be accessible by the same names in the user namespace on Linux. It is also necessary that existing pools with xattrs written by Linux retain access to those xattrs by the same names on Linux. Making user namespace xattrs from Linux accessible by the correct names on other platforms is important. The handling of other namespaces is not required to be consistent.

Add a fallback mechanism for listing and getting xattrs to treat xattrs as being in the user namespace if they do not match a known prefix.

Do not allow setting or getting xattrs with a name that is prefixed with one of the namespace names used by ZFS on supported platforms.

Allow choosing between legacy illumos and FreeBSD compatibility and legacy Linux compatibility with a new tunable. This facilitates replication and migration of pools between hosts with different compatibility needs.

The tunable controls whether or not to prefix the namespace to the name. If the xattr is already present with the alternate prefix, remove it so only the new version persists. By default the platform's existing convention is used.

Reviewed-by: Christian Schwarz <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #11919
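The naming rules described above can be modeled in a few lines. This is a simplified Python sketch rather than the ZFS implementation: the function names and the `linux_compat` parameter (standing in for the new tunable) are hypothetical, and the prefix list is an assumption combining the common Linux namespaces with the "freebsd:system:" prefix named in the message.

```python
# Assumed set of namespace prefixes; illustrative, not exhaustive.
KNOWN_PREFIXES = ("user.", "system.", "trusted.", "security.", "freebsd:system:")


def stored_name(name: str, linux_compat: bool) -> str:
    """Encode a user-namespace xattr name for storage on disk."""
    return "user." + name if linux_compat else name


def user_visible_name(stored: str):
    """Decode a stored name as seen from the user namespace.

    Returns None when the stored name belongs to another namespace.
    """
    if stored.startswith("user."):
        return stored[len("user."):]
    if stored.startswith(KNOWN_PREFIXES):
        return None
    # Fallback: an unprefixed name is treated as a user-namespace xattr.
    return stored


def is_settable(name: str) -> bool:
    """Reject user xattr names that collide with a known namespace prefix."""
    return not name.startswith(KNOWN_PREFIXES)
```

In this model, "foo" is readable as "foo" whether it was stored as "user.foo" (legacy Linux convention) or as "foo" (illumos and FreeBSD convention), which is the compatibility property the change is after.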
* Receive checks should allow unencrypted child datasetsAttila Fülöp2022-02-091-1/+2
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT flag before calling dsl_dataset_hold_flags(). If the key on the receiving side isn't loaded or the send stream contains embedded blocks, the receive check fails for a stream which is perfectly valid and could be received without any problem. This seems like a remnant of the initial design, where unencrypted datasets below encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted datasets, modify an existing test to detect this regression and add a test for raw replication streams.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Amanakis <[email protected]>
Co-authored-by: George Amanakis <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #13033
Closes #13076
* Linux 5.16 compat: don't use XSTATE_XSAVE to save FPU stateAttila Fülöp2022-02-091-0/+6
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach, so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.

Please note that this differs from previous behavior in that it won't handle exceptions created by XSAVE and XRSTOR. This is sensible for three reasons.

 - Exceptions during XSAVE and XRSTOR can only occur if the feature is not supported or enabled or the memory operand isn't aligned on a 64 byte boundary. If this happens something else went terribly wrong, and it may be better to stop execution.

 - Previously we just printed a warning and didn't handle the fault, this is arguable for the above reason.

 - All other *SAVE instructions also don't handle exceptions, so this at least aligns behavior.

Finally add a test to catch such a regression in the future.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #13042
Closes #13059
* Fix clearing set-uid and set-gid bits on a file when replying a writePawel Jakub Dawidek2022-02-031-1/+1
POSIX requires that set-uid and set-gid bits be removed when an unprivileged user writes to a file, and ZFS does that during normal operation.

The problem arises when the write is stored in the ZIL and replayed. During replay we have no access to the original credentials of the process doing the write, so zfs_write() will be performed with the root credentials. When root is doing the write, set-uid and set-gid bits are not removed from the file.

To correct that, log a separate TX_SETATTR entry that removes those bits on the first write to such a file.

Idea from: Christian Schwarz

Add test for ZIL replay of setuid/setgid clearing.

Improve various edge cases when clearing setid bits:
- The setid bits can be readded during a single write, so make sure to check for them on every chunk write.
- Log TX_SETATTR record at most once per transaction group (if the setid bits keep coming back).
- Move zfs_log_setattr() outside of zp->z_acl_lock.

Reviewed-by: Dan McDonald <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Co-authored-by: Christian Schwarz <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes #13027
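The replay problem can be reproduced with a toy log. The following Python sketch is a deliberately simplified model, not ZFS code: `write_mode`, `log_write`, and `replay` are hypothetical helpers, the record tuples only borrow the TX_WRITE and TX_SETATTR names, and POSIX details such as the group-execute condition for set-gid are ignored.

```python
import stat

SETID_BITS = stat.S_ISUID | stat.S_ISGID


def write_mode(mode: int, privileged: bool) -> int:
    """File mode after a write: unprivileged writers lose the setid bits."""
    return mode if privileged else mode & ~SETID_BITS


def log_write(mode: int, privileged: bool) -> list:
    """Build the (hypothetical) log records for a single write."""
    records = [("TX_WRITE", None)]
    new_mode = write_mode(mode, privileged)
    if new_mode != mode:
        # Record the mode change explicitly, because replay cannot
        # know who performed the original write.
        records.append(("TX_SETATTR", new_mode))
    return records


def replay(mode: int, records: list) -> int:
    """Replay records; replayed writes always run with root credentials."""
    for kind, arg in records:
        if kind == "TX_WRITE":
            mode = write_mode(mode, privileged=True)
        elif kind == "TX_SETATTR":
            mode = arg
    return mode
```

Replaying only the write record leaves a mode of 0o4755 unchanged, while replaying the records produced by `log_write(0o4755, privileged=False)` yields 0o755, which is the behavior the separate TX_SETATTR entry restores.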
* Report dnodes with faulty bonuslenGeorge Amanakis2022-02-031-1/+2
In files created/modified before 4254acb there may be a corruption of xattrs which is not reported during scrub and normal send/receive. It manifests only as an error when raw sending/receiving. This happens because currently only the raw receive path checks for discrepancies between the dnode bonus length and the spill pointer flag.

In case we encounter a dnode whose bonus length is greater than the predicted one, we should report an error. Modify in this regard dnode_sync() with an assertion at the end, dump_dnode() to error out, dsl_scan_recurse() to report errors during a scrub, and zstream to report a warning when dumping. Also added a test to verify spill blocks are sent correctly in a raw send.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #12720
Closes #13014
* Introduce a flag to skip comparing the local mac when raw sendingGeorge Amanakis2022-01-211-1/+2
Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset.

One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #12981
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
* Linux: Implement FS_IOC_GETVERSIONRyan Moeller2021-12-171-0/+4
Provide access to file generation number on Linux.

Add test coverage.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #12856
* zfs, libzfs: diff: accept -h/ZFS_DIFF_NO_MANGLE, disabling path escapingнаб2021-12-131-1/+1
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes #12829
* Temporarily remove tests from sanity runfileJohn Wren Kennedy2021-12-011-8/+7
With the addition of functionality to rerun failing tests, some tests that fail only sometimes still fail often enough to degrade the reliability of the sanity runs. Remove them from the runfile until they reliably pass.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: John Kennedy <[email protected]>
Closes #12814
* pam_zfs_key: tests: check if zfs load-key works on short passphrasesAttila Fülöp2021-11-301-1/+1
The pam_zfs_key pam module does not enforce a minimum password length while changing the user password and thus the user's home dataset passphrase.

To not end up with a dataset `zfs load-key` can't load the key for, `zfs load-key` should not enforce a minimum passphrase length. This adds a test for that.

Reviewed-by: Felix Dörre <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Closes #12765
Closes #12651
Closes #12656
* Enable edonr in FreeBSDRich Ercolani2021-11-162-5/+1
The code is integrated, builds fine, runs fine, there's not really any reason not to.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12735
* zhack: Add repair label optionFedor Uporov2021-11-111-0/+6
If all label checksums on any vdev are invalid, the pool becomes unimportable. zhack, with the newly added CLI options, can be used to restore the label checksums and make the pool importable again.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Fedor Uporov <[email protected]>
Closes #2510
Closes #12686
* zdb: Report bad label checksumFedor Uporov2021-11-101-2/+3
If all label checksums on any vdev are invalid, the pool becomes unimportable. On the other hand, zdb with the -l option does not provide any useful information about why this happened. Add notifications about corrupted label checksums.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Fedor Uporov <[email protected]>
Closes #2509
Closes #12685
* Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistencyBrian Behlendorf2021-11-071-1/+1
When using lseek(2) to report data/holes, memory mapped regions of the file were ignored. This could produce incorrect results. To handle this zfs_holey_common() was updated to asynchronously writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is now held over the dirty check to prevent the dnode structure from changing. This ensures that a clean dnode can't be dirtied before the data/hole is located. The range lock is now also taken to ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty() helper function which checks the dnode for any dirty records to determine its dirtiness.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #11900
Closes #12724
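For context, this is roughly how a userland caller walks the data regions of a file with lseek(2), which is the interface whose results the fix makes consistent. It is a minimal Python sketch, assuming a platform where `os.SEEK_DATA` and `os.SEEK_HOLE` are available; the helper name `data_segments` is hypothetical.

```python
import errno
import os


def data_segments(fd: int) -> list:
    """Return [(start, end), ...] byte ranges that lseek reports as data."""
    size = os.fstat(fd).st_size
    segments = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                # No more data at or after this offset: trailing hole.
                break
            raise
        # The end of the file counts as an implicit hole, so this call
        # always returns an offset greater than start.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        segments.append((start, end))
        offset = end
    return segments
```

A tool that copies sparse files this way would silently skip regions that were written through mmap(2) but wrongly reported as holes, which is the kind of incorrect result the message refers to.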
* Normalize property names for zfs receiveRich Ercolani2021-10-291-0/+1
It turns out, userland is much more happy with aliased property names than the kernel is. So let's normalize those to the expected names before we pass them off.

Added a test case hacked up from the other recv -o/-x test that fails on unpatched git and passes here.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12607
Closes #12609
* libshare: nfs: pass through ipv6 addresses in bracket notationfelixdoerre2021-10-201-1/+1
Recognize when the host part of a sharenfs attribute is an IPv6 literal and pass that through without modification.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Felix Dörre <[email protected]>
Closes: #11171
Closes #11939
Closes: #1894
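Recognizing a bracketed IPv6 literal can be sketched as follows. This Python example only illustrates the idea and is not how libshare does it: the function name `is_bracketed_ipv6` is hypothetical, and validation is delegated to the standard `ipaddress` module.

```python
import ipaddress


def is_bracketed_ipv6(host: str) -> bool:
    """Return True if host looks like "[<IPv6 address>]"."""
    if not (host.startswith("[") and host.endswith("]")):
        return False
    try:
        # Validate the text between the brackets as an IPv6 address.
        ipaddress.IPv6Address(host[1:-1])
    except ValueError:
        return False
    return True
```

A host such as "[2001:db8::1]" would then be passed through unchanged, while anything else continues through the existing handling.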
* Added test for being able to read various variants of zstdRich Ercolani2021-09-201-1/+1
As detailed in #12022 and #12008, it turns out the current zstd implementation is quite nonportable, and results in various configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #12030
* ZTS: Enable punch-hole tests on FreeBSDKa Ho Ng2021-08-302-1/+5
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ka Ho Ng <[email protected]>
Sponsored-by: The FreeBSD Foundation
Closes #12458