aboutsummaryrefslogtreecommitdiffstats
path: root/configure.ac
Commit message (Collapse)AuthorAgeFilesLines
* Defer new resilvers until the current one endsTom Caputi2018-10-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, if a resilver is triggered for any reason while an existing one is running, zfs will immediately restart the existing resilver from the beginning to include the new drive. This causes problems for system administrators when a drive fails while another is already resilvering. In this case, the optimal thing to do to reduce risk of data loss is to wait for the current resilver to end before immediately replacing the second failed drive, which allows the system to operate with two incomplete drives for the minimum amount of time. This patch introduces the resilver_defer feature that essentially does this for the admin without forcing them to wait and monitor the resilver manually. The change requires an on-disk feature since we must mark drives that are part of a deferred resilver in the vdev config to ensure that we do not assume they are done resilvering when an existing resilver completes. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: @mmaybee Signed-off-by: Tom Caputi <[email protected]> Closes #7732
* Fixes for procfs files backed by linked listsJohn Gallagher2018-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are some issues with the way the seq_file interface is implemented for kstats backed by linked lists (zfs_dbgmsgs and certain per-pool debugging info): * We don't account for the fact that seq_file sometimes visits a node multiple times, which results in missing messages when read through procfs. * We don't keep separate state for each reader of a file, so concurrent readers will receive incorrect results. * We don't account for the fact that entries may have been removed from the list between read syscalls, so reading from these files in procfs can cause the system to crash. This change fixes these issues and adds procfs_list, a wrapper around a linked list which abstracts away the details of implementing the seq_file interface for a list and exposing the contents of the list through procfs. Reviewed by: Don Brady <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: John Gallagher <[email protected]> External-issue: LX-1211 Closes #7819
* Pool allocation classesDon Brady2018-09-051-0/+1
| | | | | | | | | | | | | | | | | | | | Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Håkan Johansson <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Gregor Kopka <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5182
* Add basic zfs ioc input nvpair validationDon Brady2018-09-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want newer versions of libzfs_core to run against an existing zfs kernel module (i.e. a deferred reboot or module reload after an update). Programmatically document, via a zfs_ioc_key_t, the valid arguments for the ioc commands that rely on nvpair input arguments (i.e. non legacy commands from libzfs_core). Automatically verify the expected pairs before dispatching a command. This initial phase focuses on the non-legacy ioctls. A follow-on change can address the legacy ioctl input from the zfs_cmd_t. The zfs_ioc_key_t for zfs_keys_channel_program looks like: static const zfs_ioc_key_t zfs_keys_channel_program[] = { {"program", DATA_TYPE_STRING, 0}, {"arg", DATA_TYPE_UNKNOWN, 0}, {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL}, {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, }; Introduce four input errors to identify specific input failures (in addition to generic argument value errors like EINVAL, ERANGE, EBADF, and E2BIG). ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #7780
* Add zfs module feature and property info to sysfsDon Brady2018-09-021-0/+1
| | | | | | | | | | | | | | | | | | | | | This extends our sysfs '/sys/module/zfs' entry to include feature and property attributes. The primary consumer of this information is user processes, like the zfs CLI, that need to know what the current loaded ZFS module supports. The libzfs binary will consult this information when instantiating the zfs and zpool property tables and the pool features table. This introduces 4 kernel objects (dirs) into '/sys/module/zfs' with corresponding attributes (files): features.runtime features.pool properties.dataset properties.pool Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #7706
* Direct IO supportBrian Behlendorf2018-08-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Direct IO via the O_DIRECT flag was originally introduced in XFS by IRIX for database workloads. Its purpose was to allow the database to bypass the page and buffer caches to prevent unnecessary IO operations (e.g. readahead) while preventing contention for system memory between the database and kernel caches. On Illumos, there is a library function called directio(3C) that allows user space to provide a hint to the file system that Direct IO is useful, but the file system is free to ignore it. The semantics are also entirely a file system decision. Those that do not implement it return ENOTTY. Since the semantics were never defined in any standard, O_DIRECT is implemented such that it conforms to the behavior described in the Linux open(2) man page as follows. 1. Minimize cache effects of the I/O. By design the ARC is already scan-resistant which helps mitigate the need for special O_DIRECT handling. Data which is only accessed once will be the first to be evicted from the cache. This behavior is in consistent with Illumos and FreeBSD. Future performance work may wish to investigate the benefits of immediately evicting data from the cache which has been read or written with the O_DIRECT flag. Functionally this behavior is very similar to applying the 'primarycache=metadata' property per open file. 2. O_DIRECT _MAY_ impose restrictions on IO alignment and length. No additional alignment or length restrictions are imposed. 3. O_DIRECT _MAY_ perform unbuffered IO operations directly between user memory and block device. No unbuffered IO operations are currently supported. In order to support features such as transparent compression, encryption, and checksumming a copy must be made to transform the data. 4. O_DIRECT _MAY_ imply O_DSYNC (XFS). O_DIRECT does not imply O_DSYNC for ZFS. Callers must provide O_DSYNC to request synchronous semantics. 5. O_DIRECT _MAY_ disable file locking that serializes IO operations. Applications should avoid mixing O_DIRECT and normal IO or mmap(2) IO to the same file. This is particularly true for overlapping regions. All I/O in ZFS is locked for correctness and this locking is not disabled by O_DIRECT. However, concurrently mixing O_DIRECT, mmap(2), and normal I/O on the same file is not recommended. This change is implemented by layering the aops->direct_IO operations on the existing AIO operations. Code already existed in ZFS on Linux for bypassing the page cache when O_DIRECT is specified. References: * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html * https://blogs.oracle.com/roch/entry/zfs_and_directio * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics * https://illumos.org/man/3c/directio Reviewed-by: Richard Elling <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #224 Closes #7823
* Use posix format for dist tarballsBrian Behlendorf2018-08-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | Traditionally Automake has defaulted to the V7 tar format when creating tarballs for distributions. One of the many limitions of this format is a 99 character maximum path + file name limit. This can cause problems when adding new test cases to the ZTS due to the depth of the sub-tree and descriptive test names. This change switches the build system to the posix (aliased as pax) tar format which conforms to the POSIX.1-2001 specification. This format does not suffer from the V7 limitations, was designed to be compatible, and will become the default format in future versions of GNU tar. https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html As part of this change the blockfiles directories which were originally removed due to this limit have been readded. Reviewed by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7767
* OpenZFS 9166 - zfs storage pool checkpointSerapheim Dimitropoulos2018-06-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
* ZTS: Adopt OpenZFS test analysis scriptBrian Behlendorf2018-06-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adopt and extend the OpenZFS ZTS results analysis script for use with ZFS on Linux. This allows for automatic analysis of tests which may be skipped for a variety or reasons or which are not entirely reliable. In addition to the list of 'known' failures, which have been updated for ZFS on Linux, there in a new 'maybe' section. This mapping include tests which might be correctly skipped depending on the test environment. This may be because of a missing dependency or lack of required kernel support. This list also includes tests which normally pass but might on occasion fail for a harmless reason. The script was also extended include a reason for why a given test might be skipped or may fail. The reason will be included after the test in the "results other than PASS that are expected" section. For failures it is preferable to set the reason to the GitHub issue number and for skipped tests several generic reasons are available. You may also specify a custom reason if needed. All tests were added back in to the linux.run file even if they are expected to failed. There is value in running tests which may not pass, the expected results for these tests has been encoded in the new analysis script. All tests which were disabled because they ran more slowly on a 32-bit system have been re-enabled. Developers working on 32-bit systems should assess what it reasonable for their environment. The unnecessary dependency on physical block devices was removed for the checksum, grow_pool, and grow_replicas test groups so they are no longer skipped. Updated the filetest_001_pos test case to run properly now that it is enabled and moved the grow tests in to a single directory. Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7638
* Tunable directory for zfs runtime scriptsAntonio Russo2018-06-071-1/+0
| | | | | | | | | | | | | zpool and zed place scripts in subdirectories of libexecdir. Some distributions locate architecture independent scripts in other locations (e.g. Debian). To avoid these paths getting out of sync, centralize the definitions. Build zfs-test's default.cfg by Makefile. Use the new directory logic building tests/zfs-tests/include/default.cfg.in. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7597
* Add pool state /proc entry, "SUSPENDED" poolsTony Hutter2018-06-061-0/+1
| | | | | | | | | | | | | | | | | | | 1. Add a proc entry to display the pool's state: $ cat /proc/spl/kstat/zfs/tank/state ONLINE This is done without using the spa config locks, so it will never hang. 2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Richard Elling <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7331 Closes #7563
* Explicitly state supported Linux versionsAntonio Russo2018-05-301-2/+2
| | | | | | | | | | | | | | | | | | | Add META tags Linux-Maximum and Linux-Minimum. One pain point for package maintainers is ensuring the compatibility of the packaged version of ZFS with the Linux kernel. By providing an authoritative compatibility guide in the source tree, maintainers can automate compatibility checks. Additionally, increase META string extraction specificity. configure.ac finds Name and Version by a very simple `grep`, which might conceivably find other fields. Require the string be at the beginning of a line, and be followed by a colon to avoid such confusions. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7571
* Update build system and packagingBrian Behlendorf2018-05-291-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_*. * Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> TEST_ZIMPORT_SKIP="yes" Closes #7556
* Adopt pyzfs from ClusterHQloli10K2018-05-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces several changes: * Update LICENSE and project information * Give a good PEP8 talk to existing Python source code * Add RPM/DEB packaging for pyzfs * Fix some outstanding issues with the existing pyzfs code caused by changes in the ABI since the last time the code was updated * Integrate pyzfs Python unittest with the ZFS Test Suite * Add missing libzfs_core functions: lzc_change_key, lzc_channel_program, lzc_channel_program_nosync, lzc_load_key, lzc_receive_one, lzc_receive_resumable, lzc_receive_with_cmdprops, lzc_receive_with_header, lzc_reopen, lzc_send_resume, lzc_sync, lzc_unload_key, lzc_remap Note: this commit slightly changes zfs_ioc_unload_key() ABI. This allow to differentiate the case where we tried to unload a key on a non-existing dataset (ENOENT) from the situation where a dataset has no key loaded: this is consistent with the "change" case where trying to zfs_ioc_change_key() from a dataset with no key results in EACCES. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7230
* Fix 'zfs remap <poolname@snapname>'LOLi2018-04-191-0/+1
| | | | | | | | | | | | Only filesystems and volumes are valid 'zfs remap' parameters: when passed a snapshot name zfs_remap_indirects() does not handle the EINVAL returned from libzfs_core, which results in failing an assertion and consequently crashing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7454
* Fix ENOSPC in "Handle zap_add() failures in ..."Chunwei Chen2018-04-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cc63068 caused ENOSPC error when copy a large amount of files between two directories. The reason is that the patch limits zap leaf expansion to 2 retries, and return ENOSPC when failed. The intent for limiting retries is to prevent pointlessly growing table to max size when adding a block full of entries with same name in different case in mixed mode. However, it turns out we cannot use any limit on the retry. When we copy files from one directory in readdir order, we are copying in hash order, one leaf block at a time. Which means that if the leaf block in source directory has expanded 6 times, and you copy those entries in that block, by the time you need to expand the leaf in destination directory, you need to expand it 6 times in one go. So any limit on the retry will result in error where it shouldn't. Note that while we do use different salt for different directories, it seems that the salt/hash function doesn't provide enough randomization to the hash distance to prevent this from happening. Since cc63068 has already been reverted. This patch adds it back and removes the retry limit. Also, as it turn out, failing on zap_add() has a serious side effect for mzap_upgrade(). When upgrading from micro zap to fat zap, it will call zap_add() to transfer entries one at a time. If it hit any error halfway through, the remaining entries will be lost, causing those files to become orphan. This patch add a VERIFY to catch it. Reviewed-by: Sanjeev Bagewadi <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Albert Lee <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #7401 Closes #7421
* OpenZFS 7614, 9064 - zfs device evacuation/removalMatthew Ahrens2018-04-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <[email protected]> Reviewed-by: Alex Reece <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Reviewed by: Richard Laager <[email protected]> Reviewed by: Tim Chase <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
* Add 'zpool split' coverage to the ZFS Test SuiteLOLi2018-04-121-0/+1
| | | | | | | | | | | | | This change adds five new tests to the ZTS: * zpool_split_cliargs: verify command line options and arguments * zpool_split_devices: verify zpool split accepts a device list * zpool_split_encryption: verify zpool can split encrypted pools * zpool_split_props: verify zpool split can set property values * zpool_split_vdevs: verify vdev layout when splitting the pool Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7409
* systemd mount generator and tracking ZEDLETAntonio Russo2018-04-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | zfs-mount-generator implements the "systemd generator" protocol, producing systemd.mount units from the cached outputs of zfs list, during early boot, integrating with systemd. Each pool has an indpendent cache of the command zfs list -H -oname,mountpoint,canmount -tfilesystem -r $pool which is kept synchronized by the ZEDLET history_event-zfs-list-cacher.sh Datasets not in the cache will be loaded later in the boot process by zfs-mount.service, including pools without a cache. Among other things, this allows for complex mount hierarchies. Reviewed-by: Fabian Grünbichler <[email protected]> Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #7329
* Fix mmap / libaio deadlockBrian Behlendorf2018-03-281-0/+1
| | | | | | | | | | | | | | Calling uiomove() in mappedread() under the page lock can result in a deadlock if the user space page needs to be faulted in. Resolve the issue by dropping the page lock before the uiomove(). The inode range lock protects against concurrent updates via zfs_read() and zfs_write(). Reviewed-by: Albert Lee <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7335 Closes #7339
* Add JSON output support to channel programsAlek P2018-03-191-0/+1
| | | | | | | | | | | | | | | | | | | The changes piggyback JSON output support on top of channel programs (#6558). This way the JSON output support is targeted to scripting use cases and is easily maintainable since it really only touches one function (zfs_do_channel_program()). This patch ports Joyent's JSON nvlist library from illumos to enable easy JSON printing of channel program output nvlist. To keep the delta small I also took advantage of the fact that printing in zfs_do_channel_program() was almost always done before exiting the program. Reviewed by: Matt Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #7281
* Take user namespaces into account in policy checksWolfgang Bumiller2018-03-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Change file related checks to use user namespaces and make sure involved uids/gids are mappable in the current namespace. Note that checks without file ownership information will still not take user namespaces into account, as some of these should be handled via 'zfs allow' (otherwise root in a user namespace could issue commands such as `zpool export`). This also adds an initial user namespace regression test for the setgid bit loss, with a user_ns_exec helper usable in further tests. Additionally, configure checks for the required user namespace related features are added for: * ns_capable * kuid/kgid_has_mapping() * user_ns in cred_t Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Wolfgang Bumiller <[email protected]> Closes #6800 Closes #7270
* Project Quota on ZFSNasf-Fan2018-02-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Project quota is a new ZFS system space/object usage accounting and enforcement mechanism. Similar as user/group quota, project quota is another dimension of system quota. It bases on the new object attribute - project ID. Project ID is a numerical value to indicate to which project an object belongs. An object only can belong to one project though you (the object owner or privileged user) can change the object project ID via 'chattr -p' or 'zfs project [-s] -p' explicitly. The object also can inherit the project ID from its parent when created if the parent has the project inherit flag (that can be set via 'chattr +P' or 'zfs project -s [-p]'). By accounting the spaces/objects belong to the same project, we can know how many spaces/objects used by the project. And if we set the upper limit then we can control the spaces/objects that are consumed by such project. It is useful when multiple groups and users cooperate for the same project, or a user/group needs to participate in multiple projects. Support the following commands and functionalities: zfs set projectquota@project zfs set projectobjquota@project zfs get projectquota@project zfs get projectobjquota@project zfs get projectused@project zfs get projectobjused@project zfs projectspace zfs allow projectquota zfs allow projectobjquota zfs allow projectused zfs allow projectobjused zfs unallow projectquota zfs unallow projectobjquota zfs unallow projectused zfs unallow projectobjused chattr +/-P chattr -p project_id lsattr -p This patch also supports tree quota based on the project quota via "zfs project" commands set as following: zfs project [-d|-r] <file|directory ...> zfs project -C [-k] [-r] <file|directory ...> zfs project -c [-0] [-d|-r] [-p id] <file|directory ...> zfs project [-p id] [-r] [-s] <file|directory ...> For "df [-i] $DIR" command, if we set INHERIT (project ID) flag on the $DIR, then the proejct [obj]quota and [obj]used values for the $DIR's project ID will be shown as the total/free (avail) resource. Keep the same behavior as EXT4/XFS does. Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by Ned Bass <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Fan Yong <[email protected]> TEST_ZIMPORT_POOLS="zol-0.6.1 zol-0.6.2 master" Change-Id: Ib4f0544602e03fb61fd46a849d7ba51a6005693c Closes #6290
* OpenZFS 7431 - ZFS Channel ProgramsChris Williamson2018-02-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Chris Williamson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported-by: Don Brady <[email protected]> Ported-by: John Kennedy <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/7431 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dfc11533 Porting Notes: * The CLI long option arguments for '-t' and '-m' don't parse on linux * Switched from kmem_alloc to vmem_alloc in zcp_lua_alloc * Lua implementation is built as its own module (zlua.ko) * Lua headers consumed directly by zfs code moved to 'include/sys/lua/' * There is no native setjmp/longjump available in stock Linux kernel. Brought over implementations from illumos and FreeBSD * The get_temporary_prop() was adapted due to VFS platform differences * Use of inline functions in lua parser to reduce stack usage per C call * Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow
* Add dbuf hash and dbuf cache kstatsGiuseppe Di Natale2018-01-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Introduce kstats about the dbuf hash and dbuf cache to make it easier to inspect state. This should help with debugging and understanding of these portions of the codebase. Correct format of dbuf kstat file. Introduce a dbc column to dbufs kstat to indicate if a dbuf is in the dbuf cache. Introduce field filtering in the dbufstat python script. Introduce a no header option to the dbufstat python script. Introduce a test case to test basic mru->mfu list movement in the ARC. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6906
* Extend deadman logicBrian Behlendorf2018-01-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent of this patch is extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include: * Added a new `zfs_deadman_failmode` module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independant from, the pool failmode property. It can be set to wait, continue, or panic. * wait - Wait for the "hung" I/O (default) * continue - Attempt to recover from a "hung" I/O * panic - Panic the system * Added a new `zfs_deadman_ziotime_ms` module option which is analogous to `zfs_deadman_synctime_ms` except instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio. * The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected. * The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit fa603f82. Update zloop.sh to unconditionally pass -G to obtain additional debugging. * The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detect a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event. * The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an IO has been submitted to the physical device but for some reasonable the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs. * The 'zfs_deadman_synctime_ms' default value was reduced from 1000s to 600s. * Depending on how ztest fails there may be no cache file to move. This should not be considered fatal, collect the logs which are available and carry on. * Add deadman test cases for spa_deadman() and zio_wait(). * Increase default zfs_deadman_checktime_ms to 60s. Reviewed-by: Tim Chase <[email protected]> Reviewed by: Thomas Caputi <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6999
* contrib/initramfs: switch to automakeLOLi2017-11-071-0/+3
| | | | | | | Use automake to build initramfs scripts and hooks. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6761
* Allow 'zpool events' filtering by pool nameLOLi2017-10-261-0/+1
| | | | | | | | | | | | | | Additionally add four new tests: * zpool_events_clear: verify 'zpool events -c' functionality * zpool_events_cliargs: verify command line options and arguments * zpool_events_follow: verify 'zpool events -f' * zpool_events_poolname: verify events filtering by pool name Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #3285 Closes #6762
* Added no_scrub_restart flag to zpool reopenArkadiusz Bubała2017-10-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Added -n flag to zpool reopen that allows a running scrub operation to continue if there is a device with Dirty Time Log. By default if a component device has a DTL and zpool reopen is executed all running scan operations will be restarted. Added functional tests for `zpool reopen` Tests covers following scenarios: * `zpool reopen` without arguments, * `zpool reopen` with pool name as argument, * `zpool reopen` while scrubbing, * `zpool reopen -n` while scrubbing, * `zpool reopen -n` while resilvering, * `zpool reopen` with bad arguments. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Arkadiusz Bubała <[email protected]> Closes #6076 Closes #6746
* Encryption patch follow-upTom Caputi2017-10-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * PBKDF2 implementation changed to OpenSSL implementation. * HKDF implementation moved to its own file and tests added to ensure correctness. * Removed libzfs's now unnecessary dependency on libzpool and libicp. * Ztest can now create and test encrypted datasets. This is currently disabled until issue #6526 is resolved, but otherwise functions as advertised. * Several small bug fixes discovered after enabling ztest to run on encrypted datasets. * Fixed coverity defects added by the encryption patch. * Updated man pages for encrypted send / receive behavior. * Fixed a bug where encrypted datasets could receive DRR_WRITE_EMBEDDED records. * Minor code cleanups / consolidation. Signed-off-by: Tom Caputi <[email protected]>
* Add 'zfs diff' coverage to the ZFS Test SuiteLOLi2017-09-281-0/+1
| | | | | | | | | | | | | This change adds four new tests to the ZTS: * zfs_diff_changes: verify type of changes diplayed (-, +, R and M) * zfs_diff_cliargs: verify command line options and arguments * zfs_diff_timestamp: verify 'zfs diff -t' * zfs_diff_types: verify type of objects (files, dirs, pipes...) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: John Wren Kennedy <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #6686
* Fix some ZFS Test Suite issuesLOLi2017-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add 'zfs bookmark' coverage (zfs_bookmark_cliargs) * Add OpenZFS 8166 coverage (zpool_scrub_offline_device) * Fix "busy" zfs_mount_remount failures * Fix bootfs_003_pos, bootfs_004_neg, zdb_005_pos local cleanup * Update usage of $KEEP variable, add get_all_pools() function * Enable history_008_pos and rsend_019_pos (non-32bit builders) * Enable zfs_copies_005_neg, update local cleanup * Fix zfs_send_007_pos (large_dnode + OpenZFS 8199) * Fix rollback_003_pos (use dataset name, not mountpoint, to unmount) * Update default_raidz_setup() to work properly with more than 3 disks * Use $TEST_BASE_DIR instead of hardcoded (/var)/tmp for file VDEVs * Update usage of /dev/random to /dev/urandom Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Issue #6086 Closes #5658 Closes #6143 Closes #6421 Closes #6627 Closes #6632
* Add support for "--enable-code-coverage" optionPrakash Surya2017-09-221-1/+1
| | | | | | | | | | | | | | | | | | This change adds support for a new option that can be passed to the configure script: "--enable-code-coverage". Further, the "--enable-gcov" option has been removed, as this new option provides the same functionality (plus more). When using this new option the following make targets are available: * check-code-coverage * code-coverage-capture * code-coverage-clean Note: these make targets can only be run from the root of the project. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes #6670
* Add configure option to enable gcov analysisBrian Behlendorf2017-09-151-0/+1
| | | | | | | | | * Add configure option to enable gcov analysis. * Includes a few minor ctime fixes. * Add codecov.yml configuration. Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6642
* Implement --enable-debuginfo to force debuginfoRichard Yao2017-08-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inspection of a Ubuntu 14.04 x64 system revealed that the config file used to build the kernel image differs from the config file used to build kernel modules by the presence of CONFIG_DEBUG_INFO=y: This in itself is insufficient to show that the kernel is built with debuginfo, but a cursory analysis of the debuginfo provided and the size of the kernel strongly suggests that it was built with CONFIG_DEBUG_INFO=y while the modules were not. Installing linux-image-$(uname -r)-dbgsym had no obvious effect on the debuginfo provided by either the modules or the kernel. The consequence is that issue reports from distributions such as Ubuntu and its derivatives build kernel modules without debuginfo contain nonsensical backtraces. It is therefore desireable to force generation of debuginfo, so we implement --enable-debuginfo. Since the build system can build both userspace components and kernel modules, the generic --enable-debuginfo option will force debuginfo for both. However, it also supports --enable-debuginfo=kernel and --enable-debuginfo=user for finer grained control. Enabling debuginfo for the kernel modules works by injecting CONFIG_DEBUG_INFO=y into the make environment. This is enables generation of debuginfo by the kernel build systems on all Linux kernels, but the build environment is slightly different int hat CONFIG_DEBUG_INFO has not been in the CPP. Adding -DCONFIG_DEBUG_INFO would fix that, but it would also cause build failures on kernels where CONFIG_DEBUG_INFO=y is already set. That would complicate its use in DKMS environments that support a range of kernels and is therefore undesireable. We could write a compatibility shim to enable CONFIG_DEBUG_INFO only when it is explicitly disabled, but we forgo doing that because it is unnecessary. Nothing in ZoL or the kernel uses CONFIG_DEBUG_INFO in the CPP at this time and that is unlikely to change. Enabling debuginfo for the userspace components is done by injecting -g into CPPFLAGS. This is not necessary because the build system honors the environment's CPPFLAGS by appending them to the actual CPPFLAGS used, but it is supported for consistency. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #2734
* Retire legacy test infrastructureBrian Behlendorf2017-08-151-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Removed zpios kmod, utility, headers and man page. * Removed unused scripts zpios-profile/*, zpios-test/*, zpool-config/*, smb.sh, zpios-sanity.sh, zpios-survey.sh, zpios.sh, and zpool-create.sh. * Removed zfs-script-config.sh.in. When building 'make' generates a common.sh with in-tree path information from the common.sh.in template. This file and sourced by the test scripts and used for in-tree testing, it is not included in the packages. When building packages 'make install' uses the same template to create a new common.sh which is appropriate for the packaging. * Removed unused functions/variables from scripts/common.sh.in. Only minimal path information and configuration environment variables remain. * Removed unused scripts from scripts/ directory. * Remaining shell scripts in the scripts directory updated to cleanly pass shellcheck and added to checked scripts. * Renamed tests/test-runner/cmd/ to tests/test-runner/bin/ to match install location name. * Removed last traces of the --enable-debug-dmu-tx configure options which was retired some time ago. Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6509
* Native Encryption for ZFS on LinuxTom Caputi2017-08-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #494 Closes #5769
* Add libtpool (thread pools)Brian Behlendorf2017-08-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OpenZFS provides a library called tpool which implements thread pools for user space applications. Porting this library means the zpool utility no longer needs to borrow the kernel mutex and taskq interfaces from libzpool. This code was updated to use the tpool library which behaves in a very similar fashion. Porting libtpool was relatively straight forward and minimal modifications were needed. The core changes were: * Fully convert the library to use pthreads. * Updated signal handling. * lmalloc/lfree converted to calloc/free * Implemented portable pthread_attr_clone() function. Finally, update the build system such that libzpool.so is no longer linked in to zfs(8), zpool(8), etc. All that is required is libzfs to which the zcommon soures were added (which is the way it always should have been). Removing the libzpool dependency resulted in several build issues which needed to be resolved. * Moved zfeature support to module/zcommon/zfeature_common.c * Moved ratelimiting to to module/zfs/zfs_ratelimit.c * Moved get_system_hostid() to lib/libspl/gethostid.c * Removed use of cmn_err() in zcommon source * Removed dprintf_setup() call from zpool_main.c and zfs_main.c * Removed highbit() and lowbit() * Removed unnecessary library dependencies from Makefiles * Removed fletcher-4 kstat in user space * Added sha2 support explicitly to libzfs * Added highbit64() and lowbit64() to zpool_util.c Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6442
* Add zgenhostid utility scriptOlaf Faaland2017-07-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Turning the multihost property on requires that a hostid be set to allow ZFS to determine when a foreign system is attemping to import a pool. The error message instructing the user to set a hostid refers to genhostid(1). Genhostid(1) is not available on SUSE Linux. This commit adds a script modeled after genhostid(1) for those users. Zgenhostid checks for an /etc/hostid file; if it does not exist, it creates one and stores a value. If the user has provided a hostid as an argument, that value is used. Otherwise, a random hostid is generated and stored. This differs from the CENTOS 6/7 versions of genhostid, which overwrite the /etc/hostid file even though their manpages state otherwise. A man page for zgenhostid is added. The one for genhostid is in (1), but I put zgenhostid in (8) because I believe it's more appropriate. The mmp tests are modified to use zgenhostid to set the hostid instead of using the spl_hostid module parameter. zgenhostid will not replace an existing /etc/hostid file, so new mmp_clear_hostid calls are required. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #6358 Closes #6379
* Multi-modifier protection (MMP)Olaf Faaland2017-07-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add multihost=on|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Ned Bass <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #745 Closes #6279
* zpool iostat/status -c improvementsGiuseppe Di Natale2017-06-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users can now provide their own scripts to be run with 'zpool iostat/status -c'. User scripts should be placed in ~/.zpool.d to be included in zpool's default search path. Provide a script which can be used with 'zpool iostat|status -c' that will return the type of device (hdd, sdd, file). Provide a script to get various values from smartctl when using 'zpool iostat/status -c'. Allow users to define the ZPOOL_SCRIPTS_PATH environment variable which can be used to override the default 'zpool iostat/status -c' search path. Allow the ZPOOL_SCRIPTS_ENABLED environment variable to enable or disable 'zpool status/iostat -c' functionality. Use the new smart script to provide the serial command. Install /etc/sudoers.d/zfs file which contains the sudoer rule for smartctl as a sample. Allow 'zpool iostat/status -c' tests to run in tree. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #6121 Closes #6153
* Add zpool events testsBrian Behlendorf2017-05-221-0/+2
| | | | | | | | | | | | | | | | | | * events_001_pos - Verify the expected events are generated when invoking the various zpool sub-commands. These events must appear in `zpool event` and be consumed by the ZED. * events_002_pos - Verify the ZED consumes events which were generated while it wasn't running when it is started. Additionally, verify that events are only processed once. As part of this change the default.cfg used by the test suite was changed to a default.cfg.in file. This was needed so the install location of all zed scripts, not only the enabled ones, could be reliably determined. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6128
* Implemented zpool sync commandAlek P2017-05-191-0/+1
| | | | | | | | | | | This addition will enable us to sync an open TXG to the main pool on demand. The functionality is similar to 'sync(2)' but 'zpool sync' will return when data has hit the main storage instead of potentially just the ZIL as is the case with the 'sync(2)' cmd. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Alek Pinchuk <[email protected]> Closes #6122
* OpenZFS 7503 - zfs-test should tail ::zfs_dbgmsg on test failureBrian Behlendorf2017-04-121-0/+1
| | | | | | | | | | | | | | | | | | | Authored by: Pavel Zakharov <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Approved by: Gordon Ross <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Porting Notes: - Enable internal log for DEBUG builds and in zfs-tests.sh. - callbacks/zfs_dbgmsg.ksh - Dump interal log via kstat. - callbacks/zfs_dmesg.ksh - Dump dmesg log. - default.cfg - 'Test Suite Specific Commands' dropped. OpenZFS-issue: https://www.illumos.org/issues/7503 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/55a1300 Closes #6002
* OpenZFS 6865 - want zfs-tests cases for zpool labelclear commandYuri Pankov2017-04-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Yuri Pankov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: loli10K <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Porting Notes: - Updated 'zpool labelclear' and 'zdb -l' such that they attempt to find a vdev given solely its short name. This behavior is consistent with the upstream OpenZFS code and the test cases depend on it. The actual implementation differs slightly due to device naming conventions on Linux. - auto_online_001_pos, auto_replace_001_pos and add-o_ashift test cases updated to expect failure when no label exists. - read_efi_label() and zpool_label_disk_check() are read-only operations and should use O_RDONLY at open time to enforce this. - zpool_label_disk() and zpool_relabel_disk() write the partition information using O_DIRECT an fsync() and page cache invalidation to ensure a consistent view of the device. - dump_label() in zdb should invalidate the page cache in order to get the authoritative label from disk. OpenZFS-issue: https://www.illumos.org/issues/6865 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95076c Closes #5981
* OpenZFS 7290 - ZFS test suite needs to control what utilities it can runJohn Wren Kennedy2017-04-061-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: John Wren Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Gordon Ross <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Ported-by: George Melikov <[email protected]> Porting Notes: - Utilities which aren't available under Linux have been removed. - Because of sudo's default secure path behavior PATH must be explicitly reset at the top of libtest.shlib. This avoids the need for all users to customize secure path on their system. - Updated ZoL infrastructure to manage constrained path - Updated all test cases - Check permissions for usergroup tests - When testing in-tree create links under bin/ - Update fault cleanup such that missing files during cleanup aren't fatal. - Configure su environment with constrained path OpenZFS-issue: https://www.illumos.org/issues/7290 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1d32ba6 Closes #5903
* OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_spaceBrian Behlendorf2017-03-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed by: Steve Gonczi <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Background information: This assertion about tx_space_* verifies that we are not dirtying more stuff than we thought we would. We “need” to know how much we will dirty so that we can check if we should fail this transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() — typically less than a millisecond), we call dbuf_dirty() on the exact blocks that will be modified. Once this happens, the temporary accounting in tx_space_* is unnecessary, because we know exactly what blocks are newly dirtied; we call dnode_willuse_space() to track this more exact accounting. The fundamental problem causing this bug is that dmu_tx_hold_*() relies on the current state in the DMU (e.g. dn_nlevels) to predict how much will be dirtied by this transaction, but this state can change before we actually perform the transaction (i.e. call dbuf_dirty()). This bug will be fixed by removing the assertion that the tx_space_* accounting is perfectly accurate (i.e. we never dirty more than was predicted by dmu_tx_hold_*()). By removing the requirement that this accounting be perfectly accurate, we can also vastly simplify it, e.g. removing most of the logic in dmu_tx_count_*(). The new tx space accounting will be very approximate, and may be more or less than what is actually dirtied. It will still be used to determine if this transaction will put us over quota. Transactions that are marked by dmu_tx_mark_netfree() will be excepted from this check. We won’t make an attempt to determine how much space will be freed by the transaction — this was rarely accurate enough to determine if a transaction should be permitted when we are over quota, which is why dmu_tx_mark_netfree() was introduced in 2014. We also won’t attempt to give “credit” when overwriting existing blocks, if those blocks may be freed. This allows us to remove the do_free_accounting logic in dbuf_dirty(), and associated routines. This logic attempted to predict what will be on disk when this txg syncs, to know if the overwritten block will be freed (i.e. exists, and has no snapshots). OpenZFS-issue: https://www.illumos.org/issues/7793 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3704e0a Upstream bugs: DLPX-32883a Closes #5804 Porting notes: - DNODE_SIZE replaced with DNODE_MIN_SIZE in dmu_tx_count_dnode(), Using the default dnode size would be slightly better. - DEBUG_DMU_TX wrappers and configure option removed. - Resolved _by_dnode() conflicts these changes have not yet been applied to OpenZFS.
* Add auto-online test for ZED/FMA as part of the ZTSSydney Vanda2017-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | Automated auto-online test to go along with ZED FMA integration (PR 4673) auto_online_001.pos works with real devices (sd- and mpath) and with non-real block devices (loop) by adding a scsi_debug device to the pool Note: In order for test group to run, ZED must not currently be running. Kernel 3.16.37 or higher needed for scsi_debug to work properly If timeout occurs on test using a scsi_debug device (error noticed on Ubuntu system), a reboot might be needed in order for test to pass. (more investigation into this) Also suppressed output from is_real_device/is_loop_device/is_mpath_device - was making the log file very cluttered with useless error messages "ie /dev/mapper/sdc is not a block device" from previous patch Reviewed-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: David Quigley <[email protected]> Signed-off-by: Sydney Vanda <[email protected]> Closes #5774
* OpenZFS 5704 - libzfs can only handle 255 file descriptorsSimon Klinkert2017-02-101-0/+1
| | | | | | | | | | | | | | | Authored by: Simon Klinkert <[email protected]> Reviewed by: Josef 'Jeff' Sipek <[email protected]> Reviewed by: John Kennedy <[email protected]> Approved by: Richard Lowe <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Ned Bass <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Ported-by: George Melikov <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/5704 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bde3d61 Closes #5767
* Add test for chattrChunwei Chen2016-12-161-0/+1
| | | | Signed-off-by: Chunwei Chen <[email protected]>