Commit message | Author | Age | Files | Lines
* Document guidelines for usage of zfs_dbgmsg | Serapheim Dimitropoulos | 2019-01-18 | 1 | -1/+17
    Reviewed-by: Richard Elling <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: Matt Ahrens <[email protected]>
    Reviewed-by: Igor Kozhukhov <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Signed-off-by: Serapheim Dimitropoulos <[email protected]>
    Closes #8299
* dkms: Enable debuginfo option to be set with zfs sysconfig file | Neal Gompa (ニール・ゴンパ) | 2019-01-18 | 1 | -0/+4
    On some Linux distributions, the kernel module build does not default
    to building with debuginfo symbols, which can make debugging and
    testing difficult. For this case, we provide a flag to override the
    build and force debuginfo to be produced for the kernel module build.

    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Co-authored-by: Neal Gompa <[email protected]>
    Co-authored-by: Simon Watson <[email protected]>
    Signed-off-by: Neal Gompa <[email protected]>
    Signed-off-by: Simon Watson <[email protected]>
    Closes #8304
* Off-by-one in zap_leaf_array_create() | loli10K | 2019-01-18 | 3 | -16/+43
    Trying to set user properties with their length 1 byte shorter than
    the maximum size triggers an assertion failure in
    zap_leaf_array_create():

      panic[cpu0]/thread=ffffff000a092c40: assertion failed:
      num_integers * integer_size < (8<<10) (0x2000 < 0x2000), file:
      ../../common/fs/zfs/zap_leaf.c, line: 233

      ffffff000a092500 genunix:process_type+167c35 ()
      ffffff000a0925a0 zfs:zap_leaf_array_create+1d2 ()
      ffffff000a092650 zfs:zap_entry_create+1be ()
      ffffff000a092720 zfs:fzap_update+ed ()
      ffffff000a0927d0 zfs:zap_update+1a5 ()
      ffffff000a0928d0 zfs:dsl_prop_set_sync_impl+5c6 ()
      ffffff000a092970 zfs:dsl_props_set_sync_impl+fc ()
      ffffff000a0929b0 zfs:dsl_props_set_sync+79 ()
      ffffff000a0929f0 zfs:dsl_sync_task_sync+10a ()
      ffffff000a092a80 zfs:dsl_pool_sync+3a3 ()
      ffffff000a092b50 zfs:spa_sync+4e6 ()
      ffffff000a092c20 zfs:txg_sync_thread+297 ()
      ffffff000a092c30 unix:thread_start+8 ()

    This patch simply corrects the assertion.

    Reviewed-by: Matt Ahrens <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: loli10K <[email protected]>
    Closes #8278
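The boundary condition behind this panic can be sketched in a few lines of Python. This is only an illustration: the constant name MAX_VALUE_LEN and both helper functions are hypothetical, and the relaxed comparison is an assumption about what "corrects the assertion" means, based on the 0x2000 < 0x2000 failure in the panic message.

```python
# Hypothetical name for the (8<<10) limit printed in the panic message.
MAX_VALUE_LEN = 8 << 10  # 0x2000 bytes


def fits_strict(num_integers, integer_size):
    """Comparison as printed in the panic: a value of exactly 0x2000 bytes fails."""
    return num_integers * integer_size < MAX_VALUE_LEN


def fits_inclusive(num_integers, integer_size):
    """Assumed correction: a value of exactly the maximum size is accepted."""
    return num_integers * integer_size <= MAX_VALUE_LEN
```

With these definitions, fits_strict(0x2000, 1) is False (the case that panicked), while fits_inclusive(0x2000, 1) is True and anything larger is still rejected.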
* Simplify spa_sync by breaking it up to smaller functions | Serapheim Dimitropoulos | 2019-01-18 | 2 | -169/+200
    The point of this refactoring is to break the high-level conceptual
    steps of spa_sync() out into their own helper functions. In general,
    large functions can enhance readability if structured well, but in
    this case the number of conceptual steps taken could use the help of
    helper functions.

    Reviewed-by: Matt Ahrens <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Serapheim Dimitropoulos <[email protected]>
    Closes #8293
* ztest: scrub verification | Brian Behlendorf | 2019-01-18 | 1 | -6/+63
    By design ztest will never inject non-repairable damage into the
    pool. Update the ztest_scrub() test case such that it waits for the
    scrub to complete and verifies the pool is always repairable.

    After enabling scrub verification, two scenarios were encountered
    which are the result of how ztest manages failure injection.

    The first case is straightforward and pertains to detaching a mirror
    vdev. In this case, the pool must always be scrubbed prior to the
    detach. Failure to do so can potentially lock in previously
    repairable data corruption by removing all good copies of a block,
    leaving only damaged ones.

    The second is a little more subtle. The child/offset selection logic
    in ztest_fault_inject() depends on the calculated number of leaves
    always remaining constant between injection passes. This is true
    within a single execution of ztest, but when using zloop.sh random
    values are selected for each restart. Therefore, when ztest imports
    an existing pool it must be scrubbed before failure injection can be
    safely enabled. Otherwise it is possible that it will inject
    non-repairable damage.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed by: Tom Caputi <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8269
* Fix error handling in callers of dbuf_hold_level() | Tom Caputi | 2019-01-17 | 6 | -14/+37
    Currently, the functions dbuf_prefetch_indirect_done() and
    dmu_assign_arcbuf_by_dnode() assume that dbuf_hold_level() cannot
    fail. In the event of an error the former will cause a NULL pointer
    dereference and the latter will trigger a VERIFY. This patch adds
    error handling to these functions and their callers where necessary.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tom Caputi <[email protected]>
    Closes #8291
* Remove unused vdev_t fields | Serapheim Dimitropoulos | 2019-01-17 | 2 | -13/+0
    The following fields from the vdev_t struct are not used anywhere.

    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Signed-off-by: Serapheim Dimitropoulos <[email protected]>
    Closes #8285
* ztest: scrub ddt repair | Brian Behlendorf | 2019-01-17 | 2 | -17/+41
    The ztest_ddt_repair() test is designed to inflict damage on the ddt
    which can be repaired by a scrub. Unfortunately, this repair logic
    was broken at some point and it went undetected. This issue is not
    specific to ztest, but thankfully this extra redundancy is rarely
    enabled and even more rarely needed.

    The root cause was identified to be the ddt_bp_create() function
    called by dsl_scan_ddt_entry(), which did not set the dedup bit of
    the generated block pointer. The consequence of this was that the
    ZIO_DDT_READ_PIPELINE was never enabled for the block pointer during
    the scrub, and the dedup ditto repair logic was never run. Note that
    for demand reads which don't rely on ddt_bp_create() the required
    pipeline stages would be enabled and the repair performed.

    This was resolved by unconditionally setting the dedup bit in
    ddt_bp_create(). This way all code paths which may need to perform a
    repair from a block pointer generated from the ddt entry will be
    able to. The only exception is that the dedup bit is cleared in
    ddt_phys_free(), which is required to avoid leaking space.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed by: Tom Caputi <[email protected]>
    Reviewed by: Serapheim Dimitropoulos <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8270
* Update vdev_is_spacemap_addressable() for new spacemap encoding | Serapheim Dimitropoulos | 2019-01-16 | 2 | -18/+40
    Since the new spacemap encoding was ported to ZoL, that's no longer
    a limitation. This patch updates vdev_is_spacemap_addressable(),
    which was performing that check. It also updates the appropriate
    test to ensure that the same functionality is tested. The test does
    so by creating pools that don't have the new spacemap encoding
    enabled - just the checkpoint feature. This patch also reorganizes
    that same test in order to cut its memory consumption in half.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Serapheim Dimitropoulos <[email protected]>
    Closes #8286
* ztest: split block reconstruction | Brian Behlendorf | 2019-01-16 | 5 | -8/+17
    Increase the default allowed number of reconstruction attempts.
    There's not an exact right number for this setting. It needs to be
    set large enough to cover any realistic failure scenarios and small
    enough to avoid stalling the IO pipeline and invoking the dead man
    detection.

    The current value of 256 was empirically determined to be too low
    based on multi-day runs of ztest. The fault injection code would
    inject more damage than could be reconstructed given the relatively
    small number of attempts. However, in all observed cases the block
    could be reconstructed using a slightly higher limit.

    Based on local testing, increasing the default value to 4096 was
    determined to strike the best balance. Checking all combinations
    takes less than 10s in the worst case, and has so far eliminated the
    vast majority of false positives detected by ztest. This delay is
    roughly on par with how long retries may be performed to a
    misbehaving HDD and was deemed to be reasonable. Better to err on
    the side of a brief delay rather than fail to reconstruct the data.

    Lastly, the -Y flag has been added to zdb to make it easy to try all
    possible combinations when performing split block reconstruction.
    For badly damaged blocks with 18 splits, they can be fully
    enumerated within a few minutes. This has been done to ensure
    permanent errors are never incorrectly reported when ztest verifies
    the pool with zdb.

    Reviewed by: Tom Caputi <[email protected]>
    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed by: Serapheim Dimitropoulos <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8271
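The trade-off described above, bounding the number of reconstruction attempts, can be sketched as follows. This is a minimal illustration and not the ZFS implementation: the function name reconstruct, the splits and is_valid parameters, and the simple in-order enumeration are all assumptions. How ZFS actually orders or samples combinations is not described in this log.

```python
from itertools import islice, product


def reconstruct(splits, is_valid, max_attempts=4096):
    """Try combinations of candidate copies, one per split, up to a limit.

    splits       -- list of lists; each inner list holds the candidate
                    byte strings for one split segment (hypothetical layout)
    is_valid     -- callable that checks a reassembled block, e.g. by checksum
    max_attempts -- cap on combinations tried (4096 mirrors the new default)

    Returns (block, attempts) on success or (None, attempts) on failure.
    """
    attempts = 0
    for combo in islice(product(*splits), max_attempts):
        attempts += 1
        candidate = b"".join(combo)
        if is_valid(candidate):
            return candidate, attempts
    return None, attempts
```

With three splits of two candidates each there are 8 combinations; a limit of 4 can miss a block that a limit of 8 reconstructs, which is the kind of false positive the higher default is meant to avoid.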
* Make zdb results for checkpoint tests consistent | Serapheim Dimitropoulos | 2019-01-16 | 1 | -2/+13
    This patch exports and re-imports the pool when these tests are
    analyzed with zdb to get consistent results.

    Reviewed by: Igor Kozhukhov <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: George Melikov <[email protected]>
    Signed-off-by: Serapheim Dimitropoulos <[email protected]>
    Closes #8292
* Disable 'zfs remap' command | Brian Behlendorf | 2019-01-15 | 10 | -39/+57
    The implementation of 'zfs remap' has proven to be problematic since
    it modifies the objset (but not its logical contents) by dirtying
    metadata without owning it. As a consequence,
    dmu_objset_remap_indirects() is vulnerable to certain races.

    For example, if we are in the middle of receiving into the
    filesystem while it is being remapped, it is possible that we evict
    the objset when the receive completes (see
    dsl_dataset_clone_swap_sync_impl, or dmu_recv_end_sync) while
    dmu_objset_remap_indirects() is still using the objset. The result
    would be a panic.

    Extended runs of ztest(8) have exposed other possible races which
    can occur when using 'zfs remap'. Several of these have been fixed
    but there may be others which have not yet been encountered and
    diagnosed.

    Furthermore, the ability to manually remap a filesystem is no longer
    particularly useful now that the removal code can map large chunks.
    In addition, explaining what this command does and why it may be
    useful requires a detailed understanding of the internals of device
    removal. These are details users should not be bothered with.

    Therefore, the 'zfs remap' command is being disabled but not
    entirely removed. It may be removed in the future or potentially
    reworked to address the issues described above.

    Since 'zfs remap' has never been part of a tagged release, its
    removal is expected to have minimal impact. The ZTS tests have been
    updated to continue to exercise the command to prevent atrophy, but
    it has been removed entirely from ztest(8).

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed by: Tom Caputi <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8238
* Fix zio leak in dbuf_read() | Tom Caputi | 2019-01-15 | 1 | -2/+11
    Currently, dbuf_read() may decide to create a zio_root which is used
    as a parent for any child zios created in dbuf_read_impl(). However,
    if there is an error in dbuf_read_impl(), this zio is never executed
    and ends up leaked. This patch simply ensures that we always execute
    the root zio, even if it has no real work to do.

    Reviewed-by: Matt Ahrens <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tom Caputi <[email protected]>
    Closes #8267
* Verify .gitignore entries | loli10K | 2019-01-15 | 2 | -1/+8
    This change adds a make target 'vcscheck' which scans the git
    workspace for new, untracked files missing from the .gitignore
    configuration; this is done to help prevent adding unwanted build
    artifacts to the source tree during development.

    Reviewed-by: Neal Gompa <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: loli10K <[email protected]>
    Closes #8281
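One way such a check could work is sketched below. This is an assumption for illustration only: the log does not say how the 'vcscheck' target is implemented, and the helper name untracked_files is hypothetical. The sketch relies on the fact that `git status --porcelain` marks untracked paths with a leading "?? " and omits paths already matched by .gitignore.

```python
def untracked_files(porcelain_output):
    """Return untracked paths from `git status --porcelain` output.

    Untracked entries start with '?? '. Ignored files are not listed by
    default, so anything returned here is missing from .gitignore.
    """
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("?? ")
    ]
```

A check built on this would fail the build when the returned list is non-empty, after running the command in the workspace (for example via subprocess.check_output(["git", "status", "--porcelain"])).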
* Tag 0.8.0-rc3 (tag: zfs-0.8.0-rc3) | Brian Behlendorf | 2019-01-14 | 1 | -2/+2
    Signed-off-by: Brian Behlendorf <[email protected]>
* Minor spelling corrections | Brian Behlendorf | 2019-01-13 | 2 | -8/+8
    Some minor spelling mistakes and typos. No functional changes.

    Reviewed-by: Neal Gompa <[email protected]>
    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed-by: Giuseppe Di Natale <[email protected]>
    Reviewed-by: bunder2015 <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8272
* Serialize ZTHR operations to eliminate races | Serapheim Dimitropoulos | 2019-01-13 | 7 | -130/+192
    Adds a new lock for serializing operations on zthrs. The commit also
    includes some code cleanup and refactoring.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed by: Tom Caputi <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Serapheim Dimitropoulos <[email protected]>
    Closes #8229
* zfs filesystem skipped by df -h | Paul Zuchowski | 2019-01-13 | 4 | -2/+79
    On a full pool, when the pool's root filesystem references very few
    bytes, the f_blocks value returned to statvfs is 0 but should be at
    least 1.

    Reviewed by: Tom Caputi <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Paul Zuchowski <[email protected]>
    Closes #8253
    Closes #8254
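The rounding problem can be illustrated with a small sketch. The function name, its parameters, and the shift-based block computation are assumptions made for this example; only the "at least 1" floor comes from the commit message.

```python
def reported_blocks(referenced_bytes, available_bytes, block_shift=9):
    """Compute a block count for statvfs-style reporting, never below 1.

    block_shift=9 (512-byte blocks) is a hypothetical choice. Without the
    floor, a nearly empty root filesystem on a full pool shifts down to 0
    blocks, which per the commit title leads df -h to skip it.
    """
    blocks = (referenced_bytes + available_bytes) >> block_shift
    return max(blocks, 1)
```

For example, 100 referenced bytes with 0 bytes available would shift down to 0 blocks, and the floor raises the result to 1.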
* Add contrib/pyzfs/setup.py to .gitignore | Tom Caputi | 2019-01-13 | 1 | -0/+1
    As of 9ef798b77, setup.py is generated from setup.py.in, but the
    generated file was never added to the .gitignore. This patch simply
    corrects this issue.

    Reviewed-by: Neal Gompa <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tom Caputi <[email protected]>
    Closes #8268
* Add pyzfs BuildRequires for mock(1) | Brian Behlendorf | 2019-01-13 | 1 | -9/+14
    When building pyzfs under mock, the python-cffi and
    python-setuptools packages need to be installed, so they have been
    added to the BuildRequires.

    Reviewed-by: Neal Gompa <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8265
* ZTS: zpool_resilver_restart | Brian Behlendorf | 2019-01-13 | 2 | -3/+3
    Since the vdev initialize feature was integrated, the ZTS
    zpool_resilver_restart test has been hitting its internal timeout
    more frequently. This happens most often on the coverage builder,
    but not exclusively. Increasing the timeout for this test case
    prevents any false positives.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8273
* Provide more flexible object allocation interface | Brian Behlendorf | 2019-01-10 | 6 | -56/+149
    Object allocation performance can be improved for complex operations
    by providing an interface which returns the newly allocated dnode.
    This allows the caller to immediately use the dnode without
    incurring the expense of looking up the dnode by object number.

    The functions dmu_object_alloc_hold(), zap_create_hold(), and
    dmu_bonus_hold_by_dnode() were added for this purpose.

    The zap_create_* functions have been updated to take advantage of
    this new functionality. The dmu_bonus_hold_impl() function should
    never have been included in sys/dmu.h and was removed. Its sole
    caller was converted to use dmu_bonus_hold_by_dnode().

    The new symbols have been exported for use by Lustre.

    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed by: Matt Ahrens <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8015
* Don't allow dnode allocation if dn_holds != 0 | Tom Caputi | 2019-01-10 | 1 | -0/+1
    This patch fixes a small bug where dnode_hold_impl() could attempt
    to allocate a dnode that was in the process of being freed, but
    which still had active references. It adds the required check.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tom Caputi <[email protected]>
    Closes #8249
* Removed suggestion to use root dataset as bootfs | Gregor Kopka | 2019-01-08 | 1 | -6/+0
    The dracut howto proposed booting from the root dataset of a pool.
    Apart from this causing problems when booting (as the code seems to
    expect a child dataset and creates an illegal dataset name when
    using the root dataset), the technical limitations of the root
    dataset (among others the inability to rename or destroy it through
    the `zfs` command) resulted in the general consensus to only use it
    as a container for the datasets in the pool - not as a filesystem
    itself.

    Removed the idea to boot from the root dataset.

    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: bunder2015 <[email protected]>
    Signed-off-by: Gregor Kopka <[email protected]>
    Closes #8247
* Use ZFS version for pyzfs & drop unused reqs file | Neal Gompa (ニール・ゴンパ) | 2019-01-08 | 4 | -3/+3
    Now that 'pyzfs' is part of the ZFS codebase, it should be versioned
    the same as the rest of the source tree. This eliminates confusion
    about which version of the bindings is being used, especially for
    dependent Python projects that may use the Python dist metadata to
    identify compatible versions of pyzfs to work from.

    In addition, a trivial change to drop the unused requirements.txt
    file is included, simply because it's unused and a leftover from
    before it was imported into the ZFS codebase and wired into the
    autotools build scripts.

    Reviewed-by: loli10K <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Neal Gompa <[email protected]>
    Closes #8243
* zfs receive and rollback can skew filesystem_count | loli10K | 2019-01-08 | 11 | -8/+464
    This commit fixes a small issue which causes both zfs receive and
    rollback operations to incorrectly increase the "filesystem_count"
    property value.

    This change also adds a new test group "limits" to the ZFS Test
    Suite to exercise both filesystem_count/limit and
    snapshot_count/limit functionality.

    Reviewed by: Jerry Jelinek <[email protected]>
    Reviewed by: Brian Behlendorf <[email protected]>
    Signed-off-by: loli10K <[email protected]>
    Closes #8232
* OpenZFS 8473 - scrub does not detect errors on active spares | asomers | 2019-01-08 | 1 | -8/+41
    Scrubbing is supposed to detect and repair all errors in the pool.
    However, it wrongly ignores active spare devices. The problem can
    easily be reproduced in OpenZFS at git rev 0ef125d with these
    commands:

      truncate -s 64m /tmp/a /tmp/b /tmp/c
      sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c
      sudo zpool replace testpool /tmp/a /tmp/c
      /bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c
      sync
      sudo zpool scrub testpool
      zpool status testpool  # Will show 0 errors, which is wrong
      sudo zpool offline testpool /tmp/a
      sudo zpool scrub testpool
      zpool status testpool  # Will show errors on /tmp/c,
                             # which should've already been fixed

    FreeBSD head is partially affected: the first scrub will detect some
    errors, but the second scrub will detect more. This same test was
    run on Linux before applying the fix and the FreeBSD head behavior
    was observed.

    Authored by: asomers <[email protected]>
    Reviewed by: Andy Stormont <[email protected]>
    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Approved by: Richard Lowe <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>
    Sponsored by: Spectra Logic Corp
    OpenZFS-issue: https://www.illumos.org/issues/8473
    FreeBSD-commit: https://github.com/freebsd/freebsd/commit/e20ec8879
    OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/554675ee
    Closes #8251
* Include third party licenses in dist tarballs | Neal Gompa (ニール・ゴンパ) | 2019-01-08 | 2 | -0/+10
    Since the merge of the Linux Solaris Porting Layer source tree into
    the ZFS codebase, ZFS is now a double-licensed codebase, with the
    former SPL codebase retaining its license (GPLv2+) within the ZFS
    source tree.

    However, the license files for SPL were not being included in the
    tarballs generated by autotools. This change corrects that.

    In addition, all the other third party licenses in the codebase are
    now properly declared to be included in the dist tarballs.

    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Neal Gompa <[email protected]>
    Closes #8242
* Fix missing dkms modules after upgrades | Tony Hutter | 2019-01-08 | 1 | -0/+18
    If you were upgrading from, say, fc28 to fc29 on ZFS version X, the
    RPM macros would get called like this:

      %post X.fc29 - This is the step where fc29 gets built by dkms. As
      part of the build, dkms automatically removes the previous modules
      before building the new ones. It then builds the new modules.

      %preun X.fc28 - Right before this step, X.fc29 is built and
      installed, but since it has the same X, its files get
      inadvertently removed by fc28's uninstall.

      %postun X.fc28

    This patch updates %preun X.fc28 to see if we're upgrading or
    uninstalling. If we're uninstalling, then remove our files. If
    we're upgrading then do nothing, since we know dkms will have
    already removed our files in %post X.fc29.

    Note that since this fixes the %preun step, its effect isn't going
    to be noticed immediately. It will only be seen when packages with
    this fix are upgraded to a newer version.

    Reviewed-by: Ralf Ertzinger <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tony Hutter <[email protected]>
    Closes #6902
    Closes #8216
* Bump commit subject length to 72 characters | Neal Gompa (ニール・ゴンパ) | 2019-01-08 | 2 | -4/+4
    There's not really a reason to keep the subject length so short,
    since the limit was only there to produce nice renders of a summary
    list of the git log. With 72 characters, this still works out fine,
    so let's just raise it to that so that it's easier to give slightly
    more descriptive change summaries.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed-by: Tony Hutter <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Neal Gompa <[email protected]>
    Closes #8250
* zfs.8 uses wrong snapshot names in Example 15 | Benjamin Gentil | 2019-01-07 | 1 | -4/+4
    Reviewed-by: George Melikov <[email protected]>
    Reviewed-by: Brian Behlendorf <[email protected]>
    Reviewed-by: bunder2015 <[email protected]>
    Signed-off-by: Benjamin Gentil <[email protected]>
    Closes #8241
* Add 'zpool status -i' option | Brian Behlendorf | 2019-01-07 | 3 | -43/+64
    Only display the full details of the vdev initialization state in
    'zpool status' output when requested with the -i option. By default
    display '(initializing)' after vdevs when they are being actively
    initialized. This is consistent with the established precedent of
    appending '(resilvering), etc' and fits within the default 80 column
    terminal width, making it easy to read.

    Additionally, updated the 'zpool initialize' documentation to make
    it clear the options are mutually exclusive, but allow duplicate
    options like all other zfs/zpool commands.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed-by: loli10K <[email protected]>
    Reviewed-by: Tim Chase <[email protected]>
    Reviewed-by: George Wilson <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #8230
* zfs initialize performance enhancements | George Wilson | 2019-01-07 | 10 | -79/+175
    PROBLEM
    ========

    When invoking "zpool initialize" on a pool the command will create a
    thread to initialize each disk. Unfortunately, it does this serially
    across many transaction groups, which can result in commands taking
    a long time to return to the user and appearing hung. The same is
    true when trying to suspend/cancel the operation.

    SOLUTION
    =========

    This change refactors the way we invoke the initialize interface to
    ensure we can start or stop the initialization in just a few
    transaction groups.

    When stopping or cancelling a vdev initialization, perform it in two
    phases. First signal each vdev initialization thread that it should
    exit, then after all threads have been signaled wait for them to
    exit.

    On a pool with 40 leaf vdevs this reduces the vdev initialize
    stop/cancel time from ~10 minutes to under a second. The reason for
    this is that spa_vdev_initialize() no longer needs to wait on
    multiple full TXGs per leaf vdev being stopped.

    This commit additionally adds some missing checks for the passed
    "initialize_vdevs" input nvlist. The contents of the user provided
    input "initialize_vdevs" nvlist must be validated to ensure all
    values are uint64s. This is done in zfs_ioc_pool_initialize() in
    order to keep all of these checks in a single location.

    Updated the innvl and outnvl comments to match the formatting used
    for all other new style ioctls.

    Reviewed by: Matt Ahrens <[email protected]>
    Reviewed-by: loli10K <[email protected]>
    Reviewed-by: Tim Chase <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: George Wilson <[email protected]>
    Closes #8230
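The two-phase stop described in this commit can be modeled with ordinary threads. The sketch below is a user-space analogy under stated assumptions, not the kernel code: the Worker class, its exit_delay parameter, and stop_all are hypothetical names, and a sleep stands in for whatever work a thread finishes before exiting.

```python
import threading
import time


class Worker:
    """Stand-in for one per-vdev initialization thread (hypothetical)."""

    def __init__(self, exit_delay):
        self.stop_requested = threading.Event()
        self.exit_delay = exit_delay
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def _run(self):
        # Block until asked to stop, then simulate some cleanup time.
        self.stop_requested.wait()
        time.sleep(self.exit_delay)


def stop_all(workers):
    # Phase 1: signal every worker first, so all cleanups overlap.
    for worker in workers:
        worker.stop_requested.set()
    # Phase 2: only then wait for each worker to exit.
    for worker in workers:
        worker.thread.join()
```

Signaling and joining one worker at a time would cost roughly the sum of all exit delays; signaling all of them first costs roughly the longest single delay, which is the effect the commit reports for 40 leaf vdevs.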
* OpenZFS 9102 - zfs should be able to initialize storage devices | George Wilson | 2019-01-07 | 54 | -30/+2743
    PROBLEM
    ========

    The first access to a block incurs a performance penalty on some
    platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend
    that volumes are "thick provisioned", where supported by the
    platform (VMware). This can create a large delay in getting new
    virtual machines up and running (or adding storage to an existing
    Engine). If the thick provision step is omitted, write performance
    will be suboptimal until all blocks on the LUN have been written.

    SOLUTION
    =========

    This feature introduces a way to 'initialize' the disks at install
    or in the background to make sure we don't incur this first read
    penalty.

    When an entire LUN is added to ZFS, we make all space available
    immediately, and allow ZFS to find unallocated space and zero it
    out. This works with concurrent writes to arbitrary offsets,
    ensuring that we don't zero out something that has been (or is in
    the middle of being) written. This scheme can also be applied to
    existing pools (affecting only free regions on the vdev).

    Detailed design:
    - new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
      - start, suspend, or cancel initialization
    - Creates new open-context thread for each vdev
    - Thread iterates through all metaslabs in this vdev
    - Each metaslab:
      - select a metaslab
      - load the metaslab
      - mark the metaslab as being zeroed
      - walk all free ranges within that metaslab and translate them to
        ranges on the leaf vdev
      - issue a "zeroing" I/O on the leaf vdev that corresponds to a
        free range on the metaslab we're working on
      - continue until all free ranges for this metaslab have been
        "zeroed"
      - reset/unmark the metaslab being zeroed
      - if more metaslabs exist, then repeat above tasks.
      - if no more metaslabs, then we're done.
    - progress for the initialization is stored on-disk in the vdev’s
      leaf zap object. The following information is stored:
      - the last offset that has been initialized
      - the state of the initialization process (i.e. active, suspended,
        or canceled)
      - the start time for the initialization
    - progress is reported via the zpool status command and shows
      information for each of the vdevs that are initializing

    Porting notes:
    - Added zfs_initialize_value module parameter to set the pattern
      written by "zpool initialize".
    - Added zfs_vdev_{initializing,removal}_{min,max}_active module
      options.

    Authored by: George Wilson <[email protected]>
    Reviewed by: John Wren Kennedy <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Pavel Zakharov <[email protected]>
    Reviewed by: Prakash Surya <[email protected]>
    Reviewed by: loli10K <[email protected]>
    Reviewed by: Brian Behlendorf <[email protected]>
    Approved by: Richard Lowe <[email protected]>
    Signed-off-by: Tim Chase <[email protected]>
    Ported-by: Tim Chase <[email protected]>
    OpenZFS-issue: https://www.illumos.org/issues/9102
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
    Closes #8230
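The per-metaslab loop and the persisted "last offset" from the detailed design can be sketched as a resumable walk over free ranges. This is a simplified model under stated assumptions: initialize_vdev, the (start, size) range tuples, and the write_pattern callback are hypothetical, and the sketch leaves out metaslab loading and marking, translation to leaf vdev ranges, and protection against concurrent writes.

```python
def initialize_vdev(metaslabs, write_pattern, last_offset=0):
    """Walk free ranges in offset order and write the pattern to each.

    metaslabs     -- iterable of metaslabs, each a list of (start, size)
                     free ranges sorted by start (hypothetical layout)
    write_pattern -- callable(start, size) that issues the "zeroing" write
    last_offset   -- offset already initialized before a suspend; ranges
                     below it are skipped so work is not repeated

    Returns the new last initialized offset, which a real implementation
    would persist so the operation can resume later.
    """
    for free_ranges in metaslabs:
        for start, size in free_ranges:
            end = start + size
            if end <= last_offset:
                continue  # fully initialized before the suspend
            begin = max(start, last_offset)
            write_pattern(begin, end - begin)
            last_offset = end
    return last_offset
```

Resuming with last_offset=25 over the ranges (0, 10), (20, 10), (100, 50) skips the first range, writes only the unfinished tail (25, 5) of the second, and then writes the third in full.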
* Python 2 and 3 compatibility | Brian Behlendorf | 2019-01-06 | 45 | -1395/+1594
|\
| |   With Python 2 (slowly) approaching EOL and its removal from
| |   distributions already being planned (Fedora), the existing Python 2
| |   code needs to be transitioned to Python 3. This patch stack updates
| |   the Python code to be compatible with Python 2.7, 3.4, 3.5, 3.6,
| |   and 3.7.
| |
| |   Reviewed-by: John Ramsden <[email protected]>
| |   Reviewed-by: Neal Gompa <[email protected]>
| |   Reviewed-by: loli10K <[email protected]>
| |   Reviewed-by: Brian Behlendorf <[email protected]>
| |   Reviewed-by: John Wren Kennedy <[email protected]>
| |   Reviewed-by: Antonio Russo <[email protected]>
| |   Closes #8096
| * test-runner: python3 support | John Wren Kennedy | 2019-01-06 | 1 | -95/+100
| |   Updated to be compatible with Python 2.6, 2.7, 3.5 or newer.
| |
| |   Reviewed-by: John Ramsden <[email protected]>
| |   Reviewed-by: Neal Gompa <[email protected]>
| |   Reviewed-by: loli10K <[email protected]>
| |   Reviewed-by: Brian Behlendorf <[email protected]>
| |   Signed-off-by: John Wren Kennedy <[email protected]>
| |   Closes #8096
| * arc_summary: consolidate test case | Brian Behlendorf | 2019-01-06 | 7 | -72/+36
| |   Since we're only installing one version of arc_summary we only
| |   need one test case. Update the test to determine which version is
| |   available and then test its supported flags.
| |
| |   Remove files for misc tests which should have been cleaned up.
| |
| |   Reviewed-by: John Ramsden <[email protected]>
| |   Reviewed-by: Neal Gompa <[email protected]>
| |   Reviewed-by: loli10K <[email protected]>
| |   Signed-off-by: Brian Behlendorf <[email protected]>
| |   Closes #8096
| * pyzfs: python3 support (build system) | Brian Behlendorf | 2019-01-06 | 29 | -162/+339
| |   Almost all of the Python code in the repository has been updated
| |   to be compatible with Python 2.6, Python 3.4, or newer. The only
| |   exceptions are arc_summary3.py, which requires Python 3, and pyzfs,
| |   which requires at least Python 2.7. This allows us to maintain a
| |   single version of the code and support most default versions of
| |   python. This change does the following:
| |
| |   * Sets the default shebang for all Python scripts to python3. If
| |     only Python 2 is available, then at install time scripts which
| |     are compatible with Python 2 will have their shebangs replaced
| |     with /usr/bin/python. This is done for compatibility until
| |     Python 2 goes end of life. Since only the installed versions are
| |     changed, this means Python 3 must be installed on the system for
| |     test-runner when testing in-tree.
| |
| |   * Added --with-python=<2|3|3.4,etc> configure option which sets
| |     the PYTHON environment variable to target a specific python
| |     version. By default the newest installed version of Python will
| |     be used, or the preferred distribution version when creating
| |     packages.
| |
| |   * Fixed --enable-pyzfs configure checks so they are run when
| |     --enable-pyzfs=check and --enable-pyzfs=yes.
| |
| |   * Enabled pyzfs for Python 3.4 and newer, which is now supported.
| |
| |   * Renamed pyzfs package to python<VERSION>-pyzfs and updated to
| |     install in the appropriate site location. For example, when
| |     building with --with-python=3.4 a python34-pyzfs will be created
| |     which installs in /usr/lib/python3.4/site-packages/.
| |
| |   * Renamed the following python scripts according to the Fedora
| |     guidance for packaging utilities in /bin
| |
| |     - dbufstat.py -> dbufstat
| |     - arcstat.py -> arcstat
| |     - arc_summary.py -> arc_summary
| |     - arc_summary3.py -> arc_summary3
| |
| |   * Updated python-cffi package name. On CentOS 6, CentOS 7, and
| |     Amazon Linux it's called python-cffi, not python2-cffi. For
| |     Python3 it's called python3-cffi or python3x-cffi.
| |
| |   * Install one version of arc_summary. Depending on the version of
| |     Python available, install either arc_summary2 or arc_summary3 as
| |     arc_summary. The user output is only slightly different.
| |
| |   Reviewed-by: John Ramsden <[email protected]>
| |   Reviewed-by: Neal Gompa <[email protected]>
| |   Reviewed-by: loli10K <[email protected]>
| |   Signed-off-by: Brian Behlendorf <[email protected]>
| |   Closes #8096
| * pyzfs: python3 support (unit tests)  Brian Behlendorf  2019-01-06  2  -944/+978
| | * Updated unit tests to be compatible with Python 2 or 3. In most cases
| |   all that was required was to add the 'b' prefix to existing strings
| |   to convert them to type bytes for Python 3 compatibility.
| |
| | * There were several places where the Python version needed to be
| |   checked to remain compatible with Python 2 and 3. Someone more
| |   seasoned with Python may be able to find a way to rewrite these
| |   statements in a compatible fashion.
| |
| | Reviewed-by: John Ramsden <[email protected]>
| | Reviewed-by: Neal Gompa <[email protected]>
| | Reviewed-by: loli10K <[email protected]>
| | Signed-off-by: John Wren Kennedy <[email protected]>
| | Signed-off-by: Brian Behlendorf <[email protected]>
| | Closes #8096
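The 'b' prefix change and the version checks mentioned in this commit can be pictured with a minimal sketch. It is not taken from the pyzfs unit tests; `DATASET_NAME` and `to_bytes` are hypothetical names used only for illustration.

```python
import sys

# Hypothetical dataset name. The b prefix yields type bytes on Python 3
# and a plain str (which is already bytes) on Python 2.
DATASET_NAME = b"tank/fs"


def to_bytes(name):
    """Return name as bytes, encoding text as UTF-8 (hypothetical helper)."""
    if isinstance(name, bytes):
        return name
    return name.encode("utf-8")


# The kind of explicit version check the commit message refers to.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821, only defined on Python 2
```

A helper such as `to_bytes` is one possible way to avoid repeating version checks at each call site; whether that suits pyzfs is an open question.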
| * pyzfs: python3 support (library 2/2)  Brian Behlendorf  2019-01-06  4  -23/+23
| | * All pool, dataset, and nvlist keys must be of type bytes.
| |
| | Reviewed-by: John Ramsden <[email protected]>
| | Reviewed-by: Neal Gompa <[email protected]>
| | Reviewed-by: loli10K <[email protected]>
| | Signed-off-by: John Kennedy <[email protected]>
| | Signed-off-by: Brian Behlendorf <[email protected]>
| | Closes #8096
| * pyzfs: python3 support (library 1/2)  Antonio Russo  2019-01-06  13  -122/+141
|/
| These changes are efficient and valid in Python 2 and 3. For the most
| part, they are also pythonic.
|
| * 2to3 conversion
| * add __future__ imports
| * iterator changes
| * integer division
| * relative import fixes
|
| Reviewed-by: John Ramsden <[email protected]>
| Reviewed-by: Neal Gompa <[email protected]>
| Reviewed-by: loli10K <[email protected]>
| Reviewed-by: Brian Behlendorf <[email protected]>
| Signed-off-by: Antonio Russo <[email protected]>
| Closes #8096
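Two of the items listed above, integer division and iterator changes, can be illustrated with a short sketch. It is not code from pyzfs; `split_evenly` and `first_key` are hypothetical helpers, and the sketch assumes a module that, on Python 2, would begin with `from __future__ import division` so that `/` behaves the same on both versions.

```python
def split_evenly(total, parts):
    """Return the whole-number share of total per part (hypothetical helper)."""
    # Once / means true division on both Python versions, // has to be
    # used wherever an integer result is expected.
    return total // parts


def first_key(mapping):
    """Return the first key of mapping without building a list of keys."""
    # dict.keys() returns a view on Python 3, so indexing it fails;
    # iterating over the mapping works on both versions.
    return next(iter(mapping))
```

Both forms behave identically on Python 2 and 3, which is the property the commit message describes.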
* Add zfs module feature and property compatibility  Brian Behlendorf  2019-01-03  1  -6/+30
| This change is required to ease the transition when upgrading from 0.7.x
| to 0.8.x. It allows 0.8.x user space utilities to remain compatible with
| 0.7.x and older kernel modules.
|
| Reviewed-by: Don Brady <[email protected]>
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #8231
* Add missing MMP status code to libzfs_status  bunder2015  2019-01-03  3  -21/+39
| When MMP was merged the status codes in libzfs_status were not updated to
| add the status code for ZPOOL_STATUS_IO_FAILURE_MMP. This commit corrects
| this and adds comments to help keep track of which code is used for which
| status.
|
| Reviewed-by: Olaf Faaland <[email protected]>
| Reviewed-by: Brian Behlendorf <[email protected]>
| Signed-off-by: bunder2015 <[email protected]>
| Closes #8148
| Closes #8222
* Fix 'zpool remap' freeing race  Brian Behlendorf  2019-01-02  2  -10/+32
| The dmu_objset_remap_indirects_impl() logic depends on dnode_hold()
| returning ENOENT for dnodes which will be freed and should be skipped.
| This behavior can only be relied upon when taking a new hold and while
| the caller has an open transaction. This ensures that the open txg cannot
| advance and that a concurrent free will end up in the same txg (which is
| critical). Relying on an existing hold will not prevent dnode_free() from
| succeeding.
|
| The solution is to take an additional dnode_hold() after assigning the
| transaction. This ensures the remap will never dirty the dnode if it was
| freed while we were waiting in dmu_tx_assign(, TXG_WAIT).
|
| Randomly set zfs_object_remap_one_indirect_delay_ms in ztest. This
| increases the likelihood of an operation racing with the remap. Converted
| from ticks to milliseconds.
|
| Reviewed by: Matt Ahrens <[email protected]>
| Reviewed by: Tom Caputi <[email protected]>
| Reviewed by: Igor Kozhukhov <[email protected]>
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #8215
* Minor tweaks to zfs.8 man page for POSIX ACLs  Richard Laager  2018-12-26  1  -6/+6
| * Capitalize POSIX in POSIX ACLs.
|
|   This change makes the POSIX in POSIX ACLs all caps, which is both
|   correct and consistent with the rest of the man page.
|
| * Slightly reword part of zfs.8.
|
|   I tweaked a sentence to add a missing comma, and as long as I was
|   editing, removed a couple unnecessary words.
|
| Reviewed-by: George Melikov <[email protected]>
| Reviewed-by: Brian Behlendorf <[email protected]>
| Reviewed-by: bunder2015 <[email protected]>
| Signed-off-by: Richard Laager <[email protected]>
| Closes #8220
* OpenZFS 9284 - arc_reclaim_thread has 2 jobs  Brad Lewis  2018-12-26  7  -172/+285
| Following the fix for 9018 (Replace kmem_cache_reap_now() with
| kmem_cache_reap_soon), the arc_reclaim_thread() no longer blocks while
| reaping. However, the code is still confusing and error-prone, because
| this thread has two responsibilities. We should instead separate this
| into two threads each with their own responsibility:
|
| 1. keep `arc_size` under `arc_c`, by calling `arc_adjust()`, which
|    improves `arc_is_overflowing()`
|
| 2. keep enough free memory in the system, by calling
|    `arc_kmem_reap_now()` plus `arc_shrink()`, which improves
|    `arc_available_memory()`.
|
| Furthermore, we can use the zthr infrastructure to separate the "should
| we do something" from "do it" parts of the logic, and normalize the start
| up / shut down of the threads.
|
| Authored by: Brad Lewis <[email protected]>
| Reviewed by: Matt Ahrens <[email protected]>
| Reviewed by: Serapheim Dimitropoulos <[email protected]>
| Reviewed by: Pavel Zakharov <[email protected]>
| Reviewed by: Dan Kimmel <[email protected]>
| Reviewed by: Paul Dagnelie <[email protected]>
| Reviewed by: Dan McDonald <[email protected]>
| Reviewed by: Tim Kordas <[email protected]>
| Reviewed by: Tim Chase <[email protected]>
| Reviewed by: Brian Behlendorf <[email protected]>
| Ported-by: Brad Lewis <[email protected]>
| Signed-off-by: Brad Lewis <[email protected]>
|
| OpenZFS-issue: https://www.illumos.org/issues/9284
| OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de753e34f9
| Closes #8165
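The split between "should we do something" and "do it" that this commit attributes to zthr can be sketched roughly as follows. This is an illustrative Python analogue, not the zthr API and not the ARC code; `CheckedWorker` and all of its methods are hypothetical.

```python
import threading


class CheckedWorker(object):
    """Hypothetical analogue of a zthr-style thread.

    check() answers "should we do something"; work() is the "do it" part.
    """

    def __init__(self, check, work):
        self._check = check
        self._work = work
        self._cv = threading.Condition()
        self._stop = False
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True

    def start(self):
        self._thread.start()

    def wakeup(self):
        # Ask the thread to re-evaluate check() now.
        with self._cv:
            self._cv.notify()

    def stop(self):
        # Uniform shutdown path: set the flag, wake the thread, join it.
        with self._cv:
            self._stop = True
            self._cv.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cv:
                # Sleep until stopped or until check() reports work.
                while not self._stop and not self._check():
                    self._cv.wait(0.05)
                if self._stop:
                    return
            self._work()
```

With this shape, each responsibility gets its own worker: one whose check compares a size against a target and whose work trims it, and another whose check looks at free memory and whose work reclaims it. Start-up and shutdown are the same for both.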
* Fix zfs_dirty_data_sync_percent documentation  Tom Caputi  2018-12-18  1  -2/+3
| In dfbe2675 zfs_dirty_data_sync was changed to a new tunable named
| zfs_dirty_data_sync_percent. Unfortunately, the module parameter
| documentation in the code was not updated accordingly. This patch simply
| corrects that.
|
| Reviewed-by: George Melikov <[email protected]>
| Reviewed-by: Brian Behlendorf <[email protected]>
| Signed-off-by: Tom Caputi <[email protected]>
| Closes #8212
* Add enclosure_symlinks option to vdev_id  Tony Hutter  2018-12-14  4  -3/+112
| Add an 'enclosure_symlinks' option to vdev_id.conf. This creates
| consistently named symlinks to the enclosure devices (/dev/sg*) based off
| the configuration in vdev_id.conf. The enclosure symlinks show up in
| /dev/by-enclosure/<prefix>-<channel><num>. The links make it easy to run
| sg_ses on a particular enclosure device. The enclosure links are created
| in addition to the normal /dev/disk/by-vdev links.
|
| 'enclosure_symlinks' is only valid in sas_direct configurations.
|
| Reviewed-by: Brian Behlendorf <[email protected]>
| Reviewed-by: Simon Guest <[email protected]>
| Signed-off-by: Tony Hutter <[email protected]>
| Closes #8194
* ZTS: fix wait_scrubbed()  Tom Caputi  2018-12-14  2  -10/+5
| Currently, wait_scrubbed() is the only function of its kind that accepts
| a timeout, which is 10s by default. This timeout is pretty short for a
| scrub and causes test failures if we run too long. This patch removes the
| timeout, instead leaning on the global test suite timeout to ensure the
| tests keep moving.
|
| Reviewed-by: Richard Elling <[email protected]>
| Reviewed-by: Brian Behlendorf <[email protected]>
| Signed-off-by: Tom Caputi <[email protected]>
| Closes #8210
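The behaviour after this change, polling with no timeout of its own and relying on an outer limit, can be sketched as below. The real wait_scrubbed() is part of the ZFS Test Suite and is not reproduced here; `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(done, poll_interval=1.0):
    """Block until done() returns True (hypothetical helper).

    There is deliberately no timeout here: the caller is assumed to run
    under an outer, suite-wide time limit that stops a stuck test.
    """
    while not done():
        time.sleep(poll_interval)
```

The trade-off is that a hang is no longer reported by the helper itself, only by whatever enforces the outer limit.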
* Fix zap_update() ASSERT from ztest  Tom Caputi  2018-12-14  1  -12/+0
| This patch simply removes an invalid assert from the zap_update()
| function. The ASSERT is invalid because it does not hold the zap lock
| from the time it fetches the old value to the time it confirms that it is
| what it should be.
|
| Reviewed by: Matt Ahrens <[email protected]>
| Reviewed-by: Brian Behlendorf <[email protected]>
| Signed-off-by: Tom Caputi <[email protected]>
| Closes #8209