Commit message | Author | Age | Files | Lines
* Fix zil_commit() NULL dereference | Brian Behlendorf | 2014-07-17 | 5 | -6/+24
| Update the current code to ensure inodes are never dirtied if they are part of a read-only file system or snapshot. If they do somehow get dirtied an attempt will be made to write them to disk. In the case of snapshots, which don't have a ZIL, this will result in a NULL dereference in zil_commit().
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Closes #2405
* zdb: Introduce -V for verbatim import | Richard Yao | 2014-07-17 | 2 | -7/+19
| When given a pool name via -e, zdb would attempt an import. If it failed, then it would attempt a verbatim import. This behavior is not always desirable so a -V switch is added to zdb to control the behavior. When specified, a verbatim import is done. Otherwise, the behavior is as it was previously, except no verbatim import is done on failure.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Closes #2372
* Illumos 4168, 4169, 4170: ztest, zdb and zhack fixes | George Wilson | 2014-07-17 | 3 | -29/+30
| 4168 ztest assertion failure in dbuf_undirty
| 4169 verbatim import causes zdb to segfault
| 4170 zhack leaves pool in ACTIVE state
|
| Reviewed by: Adam Leventhal
| Reviewed by: Eric Schrock
| Reviewed by: Matthew Ahrens
| Approved by: Dan McDonald
|
| References:
|   https://www.illumos.org/issues/4168
|   https://www.illumos.org/issues/4169
|   https://www.illumos.org/issues/4170
|   https://github.com/illumos/illumos-gate/commit/7fdd916
|
| Porting notes:
|
| Of particular interest when troubleshooting corrupted pools, the commonly-used "zdb -e" operation may perform verbatim imports and furthermore, it will soon have direct support for verbatim imports via a new "-V" option.
|
| The 4169 fix eliminates a common segfault case in which spa_history_log_version() tries to access an un-opened dsl_pool_t.
|
| Ported by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Closes #2451
| Closes #2283
| Closes #2467
* Convert zfs_mg_noalloc_threshold to a module parameter and document | Tim Chase | 2014-07-16 | 2 | -0/+30
| The parameter was added as illumos issue 4081 which was committed to zfsonlinux in ac72fac3eaa569902cad88053167f7d74e7fe7e4.
|
| This patch documents the parameter and allows for it to be set as a module parameter.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Closes #2483
* Illumos #3641 compressed block histograms with zdb | Matthew Ahrens | 2014-07-16 | 1 | -24/+65
| This patch is a zdb extension of the '-b' option, producing a histogram of the physical compressed block sizes per DMU object type on disk. The '-bbbb' option to zdb will uncover this new feature; here's an example usage on a new pool and snippet of the output it generates:
|
|     # zpool create tank /dev/vd{b,c,d}
|     # dd bs=1k if=/dev/urandom of=/tank/1kfile count=1
|     # dd bs=3k if=/dev/urandom of=/tank/3kfile count=1
|     # dd bs=64k if=/dev/urandom of=/tank/64kfile count=1
|     # zdb -bbbb tank
|     ...
|          3  68.0K  68.0K  68.0K  22.7K   1.00  34.26  ZFS plain file
|     psize (in 512-byte sectors): number of blocks
|               2:      1 *
|               3:      0
|               4:      0
|               5:      0
|               6:      1 *
|               7:      0
|             ...
|             127:      0
|             128:      1 *
|     ...
|
| The blocks are also broken down by their indirection level. Expanding on the above example:
|
|     # zfs set recordsize=1k tank
|     # dd bs=1k if=/dev/urandom of=/tank/2x1kfile count=2
|     # zdb -bbbb tank
|     ...
|          1    16K     1K     2K     2K  16.00   1.02  L1 ZFS plain file
|     psize (in 512-byte sectors): number of blocks
|               2:      1 *
|          5  70.0K  70.0K  70.0K  14.0K   1.00  35.71  L0 ZFS plain file
|     psize (in 512-byte sectors): number of blocks
|               2:      3 ***
|               3:      0
|               4:      0
|               5:      0
|               6:      1 *
|               7:      0
|             ...
|             127:      0
|             128:      1 *
|          6  86.0K  71.0K  72.0K  12.0K   1.21  36.73  ZFS plain file
|     psize (in 512-byte sectors): number of blocks
|               2:      4 ****
|               3:      0
|               4:      0
|               5:      0
|               6:      1 *
|               7:      0
|             ...
|             127:      0
|             128:      1 *
|     ...
|
| There's now a single 1K L1 block which is the indirect block needed for the '2x1kfile' file just created, as well as two more 1K L0 blocks from the same file.
|
| This can be used to get a distribution of the block sizes used within the pool, on a per object type basis.
|
| References:
|   https://illumos.org/issues/3641
|   https://github.com/illumos/illumos-gate/commit/490d05b
|
| Ported by: Tim Chase
| Signed-off-by: Prakash Surya
| Signed-off-by: Brian Behlendorf
| Signed-off-by: Boris Protopopov
| Closes #2456
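The bucketing behind the psize histogram can be illustrated with a minimal sketch. It is not zdb's code: the helper name `psize_histogram`, the `MAX_SECTORS` cap, and the choice to ignore larger blocks are assumptions made for illustration. It only shows how byte sizes map to 512-byte sector buckets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* histogram buckets are 512-byte sectors */
#define MAX_SECTORS 256u  /* assumed cap: a 128K block spans 256 sectors */

/*
 * Hypothetical helper, not zdb's code: count how many blocks fall into
 * each physical-size bucket. hist must have MAX_SECTORS + 1 entries.
 * Blocks larger than the cap are ignored in this sketch.
 */
static void
psize_histogram(const uint64_t *psize_bytes, size_t nblocks, uint64_t *hist)
{
	assert(hist != NULL);

	for (size_t i = 0; i <= MAX_SECTORS; i++)
		hist[i] = 0;

	for (size_t i = 0; i < nblocks; i++) {
		/* Round the byte size up to whole sectors. */
		uint64_t sectors =
		    (psize_bytes[i] + SECTOR_SIZE - 1) / SECTOR_SIZE;

		if (sectors <= MAX_SECTORS)
			hist[sectors]++;
	}
}
```

Fed the three incompressible files from the first example (1K, 3K and 64K), this fills buckets 2, 6 and 128, which matches the rows marked with an asterisk in the quoted output.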
* Preserve asize when last mirror child promoted to top-level vdev | Andrew Barnes | 2014-07-02 | 1 | -0/+11
| If the smaller of two differently sized child vdevs of a mirrored vdev is detached, and the pool has the autoexpand property set to off, then as the remaining larger vdev is promoted to a top-level vdev it fails to retain the asize of the original top-level mirror vdev and therefore partially autoexpands.
|
| This partially autoexpanded state leaves the new vdev too large to re-mirror by adding the smaller vdev back in, and the pool fails to utilize the space until it is next imported.
|
| If the autoexpand property is set to on, the child vdev grows in size after it has been promoted to a top-level vdev as expected.
|
| This commit causes the remaining child of the mirror to retain the asize of its old parent mirror vdev if the autoexpand property is set to off. This allows the smaller vdev to be re-added if required; the vdev can then be told to expand, if required, in the usual way using zpool online -e.
|
| Signed-off-by: Brian Behlendorf
| Signed-off-by: Andrew Barnes
| Signed-off-by: George Wilson
| Closes #1208
* Fix comment spelling errors. | Garrison Jensen | 2014-07-01 | 1 | -2/+2
| Signed-off-by: Garrison Jensen
| Signed-off-by: Brian Behlendorf
| Closes #2402
* Document the optional "device" argument for "zpool split" | Tim Chase | 2014-07-01 | 1 | -3/+5
| Most ZFS implementations seemed to have missed this bit of documentation. The additional text is based on FreeBSD's man page.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Closes #2416
* Return default value on numeric properties failing the "head check". | Tim Chase | 2014-07-01 | 1 | -1/+3
| Updates 962d52421236fc9cd61d59b4f18cff3276077da9.
|
| The referenced fix to get_numeric_property() caused numeric property lookups to consider the type of the parent (head) dataset when checking validity, but there are some cases in which the caller expects to see the property's default value even when the lookup is invalid.
|
| One case in which this is true is change_one(), which is part of the renaming infrastructure. It may look up "zoned" on a snapshot of a volume, which is not valid, but it expects to see the default value of false.
|
| There may be other, yet unidentified cases in which zfs_prop_get_int() is used on technically invalid properties but which expect the property's default value to be returned.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Turbo Fredriksson
| Closes #2320
* Illumos #4936 fix potential overflow in lz4 | Dan McDonald | 2014-07-01 | 1 | -0/+3
| 4936 lz4 could theoretically overflow a pointer with a certain input
|
| Reviewed by: Saso Kiselkov
| Reviewed by: Keith Wesolowski
| Approved by: Gordon Ross
| Ported by: Tim Chase
|
| References:
|   https://illumos.org/issues/4936
|   https://github.com/illumos/illumos-gate/commit/58d0718
|
| Porting notes:
|
| This fixes the widely-reported "20-year-old vulnerability" in LZO/LZ4 implementations which inherited said bug from the reference implementation.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Closes #2429
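The class of bug described above, an output position plus an attacker-influenced length wrapping around the address space, can be illustrated with a small sketch. This is not the code from the patch: the function name `copy_would_wrap` is hypothetical, and the address is treated as an integer so that the check itself cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check, not the code from the patch: report whether
 * advancing an output position by `length` bytes would wrap around
 * the address space. Comparing against the remaining headroom avoids
 * computing op + length, which is the operation that could overflow.
 */
static int
copy_would_wrap(uintptr_t op, size_t length)
{
	return (length > UINTPTR_MAX - op);
}
```

A decompressor would presumably bail out with an error when such a check fires instead of continuing to copy.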
* Comment the lack of real_LZ4_uncompress() | Tim Chase | 2014-07-01 | 1 | -0/+5
| Added several comments regarding the removal of real_LZ4_uncompress() which exists in the reference implementation but has been removed here since it's not used.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
* Improve differing sector size error | Brian Behlendorf | 2014-06-27 | 1 | -2/+3
| When adding or replacing a vdev with a different sector size the error message should be more useful. In addition to describing the problem provide a hint that the '-o ashift' option can be used to override the optimal default value.
|
| Since using a non-optimal value may incur a significant performance penalty we should issue this error. But there are numerous reasons why an administrator may wish to do this anyway.
|
| Signed-off-by: Niklas Edmundsson <ZNikke@github>
| Signed-off-by: Brian Behlendorf
| Closes #2421
* Add information about the -o option to zpool replace | Turbo Fredriksson | 2014-06-27 | 2 | -4/+15
| Users need to be aware that when replacing devices in an existing pool they may need to override the automatically detected ashift value. This will all depend on the exact hardware they are using.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Issue #2024
* Fix man zpool property feature_guid | SenH | 2014-06-26 | 1 | -0/+1
| The property name gets mangled with the explanation due to the property length. Fixed by putting the explanation on the next line.
|
| Before:
|     unsupported@feature_Info rmation about unsupported features that are enabled on the pool. See zpool-features(5) for details.
|
| After:
|     unsupported@feature_guid
|     Information about unsupported features that are enabled on the pool. See zpool-features(5) for details.
|
| Signed-off-by: SenH
| Signed-off-by: Brian Behlendorf
| Closes #2419
* Tag zfs-0.6.3 (tag: zfs-0.6.3) | Brian Behlendorf | 2014-06-12 | 3 | -1/+5
| META file and release log updated.
|
| Signed-off-by: Brian Behlendorf
* Fix zfs.spec.in defaults | Brian Behlendorf | 2014-06-12 | 1 | -5/+5
| Commit 2ee4e7da accidentally introduced two issues which only occur when rebuilding the ZFS source rpm outside the ZFS build system.
|
| 1) The _dracutdir, _udevdir, and _udevruledir macros must be checked using the 'undefined' keyword. This was just overlooked in the patch review and does not cause a failure when using 'make pkg' because the values are provided by the make target.
|
| 2) The default _udevruledir path included a typo.
|
| Signed-off-by: Brian Behlendorf
| Issue #2310
* Accept udev and dracut paths specified by ./configure | Turbo Fredriksson | 2014-06-11 | 4 | -9/+59
| There are two common locations where udev and dracut components are installed. When building packages using the 'make rpm|deb' targets check those common locations and pass them to rpmbuild.
|
| For non-standard configurations these values can be provided by the following configure options:
|
|   --with-udevdir=DIR      install udev helpers [default=check]
|   --with-udevruledir=DIR  install udev rules [[UDEVDIR/rules.d]]
|   --with-dracutdir=DIR    install dracut helpers [default=check]
|
| When rebuilding using the source packages the per-distribution default values specified in the spec file will be used. This is the preferred way to build packages for a distribution but the ability to override the defaults is provided as a convenience.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes #2310
| Closes #1680
* Revert "Fix __zio_execute() asynchronous dispatch" | Brian Behlendorf | 2014-06-11 | 1 | -2/+0
| This reverts commit 91579709fccd3e55a21970742b66c388fb1403db which limited the asynchronous dispatch to kernel space. We want to do this for two reasons:
|
| 1) While we have slightly more headroom in user space excessively deep stacks have been observed while running ztest, see #2293.
|
| 2) Removing this conditional makes the pipeline behave consistently regardless of if it's executing in kernel space or user space. This way we're more likely to uncover subtle issues with ztest.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Closes #2384
* Set LANG to a reasonable default (C) | Turbo Fredriksson | 2014-06-10 | 2 | -3/+3
| Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on the translated date string in the changelog.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes: zfsonlinux/spl#306
* Document the -X and -T options to 'zpool import' | Turbo Fredriksson | 2014-06-06 | 1 | -2/+48
| These options have existed for a long time but have historically been undocumented because they are not guaranteed to be safe. They should only be used as a last resort when attempting to recover a damaged pool.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes #1130
* Expand the description of scan-related and other parameters. | Tim Chase | 2014-06-06 | 1 | -9/+16
| Document that the scan-related parameters are, in fact, applicable only to scrub and/or resilver operations as appropriate.
|
| Expand a few of the prefetch-related descriptions.
|
| Add clarification to other module parameters.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Closes #2361
* Man page updates for 'zfs share' | Turbo Fredriksson | 2014-06-06 | 1 | -1/+9
| * Remove the references to share(1M), unshare(1M) and dfstab(4) since they are not applicable to Linux.
| * Add the exact exportfs command line used when setting sharenfs=on.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Issue: #1641
* Document the fact that ashift is vdev specific, not a pool global. | Turbo Fredriksson | 2014-06-06 | 1 | -1/+3
| Users need to be aware that when adding devices to an existing pool they may need to override the automatically detected ashift value. This will all depend on the exact hardware they are using.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes: #2024
* Only automatically mount a clone when 'canmount == on'. | Turbo Fredriksson | 2014-06-06 | 1 | -1/+12
| According to the man page, "When the noauto option is set, a dataset can only be mounted and unmounted explicitly. The dataset is not mounted automatically when the dataset is created or imported ...."
|
| When cloning a dataset the canmount property was not being honored. This patch adds the required check to achieve the behavior described in the man page.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes #2241
* Do not export pool to prevent cache from being removed | Derek Dai | 2014-06-05 | 1 | -7/+0
| Signed-off-by: Derek Dai
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes #2353
* Accept kernel source dir(s) specified by ./configure | Turbo Fredriksson | 2014-06-05 | 2 | -13/+32
| This adds the ability to set the location of the kernel via defines when building from the spec files. This is useful when building against a kernel installed in a non-standard location.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes #1874
* Update spec file to enable systemd for RHEL7 | Ben Allen | 2014-06-05 | 1 | -0/+5
| Signed-off-by: Ben Allen
| Signed-off-by: Brian Behlendorf
| Closes #2355
* Move the libraries into separate packages | Turbo Fredriksson | 2014-06-02 | 2 | -15/+92
| From day one the various ZFS libraries should have been placed in their own sub-packages. Primarily this allows for multiple major versions of the libraries to be concurrently installed. It also facilitates a smaller build environment by minimizing the required dependencies.
|
| The specific changes required to split the libraries from the utilities are as follows:
|
| * libzpool2, libnvpair1, libuutil1, and libzfs2 packages were added and contain the versioned shared libraries. The Fedora packaging guidelines discourage providing static libraries so they are not included in the packages.
|
|   http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries
|
| * The zfs-devel package was renamed libzfs2-devel and the new package obsoletes the old zfs-devel package. This package includes all the required headers for the libzpool2, libnvpair1, libuutil1, and libzfs2 libraries and their respective unversioned shared libraries.
|
|   This package should eventually be split into individual lib*-devel packages but it will still take some work to cleanly separate them. Therefore the libzfs2-devel package provides the expected lib*-devel packages so all the proper dependencies can still be created.
|
|   http://fedoraproject.org/wiki/Packaging:Guidelines#Devel_Packages
|
| * Moved '/sbin/ldconfig' execution from the zfs package to each of the new library packages as described by the packaging guidelines.
|
|   http://fedoraproject.org/wiki/Packaging:Guidelines#Shared_Libraries
|
| * The /usr/share/doc/ files were moved into the libzfs2-devel package.
|
| * Updated config/deb.am to be aware of the packaging changes. This ensures that the 'deb-utils' make target converts all the resulting packages generated by the 'rpm-utils' target.
|
| Signed-off-by: Turbo Fredriksson
| Signed-off-by: Brian Behlendorf
| Closes: #2329
| Closes: #2341
| Issue: #2145
* Remove superfluous statement | Richard Yao | 2014-05-30 | 1 | -1/+0
| Clang's static analyzer reported that the value assigned to pcksum is never used. That is because we initialize both zc and pcksum to {{ 0 }} and then do `pcksum = zc;`. That is fairly pointless. However, it has the effect of generating a false positive in Clang's static analyzer. Since noise from false positives can obscure real issues, we fix it anyway.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Ned Bass
| Signed-off-by: Brian Behlendorf
| Issue #2330
* Fix memory leak in zpool_clear_label() | Richard Yao | 2014-05-30 | 1 | -1/+3
| Clang's static analyzer reported a memory leak in zpool_clear_label(). Upon review, it turns out to be right. This should be a very short lived leak because no daemons use this functionality, but that does not preclude the possibility of third party daemons that do use it. Let's fix it to be a good Samaritan.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Ned Bass
| Signed-off-by: Brian Behlendorf
| Issue #2330
* Allow building without ACLs | Chris Wedgwood | 2014-05-30 | 1 | -15/+15
| Some kernel definitions were buried inside the #if... #endif logic for ACLs. When ACLs are not available these definitions get lost causing the build to fail.
|
| Signed-off-by: Chris Wedgwood
| Signed-off-by: Brian Behlendorf
| Closes #2349
* Fix DKMS package upgrade and packager | Brian Behlendorf | 2014-05-30 | 1 | -5/+12
| Running 'yum upgrade zfs-dkms' could appear to work properly and still leave you with no zfs modules installed. This will occur when only the zfs release, and not the version, is incremented. This may be the case for a fast moving zfs-testing repository.
|
| During the upgrade process DKMS will realize that zfs-x.y.z is already installed and remove it. DKMS then correctly builds the new modules for zfs-x.y.z. However, as a final step when the old zfs-x.y.z-r is removed the %preun script runs and removes the newly built modules.
|
| To handle this case the %preun script has been updated to only run when the installed version exactly matches the full spec file version.
|
| This change also updated the ChangeLog section based on the DKMS reference spec file.
|
| Signed-off-by: Brian Behlendorf
* Restrict release number to META version | Brian Behlendorf | 2014-05-29 | 1 | -1/+1
| When creating packages in a git repository the release number can be automatically set by 'git describe'. This normally works well but if your repository has newer tags which match the form NAME-VERSION* the release may be incorrectly calculated. To prevent this the match pattern has been restricted to the contents of the META file, NAME-VERSION.
|
| Signed-off-by: Brian Behlendorf
* Use default slab types | Brian Behlendorf | 2014-05-22 | 2 | -15/+4
| We should not override the default memory type of the kmem cache. This was done previously to force certain objects which were slightly over the object size limit cut-off into KMC_KMEM caches for better performance. The zfsonlinux/spl#356 patch slightly increases the default cut-off from 511 bytes to 1024 bytes for x86_64.
|
| This means there is no longer a need to override the default for the caches. And since the default values are now being used the new spl_kmem_cache_slab_limit and spl_kmem_cache_kmem_limit tunables will apply to all kmem caches.
|
| The following is a list of caches that will be impacted:
|
|                   | object size  | forced type | default type
| ----------------- | ------------ | ----------- | ------------
| dnode_t           | 936 bytes    | KMC_KMEM    | KMC_KMEM
| zio_cache         | 1104 bytes   | *KMC_KMEM   | *KMC_VMEM
| zio_link_cache    | 48 bytes     | KMC_KMEM    | KMC_KMEM
| zio_vdev_cache    | 131088 bytes | KMC_VMEM    | KMC_VMEM
| zio_buf_512       | 512 bytes    | KMC_KMEM    | KMC_KMEM
| zio_data_buf_512  | 512 bytes    | KMC_KMEM    | KMC_KMEM
| zio_buf_1024      | 1024 bytes   | KMC_KMEM    | KMC_KMEM
| zio_data_buf_1024 | 1024 bytes   | +KMC_VMEM   | +KMC_KMEM
|
| * Cache memory type will change from KMC_KMEM to KMC_VMEM.
| + Cache memory type will change from KMC_VMEM to KMC_KMEM.
|
| This patch removes another slight point of divergence between ZoL and Illumos.
|
| Signed-off-by: Brian Behlendorf
| Signed-off-by: Prakash Surya
| Closes #2337
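The "default type" column in the table above is consistent with a simple size-based rule, sketched here under stated assumptions. The enum, the `KMEM_CUTOFF` constant and the function name are hypothetical; the real SPL selection also depends on the spl_kmem_cache_slab_limit and spl_kmem_cache_kmem_limit tunables, which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical names; the real SPL flags and tunables differ in detail. */
typedef enum { CACHE_KMEM, CACHE_VMEM } cache_type_t;

#define KMEM_CUTOFF 1024  /* x86_64 cut-off quoted in the commit message */

/* Pick the backing memory type purely from the object size. */
static cache_type_t
default_cache_type(size_t obj_size)
{
	return (obj_size <= KMEM_CUTOFF ? CACHE_KMEM : CACHE_VMEM);
}
```

Under this rule the 1104-byte zio_cache lands in the VMEM class and the 1024-byte buffers land in the KMEM class, which are the two changes the table flags with `*` and `+`.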
* Omit compiler warning by sticking to RAII | Marcel Huber | 2014-05-22 | 1 | -5/+5
| Resolve gcc 4.9.0 20140507 warnings about uninitialized 'ptr' when using -Wmaybe-uninitialized. The first two cases appear to be legitimate but not the second two. In general this is a good practice so they are all initialized.
|
| Signed-off-by: Marcel Huber
| Signed-off-by: Brian Behlendorf
| Closes #2345
* Added INTEL SSD 530 Series | John Albietz | 2014-05-19 | 1 | -0/+1
| INTEL SSD 530 Series... SSDSC2BW24
|
| Signed-off-by: John Albietz
| Signed-off-by: Brian Behlendorf
| Closes #2184
* Honor zfs_nocacheflush for file vdevs | HC | 2014-05-19 | 1 | -0/+4
| For consistency with disk vdevs honor the zfs_nocacheflush tunable. This setting is available primarily for debugging and performance analysis.
|
| Signed-off-by: HC
| Signed-off-by: Brian Behlendorf
| Closes #2336
* Calculate header size correctly in sa_find_sizes() | Tim Chase | 2014-05-19 | 1 | -6/+8
| In the case where a variable-sized SA overlaps the spill block pointer and a new variable-sized SA is being added, the header size was improperly calculated to include the to-be-moved SA. This problem could be reproduced when xattr=sa is enabled as follows:
|
|     ln -s $(perl -e 'print "x" x 120') blah
|     setfattr -n security.selinux -v blahblah -h blah
|
| The symlink is large enough to interfere with the spill block pointer and has a typical SA registration as follows (shown in modified "zdb -dddd" <SA attr layout obj> format):
|
|     [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]
|
| Adding the SA xattr will attempt to extend the registration to:
|
|     [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]
|
| but since the ZPL_SYMLINK SA interferes with the spill block pointer, it must also be moved to the spill block which will have a registration of:
|
|     [ ZPL_SYMLINK ZPL_DXATTR ]
|
| This commit updates extra_hdrsize when this condition occurs, allowing hdrsize to be subsequently decreased appropriately.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Signed-off-by: Ned Bass
| Issue #2214
| Issue #2228
| Issue #2316
| Issue #2343
* Allow for lock-free reading zfsdev_state_list. | Tim Chase | 2014-05-19 | 2 | -18/+52
| Restructure the zfsdev_state_list to allow for lock-free reading by converting to a simple singly-linked list from which items are never deleted and over which only forward iterations are performed. It depends on, among other things, the atomicity of accessing the zs_minor integer and zs_next pointer.
|
| This fixes a lock inversion in which the zfsdev_state_lock is used by both the sync task (txg_sync) and indirectly by any user program which uses /dev/zfs; the zfsdev_release method uses the same lock and then blocks on the sync task.
|
| The most typical failure scenario occurs when the sync task is cleaning up a user hold while various concurrent "zfs" commands are in progress.
|
| Neither Illumos nor Solaris are affected by this issue because they use the DDI interface which provides lock-free reading of device state via the ddi_get_soft_state() function.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Chunwei Chen
| Signed-off-by: Brian Behlendorf
| Closes #2301
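The idea of an append-only list that readers walk without a lock can be sketched with C11 atomics. This is an illustration only, not the zfsdev_state_list code: the type and function names are hypothetical, the kernel code does not use `<stdatomic.h>`, and writers are assumed to be serialized by a lock the sketch does not show.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node; nodes are appended but never unlinked or freed. */
typedef struct state_node {
	atomic_int minor;
	_Atomic(struct state_node *) next;
} state_node_t;

/* Reader: forward-only walk, no lock taken. */
static state_node_t *
state_lookup(state_node_t *head, int minor)
{
	for (state_node_t *n = head; n != NULL;
	    n = atomic_load_explicit(&n->next, memory_order_acquire)) {
		if (atomic_load_explicit(&n->minor,
		    memory_order_acquire) == minor)
			return (n);
	}
	return (NULL);
}

/* Writer: the caller is assumed to hold a lock that serializes appends. */
static void
state_append(state_node_t *head, state_node_t *node, int minor)
{
	state_node_t *tail = head;
	state_node_t *next;

	/* Fully initialize the node before it becomes reachable. */
	atomic_store_explicit(&node->minor, minor, memory_order_relaxed);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	while ((next = atomic_load_explicit(&tail->next,
	    memory_order_relaxed)) != NULL)
		tail = next;

	/* Publish with release so readers see an initialized node. */
	atomic_store_explicit(&tail->next, node, memory_order_release);
}
```

Because nodes are never removed, a reader can never follow a pointer into freed memory, which is what makes the lock-free traversal safe in this sketch.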
* Handle ZPOOL_STATUS_HOSTID_MISMATCH in zpool status | Richard Yao | 2014-05-19 | 1 | -0/+11
| Verbatim imports can cause hostid mismatches, but things otherwise work. `zpool status` does not handle this and will fail when assertions are enabled:
|
| ```
| zpool: ../../cmd/zpool/zpool_main.c:4418: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed.
|
| Program received signal SIGABRT, Aborted.
| ```
|
| Let's instead add a case to display an informative message such as this:
|
| ```
|   pool: rpool
|  state: ONLINE
| status: Mismatch between pool hostid and system hostid on imported pool.
|         This pool was previously imported into a system with a different
|         hostid, and then was verbatim imported into this system.
| action: Export this pool on all systems on which it is imported.
|         Then import it to correct the mismatch.
|    see: http://zfsonlinux.org/msg/ZFS-8000-EY
|   scan: scrub repaired 0 in 0h8m with 0 errors on Thu Apr 17 19:43:57 2014
| config:
|
|         NAME        STATE     READ WRITE CKSUM
|         rpool       ONLINE       0     0     0
|           sda       ONLINE       0     0     0
|
| errors: No known data errors
| ```
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Closes #2342
* Use a dedicated taskq for vdev_file | Chunwei Chen | 2014-05-14 | 3 | -1/+24
| Originally, vdev_file used system_taskq. This would cause a deadlock, especially on systems with few CPUs. The reason is that the prefetcher threads, which are on system_taskq, will sometimes be blocked waiting for I/O to finish. If the prefetcher threads consume all the tasks in system_taskq, the I/O cannot be served and thus results in a deadlock.
|
| We fix this by creating a dedicated vdev_file_taskq for vdev_file I/O.
|
| Signed-off-by: Chunwei Chen
| Signed-off-by: Brian Behlendorf
| Closes #2270
* Handle vdev_lookup_top() failure in dva_get_dsize_sync() | Brian Behlendorf | 2014-05-06 | 1 | -1/+3
| The dva_get_dsize_sync() function incorrectly assumes that the call to vdev_lookup_top() cannot fail. However, the NULL dereference below clearly shows that under certain circumstances it is possible. Note that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio.
|
|     BUG: unable to handle kernel NULL pointer dereference at 00000570
|
|     crash> struct -o vdev
|     struct vdev {
|          [0] uint64_t vdev_id;
|     ...
|     ...
|       [1376] uint64_t vdev_deflate_ratio;
|
| Given that this can happen this patch adds the required error handling. In the case where vdev_lookup_top() fails assume that no deflation will occur for the DVA and use the asize.
|
| Signed-off-by: Brian Behlendorf
| Signed-off-by: Ned Bass
| Signed-off-by: Alexey Zhuravlev
| Closes #1707
| Closes #1987
| Closes #1891
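The fallback described above, using the allocated size unchanged when the top-level vdev lookup fails, can be sketched as follows. The struct, the `MINBLOCKSHIFT` constant and the deflation formula are simplified assumptions for illustration, not the code in dva_get_dsize_sync().

```c
#include <stddef.h>
#include <stdint.h>

#define MINBLOCKSHIFT 9  /* assumed: 512-byte minimum block size */

/* Hypothetical stand-in for the relevant part of a top-level vdev. */
typedef struct {
	uint64_t deflate_ratio;
} top_vdev_t;

/*
 * Deflated size of an allocation. When the vdev lookup failed
 * (vd == NULL), assume no deflation and return asize unchanged
 * instead of dereferencing the NULL pointer.
 */
static uint64_t
deflated_size(const top_vdev_t *vd, uint64_t asize)
{
	if (vd == NULL)
		return (asize);

	return ((asize >> MINBLOCKSHIFT) * vd->deflate_ratio);
}
```

The point of the sketch is the NULL check placed before any field access; the exact arithmetic in the non-NULL branch is only a plausible placeholder.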
* Check the dataset type more rigorously when fetching properties. | Tim Chase | 2014-05-06 | 10 | -20/+34
| When fetching property values of snapshots, a check against the head dataset type must be performed. Previously, this additional check was performed only when fetching "version", "normalize", "utf8only" or "case".
|
| This caused the ZPL properties "acltype", "exec", "devices", "nbmand", "setuid" and "xattr" to be erroneously displayed with meaningless values for snapshots of volumes. It also did not allow for the display of "volsize" of a snapshot of a volume.
|
| This patch adds the headcheck flag parameter to zfs_prop_valid_for_type() and zprop_valid_for_type() to indicate the check is being done against a head dataset's type in order that properties valid only for snapshots are handled correctly. This allows the head check in get_numeric_property() to be performed when fetching a property for a snapshot.
|
| Signed-off-by: Tim Chase
| Signed-off-by: Brian Behlendorf
| Closes #2265
* Fix style | Brian Behlendorf | 2014-05-06 | 1 | -1/+2
| A minor style issue was accidentally introduced by aa7d06a. This change resolves that style problem.
|
| Signed-off-by: Brian Behlendorf
* Illumos #4101 finer-grained control of metaslab_debug | George Wilson | 2014-05-06 | 2 | -8/+27
| Today the metaslab_debug logic performs two tasks:
| - load all metaslabs on import/open
| - don't unload metaslabs at the end of spa_sync
|
| This change provides knobs for each of these independently.
|
| References:
|   https://illumos.org/issues/4101
|   https://github.com/illumos/illumos-gate/commit/0713e23
|
| Notes:
|
| 1) This is a small piece of the metaslab improvement patch from Illumos. It was worth bringing over before the rest, since it's low risk and it can be useful on fragmented pools (e.g. Lustre MDTs). metaslab_debug_unload would give the performance benefit of the old metaslab_debug option without causing unwanted delay during pool import.
|
| Ported-by: Ned Bass
| Signed-off-by: Brian Behlendorf
| Closes #2227
* Treat spill block dbufs as meta data | Brian Behlendorf | 2014-05-05 | 1 | -1/+4
| When the system attributes (SAs) for an object exceed what can be stored in the bonus area of a dnode a spill block is allocated. These spill blocks are currently considered data blocks. However, they should be accounted for as meta data because they are effectively an extension of the dnode.
|
| While this may seem like a minor accounting issue it has broader implications. The key thing to be aware of is that each spill block will hold a reference on its parent dnode. The dnode in turn holds a reference on its dbuf in the dnode object. This means that a single 512 byte data buffer for a spill block can pin over 16k of meta data. This is analogous to the small file situation described in 2b13331 where a relatively small number of data buffers can cause the ARC to exceed the meta limit.
|
| However, unlike the small file case a spill block can legitimately be considered meta data. By changing the spill block to meta data they will now be dropped from the cache when the meta limit is reached. This then allows the dnodes and dbufs which the spill block was pinning to be released.
|
| Signed-off-by: Brian Behlendorf
| Signed-off-by: Prakash Surya
| Closes #2294
* Remove SELinux enforcing check from init scripts | Brian Behlendorf | 2014-05-02 | 3 | -20/+0
| The default SELinux policy for RHEL and Fedora has been updated to include ZFS in the list of filesystems which support xattrs. Therefore, there's no longer a need to detect this in the init scripts.
|
| References:
|   https://bugzilla.redhat.com/show_bug.cgi?id=811532
|   https://bugzilla.redhat.com/show_bug.cgi?id=816543
|
| Signed-off-by: Brian Behlendorf
| Closes #2166
* ztest: Switch to LWP rwlock interface | Richard Yao | 2014-05-01 | 1 | -51/+50
| ztest is intended to subject the ZFS code in userland to stress that it should be able to withstand. Any failures that occur when running it are failures that likely would occur inside the kernel. However, being in userland, it is much easier to debug them. In practice, this prevents a large number of problems from reaching production code.
|
| A design decision was made by the original authors of ztest to make a distinction between userland locking primitives and kernel locking primitives. The ztest code itself calls userland locking primitives while the kernel code being run in userland will call emulated kernel locking primitives that wrap the userland locking primitives.
|
| When ztest was first ported to Linux, a decision was made to use the emulated kernel interfaces everywhere. In effect, the userland rw_rdlock()/rw_wrlock() became the kernel rw_enter() and the userland rw_unlock() became the kernel rw_exit(). This caused a regression because of an assertion in rw_enter() to catch recursive locking. That is permitted in userland, but not in the kernel. Consequently, the ztest code itself does recursive read locking. The use of the emulated kernel interfaces consequently caused the following failure:
|
|     ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner != zk_thread_current() (0x1c87150 != 0x1c87150)' failed.
|
| That occurs because ztest_dmu_objset_create_destroy() will take a read lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(), which will take a read lock only when asked to do ZTEST_IO_REWRITE. This triggered the assertion.
|
| The pthreads rwlock interface was based on the LWP rwlock interface implemented in Illumos libc. Luckily enough, the subset used by ztest is almost identical, so we can solve this problem by switching to the LWP thread rwlock interface in ztest.
|
| This eliminates a point of divergence with Illumos and should make code sharing slightly easier.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Closes #1970
* libspl: Implement LWP rwlock interface | Richard Yao | 2014-05-01 | 2 | -0/+52
| This implements a subset of the LWP rwlock interface by wrapping the equivalent POSIX thread interface. It is a superset of the features needed by ztest. The missing bits are {,_}rw_read_held() and {,_}rw_write_held().
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Issue #1970
* Fix libblkid ZFS detection when making new pools | Richard Yao | 2014-05-01 | 1 | -1/+1
| zfsonlinux/zfs@1db7b9be75a225cedb3b7a60028ca5695e5b8346 should have fixed this, but this particular string was overlooked.
|
| Signed-off-by: Richard Yao
| Signed-off-by: Brian Behlendorf
| Closes #2288