Commit message | Author | Age | Files | Lines
* Remove superfluous statement | Richard Yao | 2014-05-30 | 1 | -1/+0

Clang's static analyzer reported that the value assigned to pcksum is never used. That is because we initialize both zc and pcksum to {{ 0 }} and then do `pcksum = zc;`. That is fairly pointless. However, it has the effect of generating a false positive in Clang's static analyzer. Since noise from false positives can obscure real issues, we fix it anyway.

Signed-off-by: Richard Yao
Signed-off-by: Ned Bass
Signed-off-by: Brian Behlendorf
Issue #2330
* Fix memory leak in zpool_clear_label() | Richard Yao | 2014-05-30 | 1 | -1/+3

Clang's static analyzer reported a memory leak in zpool_clear_label(). Upon review, it turns out to be right. This should be a very short-lived leak because no daemons use this functionality, but that does not preclude the possibility of third-party daemons that do use it. Let's fix it to be a good Samaritan.

Signed-off-by: Richard Yao
Signed-off-by: Ned Bass
Signed-off-by: Brian Behlendorf
Issue #2330
* Allow building without ACLs | Chris Wedgwood | 2014-05-30 | 1 | -15/+15

Some kernel definitions were buried inside the #if... #endif logic for ACLs. When ACLs are not available these definitions get lost, causing the build to fail.

Signed-off-by: Chris Wedgwood
Signed-off-by: Brian Behlendorf
Closes #2349
* Fix DKMS package upgrade and packager | Brian Behlendorf | 2014-05-30 | 1 | -5/+12

Running 'yum upgrade zfs-dkms' could appear to work properly and still leave you with no zfs modules installed. This will occur when only the zfs release, and not the version, is incremented. This may be the case for a fast moving zfs-testing repository.

During the upgrade process DKMS will realize that zfs-x.y.z is already installed and remove it. DKMS then correctly builds the new modules for zfs-x.y.z. However, as a final step, when the old zfs-x.y.z-r is removed the %preun script runs and removes the newly built modules.

To handle this case the %preun script has been updated to only run when the installed version exactly matches the full spec file version. This change also updates the ChangeLog section based on the DKMS reference spec file.

Signed-off-by: Brian Behlendorf
* Restrict release number to META version | Brian Behlendorf | 2014-05-29 | 1 | -1/+1

When creating packages in a git repository the release number can be automatically set by 'git describe'. This normally works well, but if your repository has newer tags which match the form NAME-VERSION* the release may be incorrectly calculated.

To prevent this the match pattern has been restricted to the contents of the META file, NAME-VERSION.

Signed-off-by: Brian Behlendorf
* Use default slab types | Brian Behlendorf | 2014-05-22 | 2 | -15/+4

We should not override the default memory type of the kmem cache. This was done previously to force certain objects which were slightly over the object size limit cut off into KMC_KMEM caches for better performance. The zfsonlinux/spl#356 patch slightly increases the default cut off from 511 bytes to 1024 bytes for x86_64.

This means there is no longer a need to override the default for the caches. And since the default values are now being used, the new spl_kmem_cache_slab_limit and spl_kmem_cache_kmem_limit tunables will apply to all kmem caches.

The following is a list of caches that will be impacted:

                      | object size   | forced type   | default type
    ----------------- | ------------- | ------------- | --------------
    dnode_t           | 936 bytes     | KMC_KMEM      | KMC_KMEM
    zio_cache         | 1104 bytes    | *KMC_KMEM     | *KMC_VMEM
    zio_link_cache    | 48 bytes      | KMC_KMEM      | KMC_KMEM
    zio_vdev_cache    | 131088 bytes  | KMC_VMEM      | KMC_VMEM
    zio_buf_512       | 512 bytes     | KMC_KMEM      | KMC_KMEM
    zio_data_buf_512  | 512 bytes     | KMC_KMEM      | KMC_KMEM
    zio_buf_1024      | 1024 bytes    | KMC_KMEM      | KMC_KMEM
    zio_data_buf_1024 | 1024 bytes    | +KMC_VMEM     | +KMC_KMEM

    * Cache memory type will change from KMC_KMEM to KMC_VMEM.
    + Cache memory type will change from KMC_VMEM to KMC_KMEM.

This patch removes another slight point of divergence between ZoL and Illumos.

Signed-off-by: Brian Behlendorf
Signed-off-by: Prakash Surya
Closes #2337
* Omit compiler warning by sticking to RAII | Marcel Huber | 2014-05-22 | 1 | -5/+5

Resolve gcc 4.9.0 20140507 warnings about uninitialized 'ptr' when using -Wmaybe-uninitialized. The first two cases appear to be legitimate but not the second two. In general this is a good practice so they are all initialized.

Signed-off-by: Marcel Huber
Signed-off-by: Brian Behlendorf
Closes #2345
* Added INTEL SSD 530 Series | John Albietz | 2014-05-19 | 1 | -0/+1

INTEL SSD 530 Series... SSDSC2BW24

Signed-off-by: John Albietz
Signed-off-by: Brian Behlendorf
Closes #2184
* Honor zfs_nocacheflush for file vdevs | HC | 2014-05-19 | 1 | -0/+4

For consistency with disk vdevs, honor the zfs_nocacheflush tunable. This setting is available primarily for debugging and performance analysis.

Signed-off-by: HC
Signed-off-by: Brian Behlendorf
Closes #2336
* Calculate header size correctly in sa_find_sizes() | Tim Chase | 2014-05-19 | 1 | -6/+8

In the case where a variable-sized SA overlaps the spill block pointer and a new variable-sized SA is being added, the header size was improperly calculated to include the to-be-moved SA. This problem could be reproduced when xattr=sa is enabled as follows:

    ln -s $(perl -e 'print "x" x 120') blah
    setfattr -n security.selinux -v blahblah -h blah

The symlink is large enough to interfere with the spill block pointer and has a typical SA registration as follows (shown in modified "zdb -dddd" <SA attr layout obj> format):

    [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]

Adding the SA xattr will attempt to extend the registration to:

    [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]

but since the ZPL_SYMLINK SA interferes with the spill block pointer, it must also be moved to the spill block, which will have a registration of:

    [ ZPL_SYMLINK ZPL_DXATTR ]

This commit updates extra_hdrsize when this condition occurs, allowing hdrsize to be subsequently decreased appropriately.

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Signed-off-by: Ned Bass
Issue #2214
Issue #2228
Issue #2316
Issue #2343
* Allow for lock-free reading zfsdev_state_list. | Tim Chase | 2014-05-19 | 2 | -18/+52

Restructure the zfsdev_state_list to allow for lock-free reading by converting to a simple singly-linked list from which items are never deleted and over which only forward iterations are performed. It depends on, among other things, the atomicity of accessing the zs_minor integer and zs_next pointer.

This fixes a lock inversion in which the zfsdev_state_lock is used by both the sync task (txg_sync) and indirectly by any user program which uses /dev/zfs; the zfsdev_release method uses the same lock and then blocks on the sync task.

The most typical failure scenario occurs when the sync task is cleaning up a user hold while various concurrent "zfs" commands are in progress.

Neither Illumos nor Solaris is affected by this issue because they use the DDI interface, which provides lock-free reading of device state via the ddi_get_soft_state() function.

Signed-off-by: Tim Chase
Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes #2301
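The list structure described in this commit can be sketched roughly as follows. This is a minimal illustration using C11 atomics, not the code from the patch: `state_node_t`, `state_lookup()`, `state_insert()` and `state_release()` are hypothetical names, writers are assumed to be serialized by a lock that is not shown, and a caller is assumed to look up only a minor it still holds open.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-minor state entry (not zfsdev_state_t). */
typedef struct state_node {
	_Atomic int minor;               /* -1 marks a free slot */
	void *_Atomic data;
	struct state_node *_Atomic next; /* written once, never unlinked */
} state_node_t;

static state_node_t *_Atomic state_list;

/* Reader: takes no lock and only ever walks forward. */
static void *
state_lookup(int minor)
{
	state_node_t *n;

	for (n = atomic_load(&state_list); n != NULL;
	    n = atomic_load(&n->next)) {
		if (atomic_load(&n->minor) == minor)
			return (atomic_load(&n->data));
	}
	return (NULL);
}

/* Writer: the caller is assumed to hold the writer lock (not shown). */
static state_node_t *
state_insert(int minor, void *data)
{
	state_node_t *n, *tail = NULL;

	assert(minor >= 0);
	for (n = atomic_load(&state_list); n != NULL;
	    n = atomic_load(&n->next)) {
		if (atomic_load(&n->minor) == -1) {
			/* Recycle a released slot; publish the minor last. */
			atomic_store(&n->data, data);
			atomic_store(&n->minor, minor);
			return (n);
		}
		tail = n;
	}

	n = calloc(1, sizeof (*n));
	if (n == NULL)
		return (NULL);
	atomic_init(&n->minor, minor);
	atomic_init(&n->data, data);
	atomic_init(&n->next, NULL);

	/* Link the fully initialized node with a single pointer store. */
	if (tail == NULL)
		atomic_store(&state_list, n);
	else
		atomic_store(&tail->next, n);
	return (n);
}

/* Release marks the slot free instead of unlinking it. */
static void
state_release(state_node_t *n)
{
	atomic_store(&n->minor, -1);
	atomic_store(&n->data, NULL);
}
```

Because nodes are never unlinked or freed, a reader that is part-way through the list can never follow a dangling pointer, which is what makes the lock-free walk safe in this sketch.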
* Handle ZPOOL_STATUS_HOSTID_MISMATCH in zpool status | Richard Yao | 2014-05-19 | 1 | -0/+11

Verbatim imports can cause hostid mismatches, but things otherwise work. `zpool status` does not handle this and will fail when assertions are enabled:

```
zpool: ../../cmd/zpool/zpool_main.c:4418: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed.

Program received signal SIGABRT, Aborted.
```

Let's instead add a case to display an informative message such as this:

```
  pool: rpool
 state: ONLINE
status: Mismatch between pool hostid and system hostid on imported pool.
        This pool was previously imported into a system with a different hostid,
        and then was verbatim imported into this system.
action: Export this pool on all systems on which it is imported.
        Then import it to correct the mismatch.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
  scan: scrub repaired 0 in 0h8m with 0 errors on Thu Apr 17 19:43:57 2014
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda       ONLINE       0     0     0

errors: No known data errors
```

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes #2342
* Use a dedicated taskq for vdev_file | Chunwei Chen | 2014-05-14 | 3 | -1/+24

Originally, vdev_file used system_taskq. This would cause a deadlock, especially on systems with few CPUs. The reason is that the prefetcher threads, which are on system_taskq, will sometimes be blocked waiting for I/O to finish. If the prefetcher threads consume all the tasks in system_taskq, the I/O cannot be served, which results in a deadlock.

We fix this by creating a dedicated vdev_file_taskq for vdev_file I/O.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes #2270
* Handle vdev_lookup_top() failure in dva_get_dsize_sync() | Brian Behlendorf | 2014-05-06 | 1 | -1/+3

The dva_get_dsize_sync() function incorrectly assumes that the call to vdev_lookup_top() cannot fail. However, the NULL dereference below clearly shows that under certain circumstances it is possible. Note that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio.

    BUG: unable to handle kernel NULL pointer dereference at 00000570

    crash> struct -o vdev
    struct vdev {
         [0] uint64_t vdev_id;
    ...
    ...
      [1376] uint64_t vdev_deflate_ratio;

Given that this can happen, this patch adds the required error handling. In the case where vdev_lookup_top() fails, assume that no deflation will occur for the DVA and use the asize.

Signed-off-by: Brian Behlendorf
Signed-off-by: Ned Bass
Signed-off-by: Alexey Zhuravlev
Closes #1707
Closes #1987
Closes #1891
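The fallback described above can be sketched as below. This is only an illustration of the idea: `vdev_sketch_t` and `deflated_size()` are hypothetical names, and the 512-byte sector shift is an assumption of the sketch rather than something stated in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 512-byte minimum block size (shift of 9) for this sketch. */
#define	SKETCH_MINBLOCKSHIFT	9

/* Hypothetical, reduced stand-in for struct vdev. */
typedef struct vdev_sketch {
	uint64_t vdev_deflate_ratio;
} vdev_sketch_t;

/*
 * Return the deflated size for an allocated size.  When the top-level
 * vdev lookup failed (vd == NULL), assume no deflation and return asize
 * unchanged instead of dereferencing a NULL pointer.
 */
static uint64_t
deflated_size(const vdev_sketch_t *vd, uint64_t asize)
{
	if (vd == NULL)
		return (asize);
	return ((asize >> SKETCH_MINBLOCKSHIFT) * vd->vdev_deflate_ratio);
}
```

With a ratio of 512 the deflated size equals the allocated size, so the NULL fallback behaves like a vdev that applies no deflation.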
* Check the dataset type more rigorously when fetching properties. | Tim Chase | 2014-05-06 | 10 | -20/+34

When fetching property values of snapshots, a check against the head dataset type must be performed. Previously, this additional check was performed only when fetching "version", "normalize", "utf8only" or "case".

This caused the ZPL properties "acltype", "exec", "devices", "nbmand", "setuid" and "xattr" to be erroneously displayed with meaningless values for snapshots of volumes. It also did not allow for the display of "volsize" of a snapshot of a volume.

This patch adds the headcheck flag parameter to zfs_prop_valid_for_type() and zprop_valid_for_type() to indicate the check is being done against a head dataset's type, so that properties valid only for snapshots are handled correctly. This allows the head check in get_numeric_property() to be performed when fetching a property for a snapshot.

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #2265
* Fix style | Brian Behlendorf | 2014-05-06 | 1 | -1/+2

A minor style issue was accidentally introduced by aa7d06a. This change resolves that style problem.

Signed-off-by: Brian Behlendorf
* Illumos #4101 finer-grained control of metaslab_debug | George Wilson | 2014-05-06 | 2 | -8/+27

Today the metaslab_debug logic performs two tasks:

- load all metaslabs on import/open
- don't unload metaslabs at the end of spa_sync

This change provides knobs for each of these independently.

References:
  https://illumos.org/issues/4101
  https://github.com/illumos/illumos-gate/commit/0713e23

Notes:

1) This is a small piece of the metaslab improvement patch from Illumos. It was worth bringing over before the rest, since it's low risk and it can be useful on fragmented pools (e.g. Lustre MDTs). metaslab_debug_unload would give the performance benefit of the old metaslab_debug option without causing unwanted delay during pool import.

Ported-by: Ned Bass
Signed-off-by: Brian Behlendorf
Closes #2227
* Treat spill block dbufs as meta data | Brian Behlendorf | 2014-05-05 | 1 | -1/+4

When the system attributes (SAs) for an object exceed what can be stored in the bonus area of a dnode, a spill block is allocated. These spill blocks are currently considered data blocks. However, they should be accounted for as meta data because they are effectively an extension of the dnode.

While this may seem like a minor accounting issue, it has broader implications. The key thing to be aware of is that each spill block will hold a reference on its parent dnode. The dnode in turn holds a reference on its dbuf in the dnode object. This means that a single 512 byte data buffer for a spill block can pin over 16k of meta data.

This is analogous to the small file situation described in 2b13331, where a relatively small number of data buffers can cause the ARC to exceed the meta limit. However, unlike the small file case, a spill block can legitimately be considered meta data.

By changing the spill block to meta data they will now be dropped from the cache when the meta limit is reached. This then allows the dnodes and dbufs which the spill block was pinning to be released.

Signed-off-by: Brian Behlendorf
Signed-off-by: Prakash Surya
Closes #2294
* Remove SELinux enforcing check from init scripts | Brian Behlendorf | 2014-05-02 | 3 | -20/+0

The default SELinux policy for RHEL and Fedora has been updated to include ZFS in the list of filesystems which support xattrs. Therefore, there's no longer a need to detect this in the init scripts.

References:
  https://bugzilla.redhat.com/show_bug.cgi?id=811532
  https://bugzilla.redhat.com/show_bug.cgi?id=816543

Signed-off-by: Brian Behlendorf
Closes #2166
* ztest: Switch to LWP rwlock interface | Richard Yao | 2014-05-01 | 1 | -51/+50

ztest is intended to subject the ZFS code in userland to stress that it should be able to withstand. Any failures that occur when running it are failures that likely would occur inside the kernel. However, being in userland, it is much easier to debug them. In practice, this prevents a large number of problems from reaching production code.

A design decision was made by the original authors of ztest to make a distinction between userland locking primitives and kernel locking primitives. The ztest code itself calls userland locking primitives while the kernel code being run in userland will call emulated kernel locking primitives that wrap the userland locking primitives.

When ztest was first ported to Linux, a decision was made to use the emulated kernel interfaces everywhere. In effect, the userland rw_rdlock()/rw_wrlock() became the kernel rw_enter() and the userland rw_unlock() became the kernel rw_exit(). This caused a regression because of an assertion in rw_enter() to catch recursive locking. That is permitted in userland, but not in the kernel. Consequently, the ztest code itself does recursive read locking. The use of the emulated kernel interfaces consequently caused the following failure:

    ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner != zk_thread_current() (0x1c87150 != 0x1c87150)' failed.

That occurs because ztest_dmu_objset_create_destroy() will take a read lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(), which will take a read lock only when asked to do ZTEST_IO_REWRITE. This triggered the assertion.

The pthreads rwlock interface was based on the LWP rwlock interface implemented in Illumos libc. Luckily enough, the subset used by ztest is almost identical, so we can solve this problem by switching to the LWP thread rwlock interface in ztest.

This eliminates a point of divergence with Illumos and should make code sharing slightly easier.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes #1970
* libspl: Implement LWP rwlock interface | Richard Yao | 2014-05-01 | 2 | -0/+52

This implements a subset of the LWP rwlock interface by wrapping the equivalent POSIX thread interface. It is a superset of the features needed by ztest. The missing bits are {,_}rw_read_held() and {,_}rw_write_held().

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #1970
* Fix libblkid ZFS detection when making new pools | Richard Yao | 2014-05-01 | 1 | -1/+1

zfsonlinux/zfs@1db7b9be75a225cedb3b7a60028ca5695e5b8346 should have fixed this, but this particular string was overlooked.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes #2288
* dmu_tx_assign() should not return ENOMEM | Brian Behlendorf | 2014-05-01 | 1 | -1/+6

As described in the comment above dmu_tx_assign(), this function must only fail if the pool is out of space. If for some other reason the TX cannot be assigned (such as memory pressure), ERESTART must be returned. Alternately, EAGAIN could be returned to inject a delay, but that isn't required because the caller will block on the condition variable waiting for the next TXG.

    /*
     * Assign tx to a transaction group. txg_how can be one of:
     *
     * (1) TXG_WAIT. If the current open txg is full, waits until there's
     *     a new one. This should be used when you're not holding locks.
     *     It will only fail if we're truly out of space (or over quota).
     * ...
     */

Signed-off-by: Brian Behlendorf
Signed-off-by: Ned Bass
Closes #2287
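The rule stated in this commit message can be illustrated with a small error-mapping helper. This is a sketch of the idea only, not the change made by the patch: `tx_assign_error()` is a hypothetical name, and the fallback value for ERESTART is the Linux definition, supplied here only for platforms whose <errno.h> lacks it.

```c
#include <errno.h>

/* ERESTART is provided by Linux's <errno.h>; fall back to its value. */
#ifndef ERESTART
#define	ERESTART	85
#endif

/*
 * Map an internal failure to the error an assign-style call would
 * report.  Memory pressure is transient, so ask the caller to retry
 * rather than leaking ENOMEM; other values pass through unchanged.
 */
static int
tx_assign_error(int err)
{
	if (err == ENOMEM)
		return (ERESTART);
	return (err);
}
```

A caller using TXG_WAIT semantics would then see only genuine out-of-space or over-quota failures, while a transient allocation failure turns into a retry.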
* Implement File Attribute Support | Richard Yao | 2014-05-01 | 3 | -6/+114

We add support for lsattr and chattr to resolve a regression caused by 88c283952f0bfeab54612f9ce666601d83c4244f that broke Python's xattr.list(). That change broke Gentoo Portage's FEATURES=xattr, which depended on Python's xattr.list().

Only attributes common to both Solaris and Linux are supported. These are 'a', 'd' and 'i' in Linux's lsattr and chattr commands. File attributes exclusive to Solaris are present in the ZFS code, but cannot be accessed or modified through this method. That was the case prior to this patch. The resolution of issue zfsonlinux/zfs#229 should implement some method to permit access and modification of Solaris-specific attributes.

References:
  https://bugs.gentoo.org/show_bug.cgi?id=483516

Original-patch-by: Brian Behlendorf
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes #1691
* Refactor inode_owner_or_capable() autotools check | Richard Yao | 2014-05-01 | 2 | -19/+36

We need inode_owner_or_capable() for ZFS file attributes in addition to xattrs, so it should go into its own file. This moves it into its own file and changes it to be more comprehensive. It will now fail if no known good API is detected.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #1691
* Fill in mountpoint buffer before using it in errors | ilovezfs | 2014-04-30 | 1 | -3/+3

zfs_is_mountable() fills in the mountpoint buffer, so, as in upstream, it needs to have been called before the mountpoint buffer can be used in error messages. In particular,

    return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED,
        dgettext(TEXT_DOMAIN, "cannot mount '%s'"),
        mountpoint));

should not come before the call to zfs_is_mountable().

Signed-off-by: Brian Behlendorf
Signed-off-by: ilovezfs
Closes #2284
* Add assertion to catch 0-count page | Chunwei Chen | 2014-04-25 | 1 | -0/+7

Some network-related block devices use tcp_sendpage, which doesn't behave well when given a 0-count page. Add an assertion to catch them.

This has a runtime dependency on:

    zfsonlinux/spl@ae16ed9 Fix crash when using ZFS on Ceph rbd

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes #2277
* Add support for aarch64 (ARMv8) | Jorgen Lundman | 2014-04-25 | 1 | -2/+2

Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux).

As it is based on previous ARM porting, the resulting patch is disappointingly small; there was very little to do.

The code fixes the compile issues and has had light testing done.

Signed-off-by: Jorgen Lundman
Signed-off-by: Brian Behlendorf
Closes #2260
* Fix LZ4 endianness autodetection | Ned Bass | 2014-04-20 | 1 | -8/+3

Endianness detection in LZ4 is broken in user-space builds. This bug corrupts compressed data and manifests itself in several ztest failures.

When LZ4 was originally ported to Illumos ZFS, the proper checks for Linux were stripped out. The Linux port then inherited the remaining detection code that works on Illumos but not on Linux.

The current LZ4 endianness check misuses the condition defined(__BIG_ENDIAN) to indicate a big-endian system. On Linux __BIG_ENDIAN is defined unconditionally in the user-space header /usr/include/endian.h, regardless of the endianness of the system. The kernel does not use this header, so only user-space builds are affected.

While we could fix this by restoring the upstream LZ4 endianness detection code, reliable checks already exist in libspl/include/sys/isa_defs.h. This change uses the libspl results to replace the word-size and endianness checks in LZ4, simplifying the code and reducing duplication.

Signed-off-by: Ned Bass
Signed-off-by: Chunwei Chen
Signed-off-by: DHE
Signed-off-by: Prakash Surya
Signed-off-by: Brian Behlendorf
Fixes #1963
Fixes #1964
Fixes #1965
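The difference between testing whether a byte-order macro is defined and testing what the byte order actually is can be sketched as follows. This is not the libspl isa_defs.h logic the commit switches to; it is an independent illustration that assumes a GCC- or Clang-compatible compiler providing `__BYTE_ORDER__`, and `SKETCH_BIG_ENDIAN` and `runtime_is_big_endian()` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/*
 * Broken pattern (per the commit message): on Linux user space,
 * __BIG_ENDIAN is defined on every system once <endian.h> is included,
 * so "#if defined(__BIG_ENDIAN)" says nothing about the host.
 *
 * Sketch of a value-based check instead.  Assumes the compiler
 * predefines __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ (GCC and Clang do);
 * other compilers would fall through to 0 and need their own test.
 */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
	(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define	SKETCH_BIG_ENDIAN	1
#else
#define	SKETCH_BIG_ENDIAN	0
#endif

/* Runtime cross-check: inspect the first byte of a known 32-bit value. */
static int
runtime_is_big_endian(void)
{
	const uint32_t probe = 0x01020304;
	unsigned char first;

	memcpy(&first, &probe, sizeof (first));
	return (first == 0x01);
}
```

Comparing the compile-time result against the runtime probe in a test is a cheap way to catch the kind of misdetection this commit describes.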
* Fix zfsdev_ioctl() kmem leak warning | Brian Behlendorf | 2014-04-18 | 1 | -5/+9

Due to an asymmetry in the kmem accounting, a memory leak was being reported when it was only an accounting issue. All memory allocated with kmem_alloc() must be released with kmem_free() or it will not be properly accounted for.

In this case the code used strfree() to release the memory allocated by kmem_alloc(). Presumably this was done because the size of the memory region wasn't available when the memory needed to be freed.

To resolve this issue the code has been updated to use strdup() instead of kmem_alloc() to allocate the memory. Like strfree(), strdup() is not integrated with the memory accounting. This means we can use strfree() to release it like Illumos.

    SPL: kmem leaked 10/4368729 bytes
    address          size  data        func:line
    ffff880067e9aa40 10    ZZZZZZZZZZ  zfsdev_ioctl:5655

Signed-off-by: Brian Behlendorf
Signed-off-by: Tim Chase
Signed-off-by: Chunwei Chen
Closes #2262
* Various zimport.sh fixes | Brian Behlendorf | 2014-04-17 | 1 | -26/+31

1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR. These are vestiges of an earlier version of the script and were missed when it was updated. Additionally, ensure the directory is created.

2) The 'fail' function should take an integer argument for the error code to return. Otherwise 0 (success) will be mistakenly returned and errors will be incorrectly suppressed. The error code should be meaningful enough to determine where the script failed.

Signed-off-by: Brian Behlendorf
* Report atime and relatime as the property's actual value. | Tim Chase | 2014-04-16 | 1 | -2/+2

Neither atime nor relatime should be considered to be "temporary mount point properties". Their semantics are enforced completely within ZFS and also they're (correctly) not documented as being temporary mount point properties.

Signed-off-by: Tim Chase
Signed-off-by: Brian Behlendorf
Closes #2257
* Uninitialized variable spa_autoreplace used | DHE | 2014-04-16 | 1 | -1/+1

Caught by ztest and valgrind.

Signed-off-by: DHE
Signed-off-by: Brian Behlendorf
Closes #2259
* Use ddi_time_after and friends to compare time | Chunwei Chen | 2014-04-14 | 6 | -10/+22

Also, make sure we use clock_t for ddi_get_lbolt to prevent type conversion from breaking the comparison.

Signed-off-by: Chunwei Chen
Signed-off-by: Brian Behlendorf
Closes #2142
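Why a dedicated comparison helper matters for tick counters can be sketched as below. The commit message does not show how ddi_time_after is defined, so this follows the wraparound-safe form used by the Linux kernel's time_after() macro; `sketch_clock_t` and `sketch_time_after()` are hypothetical names, and the cast assumes the usual two's complement conversion.

```c
#include <limits.h>

/* Hypothetical tick type for the sketch; the commit uses clock_t. */
typedef long sketch_clock_t;

/*
 * Return non-zero if tick value a is later than b, even when the
 * counter has wrapped between the two samples.  The subtraction is
 * done unsigned (well-defined wraparound) and the result is then
 * interpreted as signed, so only the distance between a and b matters.
 */
static int
sketch_time_after(sketch_clock_t a, sketch_clock_t b)
{
	return ((long)((unsigned long)b - (unsigned long)a) < 0);
}
```

For example, with `b = LONG_MAX - 5` sampled just before the counter wraps and `a = LONG_MIN + 5` sampled just after, a plain `a > b` comparison claims a is earlier, while the helper correctly reports that a is later.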
* Make zimport.sh bash dependency explicit | Brian Behlendorf | 2014-04-10 | 1 | -1/+1

Unfortunately, the zimport.sh test script really does depend on bash. Moving to /bin/sh should be possible once the shared infrastructure scripts it depends on are made portable.

Signed-off-by: Brian Behlendorf
* Linux 3.14 compat: rq_for_each_segment in dmu_req_copy | Chunwei Chen | 2014-04-10 | 3 | -8/+56

rq_for_each_segment changed from taking bio_vec * to taking bio_vec. We provide rq_for_each_segment4 which takes both.

Signed-off-by: Chunwei Chen
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #2124
* Revert "Fix zvol+btrfs hang" | Chunwei Chen | 2014-04-10 | 1 | -77/+0

After the dmu_req_copy change, bi_io_vecs are not touched, so this is no longer needed.

This reverts commit e26ade5101ba1d8e8350ff1270bfca4258e1ffe3.

Signed-off-by: Chunwei Chen
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #2124
* Refactor dmu_req_copy for immutable biovec changes | Chunwei Chen | 2014-04-10 | 1 | -28/+39

Originally, dmu_req_copy modified bv_len and bv_offset in bio_vec so that it could continue in subsequent passes. However, after the immutable biovec changes in Linux 3.14, this is not allowed. So instead, we just tell dmu_req_copy how many bytes are already copied and it will skip to the right spot accordingly.

Signed-off-by: Chunwei Chen
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #2124
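The "skip the bytes already copied" approach can be sketched in user space as follows. This is an analogy rather than the kernel code: `seg_t` stands in for a read-only bio_vec, and `copy_from_segments()` is a hypothetical helper that never modifies the segment descriptors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only segment, standing in for an immutable bio_vec. */
typedef struct seg {
	const char *base;
	size_t len;
} seg_t;

/*
 * Copy up to size bytes into dst, starting skip bytes into the logical
 * concatenation of the segments.  The segments themselves are never
 * modified; progress across calls is carried only by the skip argument.
 * Returns the number of bytes copied.
 */
static size_t
copy_from_segments(const seg_t *segs, size_t nsegs, size_t skip,
    char *dst, size_t size)
{
	size_t copied = 0;
	size_t i;

	for (i = 0; i < nsegs && copied < size; i++) {
		size_t off, n;

		/* Segments that were fully consumed by earlier passes. */
		if (skip >= segs[i].len) {
			skip -= segs[i].len;
			continue;
		}

		/* Resume part-way into this segment, then clear the skip. */
		off = skip;
		skip = 0;
		n = segs[i].len - off;
		if (n > size - copied)
			n = size - copied;
		memcpy(dst + copied, segs[i].base + off, n);
		copied += n;
	}
	return (copied);
}
```

A caller doing several passes adds each return value to a running total and passes that total as `skip` on the next call, which mirrors telling the copy routine how many bytes are already copied.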
* Linux 3.14 compat: Immutable biovec changes in vdev_disk.c | Chunwei Chen | 2014-04-10 | 4 | -5/+36

bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter. This patch creates BIO_BI_*(bio) macros to hide the differences.

Signed-off-by: Chunwei Chen
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #2124
* Linux 3.14 compat: posix_acl_{create,chmod} | Chunwei Chen | 2014-04-10 | 3 | -5/+26

posix_acl_{create,chmod} are renamed to __posix_acl_{create,chmod}.

Signed-off-by: Chunwei Chen
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #2124
* Improve zfs.sh error messages | Brian Behlendorf | 2014-04-10 | 2 | -4/+8

Ensure an error message is logged when the 'zfs.sh' script fails to load a module or when udev fails to create the /dev/zfs device. Error messages for missing KERNEL_MODULES are suppressed because that functionality may just be built in to the kernel.

Signed-off-by: Brian Behlendorf
* Replace zed_file_create_dirs() with mkdirp() | Chris Dunlap | 2014-04-09 | 3 | -97/+13

When processing directory components starting from the root dir, zed_file_create_dirs() contained a bug in checking the return value of mkdir(). A typo was made, and the test for (mkdir_errno != EEXIST) was erroneously written as (mkdir_errno == EEXIST). If some of the leading directory components already existed, this bug would cause the routine to exit before creating the remaining directory components.

Instead of fixing the above mkdir_errno test, this commit replaces zed_file_create_dirs() with mkdirp(). This cleanup was already planned, and zed_file_create_dirs() only existed because I didn't realize mkdirp() was already in tree at the time.

Signed-off-by: Chris Dunlap
Signed-off-by: Brian Behlendorf
Closes #2248
* Set errno for mkdirp() called with NULL path ptr | Chris Dunlap | 2014-04-09 | 1 | -1/+3

If mkdirp() is called with a NULL ptr for the path arg, it will return -1 with errno unchanged. This is unexpected since on error it should return -1 and set errno to one of the error values listed for mkdir(2).

This commit sets errno = ENOENT for this NULL ptr case. This is in accordance with the errors specified by mkdir(2):

    ENOENT
        A component of the path prefix does not exist or is a null pathname.

Signed-off-by: Chris Dunlap
Signed-off-by: Brian Behlendorf
Issue #2248
* Dynamically create loop devices | Brian Behlendorf | 2014-04-09 | 1 | -12/+39

Several of the in-tree regression tests depend on the availability of loop devices. If for some reason no loop devices are available the tests will fail. Normally this isn't an issue because most Linux distributions create 8 loop devices by default. This is enough for our purposes.

However, recent Fedora releases have only been creating a single loop device and this leads to failures. Alternately, if something else on the system is using the loop devices we may see failures.

The fix for this is to update the support scripts to dynamically create loop devices as needed. The scripts need only create a node under /dev/ and the loop driver will create the minor. This behavior has been supported by the loop driver for ages.

Additionally this patch updates cleanup_loop_devices() to clean up loop devices which have already had their file store deleted. This helps prevent stale loop devices from accumulating on the system due to test failures.

Signed-off-by: Brian Behlendorf
Signed-off-by: Prakash Surya
Closes #2249
* Improve partition detection on lesser used devices | Richard Yao | 2014-04-08 | 1 | -8/+12

The format strings in efi_get_info() are intended to extract both the main device and partition number. However, this is only done correctly for hd, sd and vd devices. The format strings for ram, dm-, md and loop devices misparse the input. This causes the partition device to be incorrectly labelled as the main device with the partition being labelled 0.

Reported-by: ilovezfs
Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #2175
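The kind of parsing problem described here can be sketched with sscanf(). These are not the format strings from efi_get_info(); `split_partition()` is a hypothetical helper, and the naming conventions it assumes (letter-suffixed disks such as "sda1", number-suffixed disks with a "p" separator such as "loop0p1" or "md0p2") are an assumption of the sketch. Names such as "dm-0" are deliberately rejected rather than guessed at.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a block device name into its main device and partition number.
 * A partition number of 0 means the name refers to the whole device.
 * Returns 0 on success and -1 if the name does not fit either pattern.
 */
static int
split_partition(const char *name, char *disk, size_t disklen, unsigned *part)
{
	char prefix[16];
	unsigned unit = 0, p = 0;
	int n = 0, m = 0, k = 0;
	const char *rest;

	*part = 0;
	if (sscanf(name, "%15[a-z]%n", prefix, &n) != 1)
		return (-1);
	rest = name + n;

	if (strncmp(prefix, "sd", 2) == 0 || strncmp(prefix, "hd", 2) == 0 ||
	    strncmp(prefix, "vd", 2) == 0) {
		/* Letter-named disks: "sda" is the disk, "sda1" partition 1. */
		if (*rest != '\0') {
			if (!isdigit((unsigned char)*rest) ||
			    sscanf(rest, "%u%n", &p, &m) != 1 ||
			    rest[m] != '\0')
				return (-1);
		}
		snprintf(disk, disklen, "%s", prefix);
		*part = p;
		return (0);
	}

	/* Number-named disks: "loop0" is the disk, "loop0p1" partition 1. */
	if (!isdigit((unsigned char)*rest) ||
	    sscanf(rest, "%u%n", &unit, &m) != 1)
		return (-1);
	if (rest[m] != '\0') {
		if (sscanf(rest + m, "p%u%n", &p, &k) != 1 ||
		    rest[m + k] != '\0')
			return (-1);
	}
	snprintf(disk, disklen, "%s%u", prefix, unit);
	*part = p;
	return (0);
}
```

The point of the second branch is the failure mode named in the commit message: treating the trailing digits of "loop0" as a partition number, or ignoring the "p1" suffix, labels the wrong thing as the main device.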
* Allow specifying '-o <opts>' in defaults/init script. | Turbo Fredriksson | 2014-04-04 | 1 | -1/+2

Signed-off-by: Turbo Fredriksson
Signed-off-by: Brian Behlendorf
Issue #2103
* Support using overlay mounts in defaults/init script. | Turbo Fredriksson | 2014-04-04 | 1 | -1/+6

Signed-off-by: Turbo Fredriksson
Signed-off-by: Brian Behlendorf
Issue #2103
* Fix for re-reading /etc/mtab. | John M. Layman | 2014-04-04 | 3 | -5/+22

This is a continuation of fb5c53ea65b75c67c23f90ebbbb1134a5bb6c140:

    When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file, freopen(3) will close and open the file.

In this commit, a few more rewind(3) calls were replaced with freopen(3) to allow updated mtab entries to be picked up immediately.

Signed-off-by: John M. Layman
Signed-off-by: Brian Behlendorf
Closes #2215
Issue #1611
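The difference between rewind(3) and freopen(3) after an atomic rename can be sketched with two small helpers. Both function names are hypothetical and the sketch works on any path, not just /etc/mtab.

```c
#include <stdio.h>

/*
 * Re-read the first line through the existing stream.  After the file
 * has been replaced with rename(2), this still reads the old, unlinked
 * file, because the stream keeps its original descriptor.
 */
static int
read_first_line_rewind(FILE *fp, char *buf, int buflen)
{
	rewind(fp);
	return (fgets(buf, buflen, fp) != NULL ? 0 : -1);
}

/*
 * Re-read the first line after reopening the path, which picks up
 * whatever file the name refers to now.  On failure freopen() has
 * already closed the old stream, so *fpp is set to NULL.
 */
static int
read_first_line_reopen(FILE **fpp, const char *path, char *buf, int buflen)
{
	FILE *fp = freopen(path, "r", *fpp);

	*fpp = fp;
	if (fp == NULL)
		return (-1);
	return (fgets(buf, buflen, fp) != NULL ? 0 : -1);
}
```

Calling the first helper after a writer has renamed a new file into place returns the stale contents, while the second returns the updated contents, which is the behavior the commit relies on.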
* Fix locking order in zfs_zget() | Richard Yao | 2014-04-04 | 1 | -1/+1

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
* Fix deadlock in zfs_zget() | Richard Yao | 2014-04-04 | 1 | -1/+21

zfsonlinux/zfs#180 occurred because of a race between inode eviction and zfs_zget(). zfsonlinux/zfs@36df284 tried to address it by making a call to the VFS to learn whether an inode is being evicted. If it was being evicted the operation was retried after dropping and reacquiring the relevant resources. Unfortunately, this introduced another deadlock.

    INFO: task kworker/u24:6:891 blocked for more than 120 seconds.
          Tainted: P O 3.13.6 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u24:6 D ffff88107fcd2e80 0 891 2 0x00000000
    Workqueue: writeback bdi_writeback_workfn (flush-zfs-5)
     ffff8810370ff950 0000000000000002 ffff88103853d940 0000000000012e80
     ffff8810370fffd8 0000000000012e80 ffff88103853d940 ffff880f5c8be098
     ffff88107ffb6950 ffff8810370ff980 ffff88103a9a5b78 0000000000000000
    Call Trace:
     [<ffffffff813dd1d4>] schedule+0x24/0x70
     [<ffffffff8115fc09>] __wait_on_freeing_inode+0x99/0xc0
     [<ffffffff8115fdd8>] find_inode_fast+0x78/0xb0
     [<ffffffff811608c5>] ilookup+0x65/0xd0
     [<ffffffffa035c5ab>] zfs_zget+0xdb/0x260 [zfs]
     [<ffffffffa03589d6>] zfs_get_data+0x46/0x340 [zfs]
     [<ffffffffa035fee1>] zil_add_block+0xa31/0xc00 [zfs]
     [<ffffffffa0360642>] zil_commit+0x12/0x20 [zfs]
     [<ffffffffa036a6e4>] zpl_putpage+0x174/0x840 [zfs]
     [<ffffffff811071ec>] do_writepages+0x1c/0x40
     [<ffffffff8116df2b>] __writeback_single_inode+0x3b/0x2b0
     [<ffffffff8116ecf7>] writeback_sb_inodes+0x247/0x420
     [<ffffffff8116f5f3>] wb_writeback+0xe3/0x320
     [<ffffffff81170b8e>] bdi_writeback_workfn+0xfe/0x490
     [<ffffffff8106072c>] process_one_work+0x16c/0x490
     [<ffffffff810613f3>] worker_thread+0x113/0x390
     [<ffffffff81066edf>] kthread+0xdf/0x100

This patch implements the original fix in a slightly different manner in order to avoid both deadlocks. Instead of relying on a call to ilookup(), which can block in __wait_on_freeing_inode(), the return value from igrab() is used.

This gives us the information that ilookup() provided without the risk of a deadlock.

Alternately, this race could be closed by registering an sops->drop_inode() callback. The callback would need to detect the active SA hold, thereby informing the VFS that this inode should not be evicted.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Issue #180