aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAgeFilesLines
* Linux 4.8 compat: REQ_PREFLUSHBrian Behlendorf2016-07-291-0/+12
| | | | | | | | | | | | The REQ_FLUSH flag was renamed REQ_PREFLUSH to avoid confusion with REQ_OP_FLUSH. See https://github.com/torvalds/linux/commit/28a8f0d3 for complete details. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4892 Issue #4899
* Check whether the kernel supports i_uid/gid_read/write helpersNikolay Borisov2016-07-251-0/+53
| | | | | | | | | | | | | Since the concept of a kuid and the need to translate from it to ordinary integer type was added in kernel version 3.5 implement necessary plumbing to be able to detect this condition during compile time. If the kernel doesn't support the kuid then just fall back to directly accessing the respective struct inode's members Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4685 Issue #227
* Linux 4.7 compat: handler->set() takes both dentry and inodeBrian Behlendorf2016-06-011-1/+15
| | | | | | | | | Counterpart to fd4c7b7, the same approach was taken to resolve the compatibility issue. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4717 Issue #4665
* Linux 4.7 compat: replace blk_queue_flush with blk_queue_write_cacheChunwei Chen2016-05-201-0/+27
| | | | | | Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4665
* Linux 4.7 compat: handler->get() takes both dentry and inodeChunwei Chen2016-05-201-1/+14
| | | | | | Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4665
* Fix user namespaces uid/gid mappingBrian Behlendorf2016-04-301-4/+4
| | | | | | | | | | | As described in torvalds/linux@5f3a4a2 the &init_user_ns, and not the current user_ns, should be passed to posix_acl_from_xattr() and posix_acl_to_xattr(). Conveniently the init_user_ns is available through the init credential (kcred). Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Massimo Maggi <[email protected]> Closes #4177
* Support for vectorized algorithms on x86Gvozden Neskovic2016-03-212-1/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms. For the compilation phase, configure step checks if toolchain supports relevant instruction sets. Each implementation must ensure that the code is not passed to compiler if relevant instruction set is not supported. For this purpose, following new defines are provided if instruction set is supported: - HAVE_SSE, - HAVE_SSE2, - HAVE_SSE3, - HAVE_SSSE3, - HAVE_SSE4_1, - HAVE_SSE4_2, - HAVE_AVX, - HAVE_AVX2. For detecting if an instruction set can be used in runtime, following functions are provided in (include/linux/simd_x86.h): - zfs_sse_available() - zfs_sse2_available() - zfs_sse3_available() - zfs_ssse3_available() - zfs_sse4_1_available() - zfs_sse4_2_available() - zfs_avx_available() - zfs_avx2_available() - zfs_bmi1_available() - zfs_bmi2_available() These function should be called once, on module load, or initialization. They are safe to use from user and kernel space. If an implementation is using more than single instruction set, both compiler and runtime support for all relevant instruction sets should be checked. Kernel fpu methods: - kfpu_begin() - kfpu_end() Use __get_cpuid_max and __cpuid_count from <cpuid.h> Both gcc and clang have support for these. They also handle ebx register in case it is used for PIC code. Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4381
* Linux 4.5 compat: xattr list handlerBrian Behlendorf2016-01-201-27/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The registered xattr .list handler was simplified in the 4.5 kernel to only perform a permission check. Given a dentry for the file it must return a boolean indicating if the name is visible. This differs slightly from the previous APIs which also required the function to copy the name in to the provided list and return its size. That is now all the responsibility of the caller. This should be straight forward change to make to ZoL since we've always required the caller to make the copy. However, this was slightly complicated by the need to support 3 older APIs. Yes, between 2.6.32 and 4.5 there are 4 versions of this interface! Therefore, while the functional change in this patch is small it includes significant cleanup to make the code understandable and maintainable. These changes include: - Improved configure checks for .list, .get, and .set interfaces. - Interfaces checked from newest to oldest. - Strict checking for each possible known interface. - Configure fails when no known interface is available. - HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency with similar iops and fops configure checks. - POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}. Compatibility wrapper were added for old kernels. - ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing ZPL_XATTR_{GET|SET} WRAPPERs. Only the inode is guaranteed to be a valid pointer, passing NULL for the 'list' and 'name' variables is allowed and must be checked for. All .list functions were updated to use the wrapper to aid readability. - zpl_xattr_filldir() updated to use the .list function for its permission check which is consistent with the updated Linux 4.5 interface. If a .list function is registered it should return 0 to indicate a name should be skipped, if there is no registered function the name will be added. - Additional documentation from xattr(7) describing the correct behavior for each namespace was added before the relevant handlers. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Issue #4228
* Use uio for zvol_{read,write}Chunwei Chen2015-12-151-0/+2
| | | | | | | | | Since uio now supports bvec, we can convert bio into uio and reuse dmu_{read,write}_uio. This way, we can remove some duplicate code. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4078
* Linux 4.4 compat: xattr operations takes xattr_handlerChunwei Chen2015-12-011-0/+28
| | | | | | | | | | The xattr_hander->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4021
* Linux 4.3 compat: bio_end_io_t / BIO_UPTODATELukas Wunner2015-09-251-12/+9
| | | | | | | | | | | | | | | | | | Commit torvalds/linux@4246a0b63bd8f56a1469b12eafeb875b1041a451 ("block: add a bi_error field to struct bio") dropped the error argument from bio_endio in favor of newly introduced bio->bi_error. This also replaces bio->bi_flags value BIO_UPTODATE. bio_endio was a 3 argument function until Linux 2.6.24, which made it a 2 argument function, and now the prototype has changed yet again to a 1 argument function. Support for pre 2.6.24 kernels was already dropped with 37f9dac592bf ("zvol processing should use struct bio") which assumed the 2 argument version in zvol_request(). Remaining code to support the 3 argument version is hereby removed. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Issue #3799
* Reintroduce IO accounting on zvols on Linux 3.19+Richard Yao2015-09-091-0/+5
| | | | | | | | | | | | | | zfsonlinux/zfs@e20cd6f7a8922709b1aa2ecefd783390102d79e0 caused us to lose IO accounting on zvols. When I originally wrote that last year, the symbols we needed to maintain IO accounting were GPL exported, but torvalds/linux@394ffa503bc40e32d7f54a9b817264e81ce131b4 provided suitable symbols for restoring this functionality 4 months later. We can call them to restore the IO accounting on Linux 3.19 and later as well as any older kernels where that patch is backported. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3741
* Remove blk_rq_bytes()/blk_rq_sectors autotools checksRichard Yao2015-09-041-23/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_rq_pos() autotools checkRichard Yao2015-09-041-8/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_fetch_request() autotools checkRichard Yao2015-09-041-14/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_requeue_request() autotools checkRichard Yao2015-09-041-8/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_end_request() autotools check.Richard Yao2015-09-041-74/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove rq_is_sync() autotools checkRichard Yao2015-09-041-8/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove rq_for_each_segment() autotools checkRichard Yao2015-09-041-42/+0
| | | | Signed-off-by: Richard Yao <[email protected]>
* zvol processing should use struct bioRichard Yao2015-09-041-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internally, zvols are files exposed through the block device API. This is intended to reduce overhead when things require block devices. However, the ZoL zvol code emulates a traditional block device in that it has a top half and a bottom half. This is an unnecessary source of overhead that does not exist on any other OpenZFS platform does this. This patch removes it. Early users of this patch reported double digit performance gains in IOPS on zvols in the range of 50% to 80%. Comments in the code suggest that the current implementation was done to obtain IO merging from Linux's IO elevator. However, the DMU already does write merging while arc_read() should implicitly merge read IOs because only 1 thread is permitted to fetch the buffer into ARC. In addition, commercial ZFSOnLinux distributions report that regular files are more performant than zvols under the current implementation, and the main consumers of zvols are VMs and iSCSI targets, which have their own elevators to merge IOs. Some minor refactoring allows us to register zfs_request() as our ->make_request() handler in place of the generic_make_request() function. This eliminates the layer of code that broke IO requests on zvols into a top half and a bottom half. This has several benefits: 1. No per zvol spinlocks. 2. No redundant IO elevator processing. 3. Interrupts are disabled only when actually necessary. 4. No redispatching of IOs when all taskq threads are busy. 5. Linux's page out routines will properly block. 6. Many autotools checks become obsolete. An unfortunate consequence of eliminating the layer that generic_make_request() is that we no longer calls the instrumentation hooks for block IO accounting. Those hooks are GPL-exported, so we cannot call them ourselves and consequently, we lose the ability to do IO monitoring via iostat. Since zvols are internally files mapped as block devices, this should be okay. Anyone who is willing to accept the performance penalty for the block IO layer's accounting could use the loop device in between the zvol and its consumer. Alternatively, perf and ftrace likely could be used. Also, tools like latencytop will still work. Tools such as latencytop sometimes provide a better view of performance bottlenecks than the traditional block IO accounting tools do. Lastly, if direct reclaim occurs during spacemap loading and swap is on a zvol, this code will deadlock. That deadlock could already occur with sync=always on zvols. Given that swap on zvols is not yet production ready, this is not a blocker. Signed-off-by: Richard Yao <[email protected]>
* VDEV_REQ_FUA should be mapped to REQ_FUARichard Yao2015-09-021-1/+1
| | | | | | | | | | | Pre-2.6.37 kernels support REQ_FUA in request flags, but not in BIO flags. zvols are the only consumer of VDEV_REQ_FUA and since they are passed requests, they should be obey the REQ_FUA flag like later kernels. This optimization will only matter on 2.6.36 and 2.6.37 because the zvol rework changes things to use bio, where we no longer are able to distinguish on earlier kernels Signed-off-by: Richard Yao <[email protected]>
* Remove blk_queue_io_opt() autotools checkRichard Yao2015-09-011-9/+0
| | | | | | | | This is needed for supporting kernels earlier than 2.6.30. Support for those kernels was dropped, so we can safely remove this check. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Remove blk_queue_physical_block_size() autotools checkRichard Yao2015-09-011-10/+0
| | | | | | | | This is needed for supporting kernels earlier than 2.6.30. Support for those kernels was dropped, so we can safely remove this check. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.18 compat: Snapshot auto-mountingBrian Behlendorf2015-08-311-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-factor the .zfs/snapshot auto-mouting code to take in to account changes made to the upstream kernels. And to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements. * All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot regardless without needing access to the parent dataset. * Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures. * References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding. * The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows to kernel to unmount idle auto-mounted snapshots if needed removing the need for the zfsctl_unmount_snapshots() function. * Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds the auto-unmount expiration timer will be extended. * Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invaliding all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3589 Closes #3344 Closes #3295 Closes #3257 Closes #3243 Closes #3030 Closes #2841
* Add compatibility layer for {kmap,kunmap}_atomicChunwei Chen2015-08-242-1/+42
| | | | | | | | | Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2. We use zfs_{kmap,kunmap}_atomic as wrappers and always take 2 argument, but ignore the 2nd for newer kernel. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 4.2 compat: bdi_setup_and_register()Brian Behlendorf2015-07-171-0/+1
| | | | | | | | | | | The vfs_compat.h header should include the linux/backing-dev.h header because it depends on the bdi_* functions defined there. In previous kernels this header was being indirectly included which prevented a build failure. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #3596
* Linux 2.6.36 compat, use REQ_FAILFAST_MASK and remove pre-2.6.36 supportTim Chase2015-05-111-10/+5
| | | | | | | | | | | | | | | | | Commit f4af6bb783b0b7f2a6075cb1c74c225db8a157b2 which added support for REQ_FAILFAST_MASK but the new autoconf test didn't use the same preprocessor macro name as the code did. The effect is that FAILFAST mode has not been enabled for ZoL in any post-2.6.35 kernel. Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels. Raise an error condition if the FAILFAST interface can't be detected. Signed-off-by: Tim Chase <[email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #3386
* Linux 4.0 compat: bdi_setup_and_register()Brian Behlendorf2015-03-031-10/+20
| | | | | | | | | | | | | | | | | | | | The 'capabilities' argument which was passed to bdi_setup_and_register() has been removed. File systems should no longer pass BDI_CAP_MAP_COPY. For our purposes this means there are now three different interfaces which must be handled. A zpl_bdi_setup_and_register() wrapper function has been introduced to provide a single interface to the ZPL code. * 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported. * 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments. * 4.0 - x.y, bdi_setup_and_register() takes 2 arguments. I've also taken this opportunity to remove HAVE_BDI because kernels older then 2.6.32 are no longer supported. All kernels newer than this will have one of the above interfaces. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #3128
* Linux 3.19 compat: file_inode was addedJörg Thalheim2015-02-101-0/+13
| | | | | | | | | struct access f->f_dentry->d_inode was replaced by accessor function file_inode(f) Signed-off-by: Joerg Thalheim <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3084
* Kernel header installation should respect --prefixRichard Yao2014-10-281-1/+1
| | | | | | | | | | This is the upstream component of work that enables preliminary support for building Gentoo's ZFS packaging on other Linux systems via Gentoo Prefix. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2641
* Update utsname supportBrian Behlendorf2014-10-172-1/+31
| | | | | | | | | | | | | | | | | | Modify the code to use the utsname() kernel function rather than a global variable. This results is cleaner more portable code because utsname() is already provided by the kernel and can be easily emulated in user space via uname(2). This means that it will behave consistently in both contexts. This is also has the benefit that it allows the removal of a few _KERNEL pre-processor conditions. And it also is a pre-requisite for a proper FUSE port because we need to provide a valid utsname. Finally, it allows us to remove this functionality from the SPL and all the related compatibility code. Signed-off-by: Brian Behlendorf <[email protected]> Issue #2757
* Include sys/taskq.h in linux/vfs_compat.hRichard Yao2014-08-141-0/+2
| | | | | | | | | | | | We should have included sys/taskq.h directly because we use the taskq code here, but we instead had files that included sys/taskq.h also include sys/kmem.h, which happened to include sys/taskq.h. sys/kmem.h no longer does this, so we must define the include as we should have done in the first place. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2411
* Allow building without ACLsChris Wedgwood2014-05-301-15/+15
| | | | | | | | | | Some kernel definitions were buried inside the #if... #endif logic for ACLs. When ACLs are not available these definitions get lost causing the build to fail. Signed-off-by: Chris Wedgwood <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2349
* Implement File Attribute SupportRichard Yao2014-05-011-3/+0
| | | | | | | | | | | | | | | | | | | | | | | We add support for lsattr and chattr to resolve a regression caused by 88c283952f0bfeab54612f9ce666601d83c4244f that broke Python's xattr.list(). That changet broke Gentoo Portage's FEATURES=xattr, which depended on Python's xattr.list(). Only attributes common to both Solaris and Linux are supported. These are 'a', 'd' and 'i' in Linux's lsattr and chattr commands. File attributes exclusive to Solaris are present in the ZFS code, but cannot be accessed or modified through this method. That was the case prior to this patch. The resolution of issue zfsonlinux/zfs#229 should implement some method to permit access and modification of Solaris-specific attributes. References: https://bugs.gentoo.org/show_bug.cgi?id=483516 Original-patch-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1691
* Linux 3.14 compat: rq_for_each_segment in dmu_req_copyChunwei Chen2014-04-101-0/+24
| | | | | | | | | | rq_for_each_segment changed from taking bio_vec * to taking bio_vec. We provide rq_for_each_segment4 which takes both. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Linux 3.14 compat: Immutable biovec changes in vdev_disk.cChunwei Chen2014-04-101-0/+10
| | | | | | | | | | bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter. This patch creates BIO_BI_*(bio) macros to hide the differences. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Linux 3.14 compat: posix_acl_{create,chmod}Chunwei Chen2014-04-101-3/+8
| | | | | | | | | posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod} Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* cstyle: Resolve C style issuesMichael Kjorling2013-12-184-105/+115
| | | | | | | | | | | | | | | | | | The vast majority of these changes are in Linux specific code. They are the result of not having an automated style checker to validate the code when it was originally written. Others were caused when the common code was slightly adjusted for Linux. This patch contains no functional changes. It only refreshes the code to conform to style guide. Everyone submitting patches for inclusion upstream should now run 'make checkstyle' and resolve any warning prior to opening a pull request. The automated builders have been updated to fail a build if when 'make checkstyle' detects an issue. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1821
* Honor CONFIG_FS_POSIX_ACL kernel optionMassimo Maggi2013-11-051-0/+2
| | | | | | | | | | | | | | | | The required Posix ACL interfaces are only available for kernels with CONFIG_FS_POSIX_ACL defined. Therefore, only enable Posix ACL support for these kernels. All major distribution kernels enable CONFIG_FS_POSIX_ACL by default. If your kernel does not support Posix ACLs the following warning will be printed at ZFS module load time. "ZFS: Posix ACLs disabled by kernel" Signed-off-by: Massimo Maggi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1825
* Posix ACL SupportMassimo Maggi2013-10-292-0/+177
| | | | | | | | | | | | | | | | | | | | | | | | This change adds support for Posix ACLs by storing them as an xattr which is common practice for many Linux file systems. Since the Posix ACL is stored as an xattr it will not overwrite any existing ZFS/NFSv4 ACLs which may have been set. The Posix ACL will also be non-functional on other platforms although it may be visible as an xattr if that platform understands SA based xattrs. By default Posix ACLs are disabled but they may be enabled with the new 'aclmode=noacl|posixacl' property. Set the property to 'posixacl' to enable them. If ZFS/NFSv4 ACL support is ever added an appropriate acltype will be added. This change passes the POSIX Test Suite cleanly with the exception of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too). http://www.tuxera.com/community/posix-test-suite/ Signed-off-by: Massimo Maggi <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #170
* Add SEEK_DATA/SEEK_HOLE to lseek()/llseek()Li Dongyang2013-07-021-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | The approach taken was the rework zfs_holey() as little as possible and then just wrap the code as needed to ensure correct locking and error handling. Tested with xfstests 285 and 286. All tests pass except for 7-9 of 285 which try to reserve blocks first via fallocate(2) and fail because fallocate(2) is not yet supported. Note that the filp->f_lock spinlock did not exist prior to Linux 2.6.30, but we avoid the need for autotools check by virtue of the fact that SEEK_DATA/SEEK_HOLE support was not added until Linux 3.1. An autoconf check was added for lseek_execute() which is currently a private function but the expectation is that it will be exported perhaps as early as Linux 3.11. Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1384
* Return -EOPNOTSUPP for ZFS_IOC_{GET|SET}FLAGSBrian Behlendorf2013-06-261-0/+3
| | | | | | | | | | Until these hooks are fully implemented return the expected -EOPNOTSUPP error to indicate they are not functional. This allows test suites such as xfstests to cleanly skip testing this functionality until it's implemented. Signed-off-by: Brian Behlendorf <[email protected]> Issue #229
* Fix compile warning on 32-bit systemsYing Zhu2013-06-191-1/+1
| | | | | | | | | | | | | The definition of zfs_vdev_holder casts VDEV_HOLDER into a function pointer passing to linux kernel's block layer function blkdev_get_by_path. However current VDEV_HOLDER is defined to be wider than 32 bits and the compiler warns about potential overflows. Instead of specifying different values for 32-bit and 64-bit systems using ifdefs, choose the common factor 32-bit addresses. Redefine VDEV_HOLDER to 0x2401de7("zholder") here. Signed-off-by: Ying Zhu <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1520
* Change zfs-kmod-devel install pathBrian Behlendorf2013-03-131-1/+1
| | | | | | | | | | | | | | | Install the common zfs kernel development headers under /usr/src/zfs-<version>/ rather than in a kernel specific directory. The kernel specific build products such as zfs_config.h and Modules.symvers are left installed under /usr/src/zfs-<version>/<kernel>. This was done to be consistent with where dkms expects kernel module source to be packaged. It also allows for a common zfs-kmod-devel package which includes the headers, and per-kernel zfs-kmod-devel-<kernel> packages. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix hot sparesBrian Behlendorf2013-03-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue with hot spares in ZoL is because it opens all leaf vdevs exclusively (O_EXCL). On Linux, exclusive opens cause subsequent exclusive opens to fail with EBUSY. This could be resolved by not opening any of the devices exclusively, which is what Illumos does, but the additional protection offered by exclusive opens is desirable. It cleanly prevents you from accidentally adding an in-use non-ZFS device to your pool. To fix this we very slightly relaxed the usage of O_EXCL in the following ways. 1) Functions which open the device but only read had the O_EXCL flag removed and were updated to use O_RDONLY. 2) A common holder was added to the vdev disk code. This allow the ZFS code to internally open the device multiple times but non-ZFS callers may not. 3) An exception was added to make_disks() for hot spare when creating partition tables. For hot spare devices which are already opened exclusively we skip creating the partition table because this must already have been done when the disk was originally added as a hot spare. Additional minor changes include fixing check_in_use() to use a partition instead of a slice suffix. And is_spare() was moved above make_disks() to avoid adding a forward reference. Signed-off-by: Brian Behlendorf <[email protected]> Closes #250
* Linux 2.6.26 compat, lookup_bdev()Brian Behlendorf2013-01-281-0/+9
| | | | | | | | | | | | | | | | | | | It's doubtful many people were impacted by this but commit 6c28567 accidentally broke ZFS builds for 2.6.26 and earlier kernels. This commit depends on the lookup_bdev() function which exists in 2.6.26 but wasn't exported until 2.6.27. The availability of the function isn't critical so a wrapper is introduced which returns ERR_PTR(-ENOTSUP) when the function isn't defined. This will have the effect of causing zvol_is_zvol() to always fail for 2.6.26 kernels. This in turn means vdevs will always get opened concurrently which is good for normal usage. This will only become an issue if your using a zvol as a vdev in another pool. In which case you really should be using a newer kernel anyway. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1205
* Add d_clear_d_op() compatibilityBrian Behlendorf2013-01-231-0/+20
| | | | | | | | | | | | | Added d_clear_d_op() helper function which clears some flags and the registered dentry->d_op table. This is required because d_set_d_op() issues a warning when the dentry operations table is already set. For the .zfs control directory to work properly we must be able to override the default operations table and register custom .d_automount and .d_revalidate callbacks. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #1230
* Fix 'zfs rollback' on mounted file systemsBrian Behlendorf2013-01-171-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rolling back a mounted filesystem with open file handles and cached dentries+inodes never worked properly in ZoL. The major issue was that Linux provides no easy mechanism for modules to invalidate the inode cache for a file system. Because of this it was possible that an inode from the previous filesystem would not get properly dropped from the cache during rolling back. Then a new inode with the same inode number would be create and collide with the existing cached inode. Ideally this would trigger an VERIFY() but in practice the error wasn't handled and it would just NULL reference. Luckily, this issue can be resolved by sprucing up the existing Solaris zfs_rezget() functionality for the Linux VFS. The way it works now is that when a file system is rolled back all the cached inodes will be traversed and refetched from disk. If a version of the cached inode exists on disk the in-core copy will be updated accordingly. If there is no match for that object on disk it will be unhashed from the inode cache and marked as stale. This will effectively make the inode unfindable for lookups allowing the inode number to be immediately recycled. The inode will then only be accessible from the cached dentries. Subsequent dentry lookups which reference a stale inode will result in the dentry being invalidated. Once invalidated the dentry will drop its reference on the inode allowing it to be safely pruned from the cache. Special care is taken for negative dentries since they do not reference any inode. These dentires will be invalidate based on when they were added to the dentry cache. Entries added before the last rollback will be invalidate to prevent them from masking real files in the dataset. Two nice side effects of this fix are: * Removes the dependency on spl_invalidate_inodes(), it can now be safely removed from the SPL when we choose to do so. * zfs_znode_alloc() no longer requires a dentry to be passed. This effectively reverts this portition of the code to its upstream counterpart. The dentry is not instantiated more correctly in the Linux ZPL layer. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #795
* Fix false ENOENT on snapshot control dentriesNed Bass2013-01-161-0/+25
| | | | | | | | | | | | | | | | | | | | | | Lookups in the snapshot control directory for an existing snapshot fail with ENOENT if an earlier lookup failed before the snapshot was created. This is because the earlier lookup causes a negative dentry to be cached which is never invalidated. The bug can be reproduced as follows (the second ls should succeed): $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory $ zfs snap tank@s $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory To remedy this, always invalidate cached dentries in the snapshot control directory. Since these entries never exist on disk there is no significant performance penalty for the extra lookups. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1192
* Improve AF hard disk detectionBrian Behlendorf2012-11-151-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | Use the bdev_physical_block_size() interface to determine the minimize write size which can be issued without incurring a read-modify-write operation. This is used to set the ashift correctly to prevent a performance penalty when using AF hard disks. Unfortunately, this interface isn't entirely reliable because it's not uncommon for disks to misreport this value. For this reason you may still need to manually set your ashift with: zpool create -o ashift=12 ... The solution to this in the upstream Illumos source was to add a white list of known offending drives. Maintaining such a list will be a burden, but it still may be worth doing if we can detect a large number of these drives. This should be considered as future work. Reported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #916