path: root/config
Commit message | Author | Age | Files | Lines
* Add support for AVX-512 family of instruction sets | Gvozden Neskovic | 2016-08-16 | 1 | -0/+189
  This patch adds compiler and runtime tests (user and kernel) for the following instruction sets: avx512f, avx512cd, avx512er, avx512pf, avx512bw, avx512dq, avx512vl, avx512ifma, avx512vbmi.

  Note: Linux support for the AVX-512F (Foundation) instruction set started with Linux v3.15.

  Signed-off-by: Gvozden Neskovic <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4952
* Use file_dentry and file_inode wrappers | Chen Haiquan | 2016-08-11 | 2 | -0/+21
  Fix bugs due to the kernel change in torvalds/linux@4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay").

  This problem crashes the system when ZFS is used as a layer of overlayfs.

  Signed-off-by: Chen Haiquan <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4914
  Closes #4935
* Linux 4.8 compat: Fix removal of bio->bi_rw member | Brian Behlendorf | 2016-08-11 | 2 | -0/+71
  All users of bio->bi_rw have been replaced with compatibility wrappers. This allows the kernel specific logic to be abstracted away, and for each of the supported cases to be documented with the wrapper. The updated interfaces are as follows:

  * void blk_queue_set_write_cache(struct request_queue *, bool, bool)
  * boolean_t bio_is_flush(struct bio *)
  * boolean_t bio_is_fua(struct bio *)
  * boolean_t bio_is_discard(struct bio *)
  * boolean_t bio_is_secure_erase(struct bio *)
  * VDEV_WRITE_FLUSH_FUA

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Closes #4951
* Linux 4.7 compat: fix zpl_get_acl returns invalid acl pointer | Chunwei Chen | 2016-08-09 | 2 | -0/+21
  Starting with Linux 4.7, get_acl sets the ACL cache pointer to a temporary sentinel value before calling i_op->get_acl. Therefore we can't compare against ACL_NOT_CACHED and return. Since Linux 3.14, get_acl already checks the cache for us, so we disable this check in zpl_get_acl.

  Linux 4.7 also calls set_cached_acl for us, so we disable it in zpl_get_acl.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Nikolay Borisov <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4944
  Closes #4946
* Linux 4.8 compat: posix_acl_valid() | Brian Behlendorf | 2016-08-08 | 2 | -0/+25
  The posix_acl_valid() function has been updated to require a user namespace. Filesystem callers should normally provide the user_ns from the super block associated with the ACL; the zpl_posix_acl_valid() wrapper has been added for this purpose. See https://github.com/torvalds/linux/commit/0d4d717f for complete details.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Nikolay Borisov <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Closes #4922
* Retire HAVE_CURRENT_UMASK and HAVE_POSIX_ACL_CACHING | Brian Behlendorf | 2016-08-08 | 2 | -41/+0
  Remove the ZFS_AC_KERNEL_CURRENT_UMASK and ZFS_AC_KERNEL_POSIX_ACL_CACHING configure checks; all supported kernels provide this functionality.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Nikolay Borisov <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Closes #4922
* Linux 4.8 compat: new s_user_ns member of struct super_block | Nikolay Borisov | 2016-08-08 | 2 | -0/+22
  Kernel 4.8 paved the way for mounting a file system inside a non-init user namespace. To facilitate this, an s_user_ns member was added, holding the userns in which the filesystem's instance was mounted. This enables doing the uid/gid translation relative to this particular user namespace and not the default init_user_ns.

  Signed-off-by: Nikolay Borisov <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4928
* Linux 4.8 compat: submit_bio() | Brian Behlendorf | 2016-07-29 | 2 | -0/+21
  The rw argument has been removed from submit_bio/submit_bio_wait. Callers are now expected to set bio->bi_rw instead of passing it in. See https://github.com/torvalds/linux/commit/4e49ea4a for complete details.

  Signed-off-by: Tim Chase <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4892
  Issue #4899
* Fix sync behavior for disk vdevs | Tim Chase | 2016-07-25 | 2 | -0/+24
  Prior to b39c22b, which first became generally available in the 0.6.5 release, ZoL never actually submitted synchronous read or write requests to the Linux block layer. This means the vdev_disk_dio_is_sync() function had always returned false and, therefore, the completion in dio_request_t.dr_comp was never actually used.

  In b39c22b, synchronous ZIO operations were translated to synchronous BIO requests in vdev_disk_io_start(). The follow-on commits 5592404 and aa159af fixed several problems introduced by b39c22b. In particular, 5592404 introduced the new flag parameter "wait" to __vdev_disk_physio() but under ZoL, since vdev_disk_physio() is never actually used, the wait flag was always zero, so the new code had no effect other than to cause a bug in the use of dio_request_t.dr_comp, which was fixed by aa159af.

  The original rationale for introducing synchronous operations in b39c22b was to hurry certain requests through the BIO layer which would otherwise have been subject to its unplug timer, increasing their latency. This behavior of the unplug timer, however, went away during the transition of the plug/unplug system between kernels 2.6.32 and 2.6.39.

  To handle the unplug timer behavior on 2.6.32-2.6.35 kernels the BIO_RW_UNPLUG flag is used as a hint to suppress the plugging behavior. For kernels 2.6.36-2.6.38, the REQ_UNPLUG macro will be available and is used for the same purpose.

  Signed-off-by: Tim Chase <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4858
* Check whether the kernel supports i_uid/gid_read/write helpers | Nikolay Borisov | 2016-07-25 | 2 | -0/+23
  Since the concept of a kuid, and the need to translate from it to an ordinary integer type, was added in kernel version 3.5, implement the necessary plumbing to detect this condition at compile time. If the kernel doesn't support the kuid, just fall back to directly accessing the respective struct inode members.

  Signed-off-by: Nikolay Borisov <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4685
  Issue #227
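  The compile-time fallback described above might be sketched as follows, in plain userspace C so it can be compiled in isolation. The macro `HAVE_KUID_HELPERS`, the `mock_` types, and `compat_uid_read()` are hypothetical names chosen for illustration; they are not taken from the patch or from the kernel headers.

```c
#include <stdint.h>

/*
 * Hypothetical configure result. Uncomment to model a kernel that wraps
 * user IDs in an opaque kuid type (assumed name, for illustration only).
 */
/* #define HAVE_KUID_HELPERS 1 */

#ifdef HAVE_KUID_HELPERS
/* Newer-kernel model: the uid is wrapped and must be read via a helper. */
typedef struct { uint32_t val; } mock_kuid_t;
struct mock_inode { mock_kuid_t i_uid; };

static inline uint32_t
mock_i_uid_read(const struct mock_inode *ip)
{
	return (ip->i_uid.val);
}
#define	compat_uid_read(ip)	mock_i_uid_read(ip)
#else
/* Older-kernel model: the uid is a plain integer member of the inode. */
struct mock_inode { uint32_t i_uid; };
#define	compat_uid_read(ip)	((ip)->i_uid)
#endif
```

  Callers would use only `compat_uid_read()`, so the choice between the two models stays in one place.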
* Illumos Crypto Port module added to enable native encryption in zfs | Tom Caputi | 2016-07-20 | 4 | -23/+23
  A port of the Illumos Crypto Framework to a Linux kernel module (found in module/icp). This is needed to do the actual encryption work. We cannot use the Linux kernel's built-in crypto API because it is only exported to GPL-licensed modules. Having the ICP also means the crypto code can run on any of the other kernels under OpenZFS. I ended up porting over most of the internals of the framework, which means that porting over other API calls (if we need them) should be fairly easy. Specifically, I have ported over the API functions related to encryption, digests, MACs, and crypto templates. The ICP is able to use assembly-accelerated encryption on amd64 machines and AES-NI instructions on Intel chips that support it. There are place-holder directories for similar assembly optimizations for other architectures (although they have not been written).

  Signed-off-by: Tom Caputi <[email protected]>
  Signed-off-by: Tony Hutter <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4329
* Add configure result for xattr_handler | Chunwei Chen | 2016-07-12 | 1 | -0/+2
  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4828
* OpenZFS 6314 - buffer overflow in dsl_dataset_name | Igor Kozhukhov | 2016-06-28 | 2 | -22/+0
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Prakash Surya <[email protected]>
  Reviewed by: Igor Kozhukhov <[email protected]>
  Approved by: Dan McDonald <[email protected]>
  Ported-by: Brian Behlendorf <[email protected]>

  OpenZFS-issue: https://www.illumos.org/issues/6314
  OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d6160ee
* OpenZFS 2605, 6980, 6902 | Matthew Ahrens | 2016-06-28 | 1 | -1/+2
  2605 want to resume interrupted zfs send

  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Paul Dagnelie <[email protected]>
  Reviewed by: Richard Elling <[email protected]>
  Reviewed by: Xin Li <[email protected]>
  Reviewed by: Arne Jansen <[email protected]>
  Approved by: Dan McDonald <[email protected]>
  Ported-by: kernelOfTruth <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>

  OpenZFS-issue: https://www.illumos.org/issues/2605
  OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9c3fd12

  6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch

  Reviewed by: Paul Dagnelie <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Approved by: Robert Mustacchi <[email protected]>
  Ported by: Brian Behlendorf <[email protected]>

  OpenZFS-issue: https://www.illumos.org/issues/6980
  OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ea4a67f

  Porting notes:
  - All rsend and snapshot tests enabled and updated for Linux.
  - Fix misuse of input argument in traverse_visitbp().
  - Fix ISO C90 warnings and errors.
  - Fix gcc 'missing braces around initializer' in 'struct send_thread_arg to_arg =' warning.
  - Replace 4 argument fletcher_4_native() with 3 argument version, this change was made in OpenZFS 4185 which has not been ported.
  - Part of the sections for 'zfs receive' and 'zfs send' was rewritten and reordered to approximate upstream.
  - Fix mktree xattr creation, 'user.' prefix required.
  - Minor fixes to newly enabled test cases
  - Long holds for volumes allowed during receive for minor registration.
* Linux 4.7 compat: handler->set() takes both dentry and inode | Brian Behlendorf | 2016-06-01 | 1 | -29/+54
  Counterpart to fd4c7b7, the same approach was taken to resolve the compatibility issue.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Closes #4717
  Issue #4665
* Linux 4.7 compat: use iterate_shared for concurrent readdir | Chunwei Chen | 2016-05-20 | 1 | -14/+35
  Register iterate_shared if it exists so the kernel will use a shared lock and allow concurrent readdir.

  Also, use a shared lock when doing llseek with SEEK_DATA or SEEK_HOLE to allow concurrent seeking.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4664
  Closes #4665
* Fix config for posix_acl_release() GPL test | Chunwei Chen | 2016-05-20 | 1 | -16/+17
  The GPL test for posix_acl_release() didn't include <linux/module.h>. Also run this test only when posix_acl_release() exists.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4665
* Linux 4.7 compat: replace blk_queue_flush with blk_queue_write_cache | Chunwei Chen | 2016-05-20 | 1 | -8/+47
  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4665
* Linux 4.7 compat: handler->get() takes both dentry and inode | Chunwei Chen | 2016-05-20 | 1 | -27/+50
  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4665
* Add support for libtirpc | Brian Behlendorf | 2016-04-28 | 2 | -0/+31
  While OpenSolaris libc and glibc both include XDR support, the musl libc does not, in favor of depending on the BSD-licensed libtirpc library. Adding support is a simple matter of detecting the library, including the headers and linking against it.

  By default libtirpc will be checked for and, if available, used. Otherwise, configure will fall back to using the xdr implementation provided by libc if available. The options --with-tirpc/--without-tirpc can be used to disable this checking.

  In addition, the xdr_control() function has been simplified to only handle ZFS's specific use case.

  Original-patch-by: stf <[email protected]>
  Original-patch-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Carlo Landmeter <[email protected]>
  Closes #2254
  Closes #4559
* Linux 4.5 compat: Use xattr_handler->name for acl | Chunwei Chen | 2016-04-25 | 2 | -0/+26
  Linux 4.5 added the member "name" to xattr_handler. An xattr_handler which matches the whole name rather than a prefix should use "name" instead of "prefix". Otherwise, the kernel will return EINVAL when it tries to resolve handlers.

  Also, we remove the strcmp checks when xattr_handler has name, because xattr_resolve_name will do the check for us.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4549
  Closes #4537
* Fix WANT_DEVNAME2DEVID configure error | Brian Behlendorf | 2016-04-01 | 2 | -6/+7
  Accidentally introduced by commit e4023e4. The AM_CONDITIONAL cannot be located where it can be invoked conditionally, as in the `--with-config=user` case. Relocate it to the top level ZFS_AC_CONFIG macro along with the other AM_CONDITIONALs.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4416
* Only build devname2devid when libudev headers are available | Brian Behlendorf | 2016-03-31 | 1 | -3/+10
  Accidentally introduced by commit 39fc0cb. The devname2devid utility which depends on libudev must only be built when libudev headers are available. This is accomplished through an AM_CONDITIONAL.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4416
* Add support for devid and phys_path keys in vdev disk labels | Don Brady | 2016-03-31 | 2 | -0/+14
  This is foundational work for ZED.

  Updates a leaf vdev's persistent device strings on the Linux platform:

  * only applies for a dedicated leaf vdev (aka whole disk)
  * updated during pool create|add|attach|import
  * used for device matching during auto-{online,expand,replace}
  * stored in a leaf disk config label (i.e. alongside 'path' NVP)
  * can opt-out using env var ZFS_VDEV_DEVID_OPT_OUT=YES

  Some examples:

    path: '/dev/sdb1'
    devid: 'scsi-350000394a8ca4fbc-part1'
    phys_path: 'pci-0000:04:00.0-sas-0x50000394a8ca4fbf-lun-0'

    path: '/dev/mapper/mpatha'
    devid: 'dm-uuid-mpath-35000c5006304de3f'

  Signed-off-by: Don Brady <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #2856
  Closes #3978
  Closes #4416
* Support for vectorized algorithms on x86 | Gvozden Neskovic | 2016-03-21 | 4 | -1/+193
  This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms.

  For the compilation phase, the configure step checks whether the toolchain supports the relevant instruction sets. Each implementation must ensure that the code is not passed to the compiler if the relevant instruction set is not supported. For this purpose, the following new defines are provided if the instruction set is supported:

  - HAVE_SSE,
  - HAVE_SSE2,
  - HAVE_SSE3,
  - HAVE_SSSE3,
  - HAVE_SSE4_1,
  - HAVE_SSE4_2,
  - HAVE_AVX,
  - HAVE_AVX2.

  For detecting whether an instruction set can be used at runtime, the following functions are provided in (include/linux/simd_x86.h):

  - zfs_sse_available()
  - zfs_sse2_available()
  - zfs_sse3_available()
  - zfs_ssse3_available()
  - zfs_sse4_1_available()
  - zfs_sse4_2_available()
  - zfs_avx_available()
  - zfs_avx2_available()
  - zfs_bmi1_available()
  - zfs_bmi2_available()

  These functions should be called once, on module load or initialization. They are safe to use from user and kernel space. If an implementation uses more than a single instruction set, both compiler and runtime support for all relevant instruction sets should be checked.

  Kernel fpu methods:

  - kfpu_begin()
  - kfpu_end()

  Use __get_cpuid_max and __cpuid_count from <cpuid.h>. Both gcc and clang have support for these. They also handle the ebx register in case it is used for PIC code.

  Signed-off-by: Gvozden Neskovic <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Closes #4381
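  As a rough illustration of the runtime side, the sketch below queries CPUID leaf 7 for the AVX2 feature bit using `__get_cpuid_max` and `__cpuid_count` from `<cpuid.h>`. The helper names are hypothetical and are not the functions added by this commit. The check is deliberately incomplete: it assumes an x86 build with gcc or clang and does not verify that the operating system saves the wider register state, which a production check would also need.

```c
#include <cpuid.h>
#include <stdbool.h>
#include <stddef.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 5 advertises AVX2. */
#define	CPUID_7_EBX_AVX2	(1u << 5)

/* Hypothetical helper: decode the AVX2 bit from a leaf-7 EBX value. */
static bool
ebx_has_avx2(unsigned int ebx)
{
	return ((ebx & CPUID_7_EBX_AVX2) != 0);
}

/* Hypothetical helper: true if the CPU reports AVX2 via CPUID. */
static bool
cpu_reports_avx2(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	/* Leaf 7 is only valid if the highest basic leaf is at least 7. */
	if (__get_cpuid_max(0, NULL) < 7)
		return (false);

	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	return (ebx_has_avx2(ebx));
}
```

  In line with the advice above, a result like this would be computed once at module load or initialization and then cached, and it would be combined with the matching compile-time guard (for example HAVE_AVX2) before any vectorized code path is selected.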
* Cleanup linking | Richard Yao | 2016-03-18 | 2 | -5/+5
  I noticed during code review of zfsonlinux/zfs#4385 that the author of a commit had peppered the various Makefile.am files with `$(TIRPC_LIBS)` when putting it into `lib/libspl/Makefile.am` should have sufficed. Upon further examination, it seems that he had copied what we do with `$(ZLIB)`. We also have a bit of that with `-ldl` too. Unfortunately, what we do is wrong, so let's fix it to set a good example for future contributors.

  In addition, we have multiple `-lz` and `-luuid` passed to the compiler because each `AC_CHECK_LIB` adds it to `$LIBS`. That is somewhat annoying to see, so we switch to `AC_SEARCH_LIBS` to avoid it. This is consistent with the recommendation to use `AC_SEARCH_LIBS` over `AC_CHECK_LIB` by autotools upstream:

  https://www.gnu.org/software/autoconf/manual/autoconf-2.66/html_node/Libraries.html

  In an ideal world, this would translate into improvements in ELF's `DT_NEEDED` entries, but that is not the case because of a couple of bugs in libtool. The first bug causes libtool to overlink by using static link dependencies for dynamic linking:

  https://wiki.mageia.org/en/Overlinking_issues_in_packaging#libtool_issues

  The workaround for this should be to pass `-Wl,--as-needed` in `LDFLAGS`. That leads us to the second bug, where libtool passes `LDFLAGS` after the libraries are specified and `ld` will only honor `--as-needed` on libraries specified before it:

  https://sigquit.wordpress.com/2011/02/16/why-asneeded-doesnt-work-as-expected-for-your-libraries-on-your-autotools-project/

  There are a few possible workarounds for the second bug. One is to either patch the compiler spec file to specify `-Wl,--as-needed` or pass `-Wl,--as-needed` via `CC` like `CC='gcc -Wl,--as-needed'` so that it is specified early. Another is to patch ltmain.sh like Gentoo does:

  https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/ELT-patches/as-needed

  Without one of those workarounds, this cleanup provides no benefit in terms of `DT_NEEDED` entry generation. It should still be an improvement because it nicely simplifies the code while encouraging good habits when patching autotools scripts.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4426
* Add the ZFS Test Suite | Brian Behlendorf | 2016-03-16 | 4 | -4/+202
  Add the ZFS Test Suite and test-runner framework from illumos. This is a continuation of the work done by Turbo Fredriksson to port the ZFS Test Suite to Linux. While this work was originally conceived as a stand-alone project, integrating it directly with the ZoL source tree has several advantages:

  * Allows the ZFS Test Suite to be packaged in the zfs-test package.
  * Facilitates easy integration with the CI testing.
  * Users can locally run the ZFS Test Suite to validate ZFS. This testing should ONLY be done on a dedicated test system because the ZFS Test Suite in its current form is destructive.
  * Allows the ZFS Test Suite to be run directly in the ZoL source tree, enabling developers to iterate quickly during development.
  * Developers can easily add/modify tests in the framework as features are added or functionality is changed. The tests will then always be in sync with the implementation.

  Full documentation for how to run the ZFS Test Suite is available in the tests/README.md file.

  Warning: This test suite is designed to be run on a dedicated test system. It will make modifications to the system including, but not limited to, the following.

  * Adding new users
  * Adding new groups
  * Modifying the following /proc files:
    * /proc/sys/kernel/core_pattern
    * /proc/sys/kernel/core_uses_pid
  * Creating directories under /

  Notes:

  * Not all of the test cases are expected to pass and by default these test cases are disabled. The failures are primarily due to assumptions made for illumos which are invalid under Linux.
  * When updating these test cases it should be done in as generic a way as possible so the patch can be submitted back upstream. Most existing library functions have been updated to be Linux aware, and the following functions and variables have been added.
    * Functions:
      * is_linux - Used to wrap a Linux specific section.
      * block_device_wait - Waits for block devices to be added to /dev/.
    * Variables:           Linux          Illumos
      * ZVOL_DEVDIR        "/dev/zvol"    "/dev/zvol/dsk"
      * ZVOL_RDEVDIR       "/dev/zvol"    "/dev/zvol/rdsk"
      * DEV_DSKDIR         "/dev"         "/dev/dsk"
      * DEV_RDSKDIR        "/dev"         "/dev/rdsk"
      * NEWFS_DEFAULT_FS   "ext2"         "ufs"
  * Many of the disabled test cases fail because 'zfs/zpool destroy' returns EBUSY. This is largely caused by the asynchronous nature of device handling on Linux and is expected; the impacted test cases will need to be updated to handle this.
  * There are several test cases which have been disabled because they can trigger a deadlock. A primary example of this is to recursively create zpools within zpools. These tests have been disabled until the root issue can be addressed.
  * Illumos specific utilities such as (mkfile) should be added to the tests/zfs-tests/cmd/ directory. Custom programs required by the test scripts can also be added here.
  * SELinux should be either in permissive mode or disabled when running the tests. The test cases should be updated to conform to a standard policy.
  * Redundant test functionality has been removed (zfault.sh).
  * Existing test scripts (zconfig.sh) should be migrated to use the framework for consistency and ease of testing.
  * The DISKS environment variable currently only supports loopback devices because of how the ZFS Test Suite expects partitions to be named (p1, p2, etc). Support must be added to generate the correct partition name based on the device location and name.
  * The ZFS Test Suite is part of the illumos code base at: https://github.com/illumos/illumos-gate/tree/master/usr/src/test

  Original-patch-by: Turbo Fredriksson <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Olaf Faaland <[email protected]>
  Closes #6
  Closes #1534
* Require libblkid | Brian Behlendorf | 2016-03-09 | 1 | -106/+6
  Historically libblkid support was detected as part of configure and optionally enabled. This was done because at the time support for detecting ZFS pool vdevs had just been added to libblkid and those updated packages were not yet part of many distributions. This is no longer the case and any reasonably current distribution will ship a version of libblkid which can detect ZFS pool vdevs.

  This patch makes libblkid mandatory at build time and libblkid the preferred method of scanning for ZFS pools. For distributions which include a modern version of libblkid there is no change in behavior. Explicitly scanning the default search paths is still supported and can be enabled with the '-s' command line option.

  Additionally, making libblkid mandatory means that the 'zpool create' command can reliably detect if a specified device has an existing non-ZFS filesystem (ext4, xfs) and print a warning.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #2448
* Add support for alpine linux | Carlo Landmeter | 2016-03-08 | 1 | -1/+6
  Both Alpine Linux and Gentoo use OpenRC so we share its logic.

  Signed-off-by: Carlo Landmeter <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4386
* Make configure error clearer when failing to find SPL | Olaf Faaland | 2016-02-17 | 1 | -3/+23
  Signed-off-by: Olaf Faaland <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Closes #4251
* Linux 4.5 compat: xattr list handler | Brian Behlendorf | 2016-01-20 | 1 | -58/+170
  The registered xattr .list handler was simplified in the 4.5 kernel to only perform a permission check. Given a dentry for the file it must return a boolean indicating if the name is visible. This differs slightly from the previous APIs which also required the function to copy the name into the provided list and return its size. That is now all the responsibility of the caller.

  This should be a straightforward change to make to ZoL since we've always required the caller to make the copy. However, this was slightly complicated by the need to support 3 older APIs. Yes, between 2.6.32 and 4.5 there are 4 versions of this interface!

  Therefore, while the functional change in this patch is small it includes significant cleanup to make the code understandable and maintainable. These changes include:

  - Improved configure checks for .list, .get, and .set interfaces.
    - Interfaces checked from newest to oldest.
    - Strict checking for each possible known interface.
    - Configure fails when no known interface is available.
    - HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency with similar iops and fops configure checks.
  - POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed, forcing callers to move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}. Compatibility wrappers were added for old kernels.
  - ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing ZPL_XATTR_{GET|SET}_WRAPPERs. Only the inode is guaranteed to be a valid pointer; passing NULL for the 'list' and 'name' variables is allowed and must be checked for. All .list functions were updated to use the wrapper to aid readability.
  - zpl_xattr_filldir() updated to use the .list function for its permission check, which is consistent with the updated Linux 4.5 interface. If a .list function is registered it should return 0 to indicate a name should be skipped; if there is no registered function the name will be added.
  - Additional documentation from xattr(7) describing the correct behavior for each namespace was added before the relevant handlers.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Tim Chase <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Issue #4228
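  The division of labor described above, where the .list callback only answers whether a name is visible and the caller does the copying, can be modeled in userspace roughly as follows. All names here (`list_fn_t`, `hide_trusted`, `filldir_one`) are hypothetical, and the sketch ignores the dentry and handler arguments that a real callback would receive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical visibility callback: true means the name may be listed. */
typedef bool (*list_fn_t)(const char *name);

/* Example policy (assumption): hide names in the "trusted." namespace. */
static bool
hide_trusted(const char *name)
{
	return (strncmp(name, "trusted.", 8) != 0);
}

/*
 * Append one NUL-terminated name to buf, starting at offset used.
 * Returns the new offset, the unchanged offset if the callback hides the
 * name, or -1 if the name does not fit. A NULL buf only computes sizes.
 */
static long
filldir_one(char *buf, size_t size, size_t used, const char *name,
    list_fn_t list)
{
	size_t len = strlen(name) + 1;

	/* A registered callback may veto the name; none means "visible". */
	if (list != NULL && !list(name))
		return ((long)used);

	if (buf != NULL) {
		if (used + len > size)
			return (-1);
		memcpy(buf + used, name, len);
	}
	return ((long)(used + len));
}
```

  Keeping the copy in the caller means the callback has the same shape regardless of which kernel interface is in use.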
* Linux 4.5 compat: get_link() / put_link() | Brian Behlendorf | 2016-01-20 | 5 | -48/+161
  The follow_link() interface was retired in favor of get_link(). In the process of phasing in get_link() the Linux kernel went through two different versions. The first of which depended on put_link() and the final version on a delayed done function.

  - Improved configure checks for .follow_link, .get_link, .put_link.
    - Interfaces checked from newest to oldest.
    - Strict checking for each possible known interface.
    - Configure fails when no known interface is available.
  - Both versions of .get_link are detected and supported, as well as two previous versions of .follow_link.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Tim Chase <[email protected]>
  Signed-off-by: Chunwei Chen <[email protected]>
  Issue #4228
* Illumos 3557, 3558, 3559, 3560 | George Wilson | 2016-01-15 | 2 | -19/+0
  3557 dumpvp_size is not updated correctly when a dump zvol's size is changed
  3558 setting the volsize on a dump device does not return back ENOSPC
  3559 setting a volsize larger than the space available sometimes succeeds
  3560 dumpadm should be able to remove a dump device

  Reviewed by: Adam Leventhal <[email protected]>
  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: Christopher Siden <[email protected]>
  Approved by: Albert Lee <[email protected]>

  References:
    https://www.illumos.org/issues/3559
    https://github.com/illumos/illumos-gate/commit/c61ea56

  Porting notes:
  - Internal zvol.c changes not applied due to implementation differences. The external interface and behavior was already consistent with the latest upstream code.
  - Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check. All supported kernels (2.6.32 and newer) provide this interface.

  Ported-by: Brian Behlendorf <[email protected]>
  Closes #4217
* Skip GPL-only symbols test when cross-compiling | Kamil Domański | 2015-12-18 | 1 | -8/+10
  Signed-off-by: Kamil Domański <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #4107
* Use large stacks when available | Brian Behlendorf | 2015-12-07 | 2 | -2/+29
  While stack size will vary by architecture it has historically defaulted to 8K on x86_64 systems. However, as of Linux 3.15 the default thread stack size was increased to 16K. These kernels are now the default in most non-enterprise distributions which means we no longer need to assume 8K stacks.

  This patch takes advantage of that fact by appropriately reverting stack conservation changes which were made to ensure stability, changes which may have had a negative impact on performance for certain workloads. This also has the side effect of bringing the code slightly more in line with upstream.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Closes #4059
* Linux 4.4 compat: xattr operations takes xattr_handler | Chunwei Chen | 2015-12-01 | 1 | -0/+69
  The xattr_handler->{list,get,set} callbacks were changed to take an xattr_handler, and the handler_flags argument was removed; it should now be accessed through handler->flags.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4021
* Linux 4.4 compat: make_request_fn returns blk_qc_t | Chunwei Chen | 2015-12-01 | 1 | -2/+24
  As part of block polling support in Linux 4.4, make_request_fn should return a cookie value of type blk_qc_t. For now, we make zvol_request always return BLK_QC_T_NONE until we assess whether and how we want to support block polling.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #4021
* Fix synchronous behavior in __vdev_disk_physio() | Brian Behlendorf | 2015-09-25 | 2 | -53/+0
  Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side-effect of making vdev_disk_io_start() synchronous for certain I/Os.

  This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's, which would result in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780.

  This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags.

  Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #3652
  Issue #3780
  Issue #3785
  Issue #3817
  Issue #3821
  Issue #3829
  Issue #3832
  Issue #3870
* Linux 4.3 compat: bio_end_io_t / BIO_UPTODATE | Lukas Wunner | 2015-09-25 | 1 | -11/+9
  Commit torvalds/linux@4246a0b63bd8f56a1469b12eafeb875b1041a451 ("block: add a bi_error field to struct bio") dropped the error argument from bio_endio in favor of newly introduced bio->bi_error. This also replaces bio->bi_flags value BIO_UPTODATE.

  bio_endio was a 3 argument function until Linux 2.6.24, which made it a 2 argument function, and now the prototype has changed yet again to a 1 argument function.

  Support for pre 2.6.24 kernels was already dropped with 37f9dac592bf ("zvol processing should use struct bio") which assumed the 2 argument version in zvol_request(). Remaining code to support the 3 argument version is hereby removed.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Lukas Wunner <[email protected]>
  Issue #3799
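  One way to absorb a prototype change like this is a single wrapper macro selected by a configure result, so callers keep one calling convention. The sketch below models that in userspace; `HAVE_1ARG_BIO_ENDIO`, `struct mock_bio`, `mock_bio_endio()` and `COMPAT_BIO_ENDIO()` are hypothetical stand-ins, not the kernel's or this patch's definitions.

```c
/* Hypothetical configure result: defined when the 1-argument form exists. */
#define	HAVE_1ARG_BIO_ENDIO	1

#ifdef HAVE_1ARG_BIO_ENDIO
/* Newer model: the error travels in the bio itself. */
struct mock_bio { int bi_error; int completed; };

static void
mock_bio_endio(struct mock_bio *bio)
{
	bio->completed = 1;
}
#define	COMPAT_BIO_ENDIO(bio, err)			\
	do {						\
		(bio)->bi_error = (err);		\
		mock_bio_endio(bio);			\
	} while (0)
#else
/* Older model: the error is passed as a second argument. */
struct mock_bio { int last_error; int completed; };

static void
mock_bio_endio(struct mock_bio *bio, int err)
{
	bio->last_error = err;
	bio->completed = 1;
}
#define	COMPAT_BIO_ENDIO(bio, err)	mock_bio_endio((bio), (err))
#endif
```

  A caller would write `COMPAT_BIO_ENDIO(bio, error)` in both cases and never touch the underlying prototype directly.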
* Reintroduce IO accounting on zvols on Linux 3.19+ (Richard Yao, 2015-09-09, 2 files, -0/+27)
| | | | | | | | | | | | | | zfsonlinux/zfs@e20cd6f7a8922709b1aa2ecefd783390102d79e0 caused us to lose IO accounting on zvols. When I originally wrote that last year, the symbols we needed to maintain IO accounting were GPL exported, but torvalds/linux@394ffa503bc40e32d7f54a9b817264e81ce131b4 provided suitable symbols for restoring this functionality 4 months later. We can call them to restore the IO accounting on Linux 3.19 and later as well as any older kernels where that patch is backported. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3741
* Remove blk_queue_nonrot() autotools check (Richard Yao, 2015-09-04, 2 files, -26/+0)
| | | | | | | | | | This autotools check was never needed because we can check for the existence of QUEUE_FLAG_NONROT in the kernel headers. Also, the comment in config/kernel-blk-queue-nonrot.m4 is incorrect. This was a Linux 2.6.28 API change, not a Linux 2.6.27 API change. Signed-off-by: Richard Yao <[email protected]>
* Remove blk_queue_discard() autotools check (Richard Yao, 2015-09-04, 2 files, -23/+0)
| | | | | | | This autotools check was never needed because we can check for the existence of QUEUE_FLAG_DISCARD in the kernel headers. Signed-off-by: Richard Yao <[email protected]>
* Remove blk_rq_bytes()/blk_rq_sectors autotools checks (Richard Yao, 2015-09-04, 3 files, -64/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_rq_pos() autotools check (Richard Yao, 2015-09-04, 2 files, -22/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_fetch_request() autotools check (Richard Yao, 2015-09-04, 2 files, -26/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_requeue_request() autotools check (Richard Yao, 2015-09-04, 2 files, -26/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove blk_end_request() autotools check. (Richard Yao, 2015-09-04, 2 files, -41/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove rq_is_sync() autotools check (Richard Yao, 2015-09-04, 2 files, -22/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* Remove rq_for_each_segment() autotools check (Richard Yao, 2015-09-04, 2 files, -48/+0)
| | | | Signed-off-by: Richard Yao <[email protected]>
* zvol processing should use struct bio (Richard Yao, 2015-09-04, 5 files, -0/+130)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internally, zvols are files exposed through the block device API. This is intended to reduce overhead when things require block devices. However, the ZoL zvol code emulates a traditional block device in that it has a top half and a bottom half. This is an unnecessary source of overhead that does not exist on any other OpenZFS platform. This patch removes it. Early users of this patch reported double-digit performance gains in IOPS on zvols, in the range of 50% to 80%. Comments in the code suggest that the current implementation was done to obtain IO merging from Linux's IO elevator. However, the DMU already does write merging, while arc_read() should implicitly merge read IOs because only 1 thread is permitted to fetch the buffer into ARC. In addition, commercial ZFSOnLinux distributions report that regular files are more performant than zvols under the current implementation, and the main consumers of zvols are VMs and iSCSI targets, which have their own elevators to merge IOs. Some minor refactoring allows us to register zvol_request() as our ->make_request() handler in place of the generic_make_request() function. This eliminates the layer of code that broke IO requests on zvols into a top half and a bottom half. This has several benefits: 1. No per-zvol spinlocks. 2. No redundant IO elevator processing. 3. Interrupts are disabled only when actually necessary. 4. No redispatching of IOs when all taskq threads are busy. 5. Linux's page out routines will properly block. 6. Many autotools checks become obsolete. An unfortunate consequence of eliminating the generic_make_request() layer is that we no longer call the instrumentation hooks for block IO accounting. Those hooks are GPL-exported, so we cannot call them ourselves, and consequently we lose the ability to do IO monitoring via iostat.
Since zvols are internally files mapped as block devices, this should be okay. Anyone who is willing to accept the performance penalty for the block IO layer's accounting could use the loop device in between the zvol and its consumer. Alternatively, perf and ftrace likely could be used. Also, tools like latencytop will still work. Tools such as latencytop sometimes provide a better view of performance bottlenecks than the traditional block IO accounting tools do. Lastly, if direct reclaim occurs during spacemap loading and swap is on a zvol, this code will deadlock. That deadlock could already occur with sync=always on zvols. Given that swap on zvols is not yet production ready, this is not a blocker. Signed-off-by: Richard Yao <[email protected]>