aboutsummaryrefslogtreecommitdiffstats
path: root/include/os
Commit message (Collapse)AuthorAgeFilesLines
...
* Linux 6.5 compat: disk_check_media_change() was addedColeman Kane2023-07-211-0/+2
| | | | | | | | | | | The disk_check_media_change() function was added which replaces bdev_check_media_change. This change was introduced in 6.5rc1 444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk is now used to pass the expected data. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #15060
* Linux 6.5 compat: BLK_STS_NEXUS renamed to BLK_STS_RESV_CONFLICTColeman Kane2023-07-211-0/+8
| | | | | | | | | This change was introduced in Linux commit 7ba150834b840f6f5cdd07ca69a4ccf39df59a66 Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #15059
* Linux 6.5 compat: intptr_t definition is canonically signedColeman Kane2023-07-211-1/+1
| | | | | | | | | | | Make the version here match that elsewhere in the kernel and system headers. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #15058 Signed-off-by: Brian Behlendorf <[email protected]>
* FreeBSD: catch up to __FreeBSD_version 1400093Mateusz Guzik2023-07-201-0/+4
| | | | | Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #15036
* Add a delay to tearing down threads.Rich Ercolani2023-06-261-0/+1
| | | | | | | | | | | | | | | | | | It's been observed that in certain workloads (zvol-related being a big one), ZFS will end up spending a large amount of time spinning up taskqs only to tear them down again almost immediately, then spin them up again... I noticed this when I looked at what my mostly-idle system was doing and wondered how on earth taskq creation/destroy was a bunch of time... So I added a configurable delay to avoid it tearing down tasks the first time it notices them idle, and the total number of threads at steady state went up, but the amount of time being burned just tearing down/turning up new ones almost vanished. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #14938
* Finally drop long disabled vdev cache.Alexander Motin2023-06-091-1/+0
| | | | | | | | | | | | | | | | | | | It was a vdev level read cache, designed to aggregate many small reads by speculatively issuing bigger reads instead and caching the result. But since it has almost no idea about what is going on with exception of ZIO_FLAG_DONT_CACHE flag set by higher layers, it was found to make more harm than good, for which reason it was disabled for the past 12 years. These days we have much better instruments to enlarge the I/Os, such as speculative and prescient prefetches, I/O scheduler, I/O aggregation etc. Besides just the dead code removal this removes one extra mutex lock/unlock per write inside vdev_cache_write(), not otherwise disabled and trying to do some work. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14953
* Use __attribute__((malloc)) on memory allocation functionsRichard Yao2023-05-263-11/+14
| | | | | | | | | | | | | | This informs the C compiler that pointers returned from these functions do not alias other functions, which allows it to do better code optimization and should make the compiled code smaller. References: https://stackoverflow.com/a/53654773 https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute https://clang.llvm.org/docs/AttributeReference.html#malloc Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14827
* zil: Add some more statistics.Alexander Motin2023-05-251-0/+34
| | | | | | | | | | | | | | | | In addition to a number of actual log bytes written, account also a total written bytes including padding and total allocated bytes (bytes <= write <= alloc). It should allow to monitor zil traffic and space efficiency. Add dtrace probe for zil block size selection. Make zilstat report more information and fit it into less width. Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14863
* Wrap clang specific pragmaBrian Behlendorf2023-05-021-0/+4
| | | | | | | | Clang specific pragmas need to be wrapped to prevent a build warning when compiling with gcc. Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14814
* powerpc64: Support ELFv2 asm on Big EndianJustin Hibbits2023-04-271-1/+1
| | | | | | | | | | | FreeBSD/powerpc64 is all ELFv2 since FreeBSD 13, even big endian. The existing sha256 and sha512 asm code assumes that BE is all ELFv1, and LE is ELFv2. Minor changes to add ELFv2 in the BE side gets this working correctly on FreeBSD with latest OpenZFS import. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Justin Hibbits <[email protected]> Closes #14779
* Add loongarch64 supportHan Gao2023-04-251-1/+17
| | | | | | | | | | | Add loongarch64 definitions & lua module setjmp asm LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Han Gao <[email protected]> Signed-off-by: WANG Xuerui <[email protected]> Closes #13422
* Linux: Suppress -Wordered-compare-function-pointers in tracepoint codeRichard Yao2023-04-201-0/+4
| | | | | | | | | | | | Clang points out that there is a comparison against -1, but we cannot fix it because that is from the kernel headers, which we must support. We can workaround this by using a pragma. Sponsored-By: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14738
* Linux 6.3 compat: idmapped mount API changesyouzhongyang2023-04-109-37/+92
| | | | | | | | | Linux kernel 6.3 changed a bunch of APIs to use the dedicated idmap type for mounts (struct mnt_idmap), we need to detect these changes and make zfs work with the new APIs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14682
* module: freebsd: fix aarch64 fpu handlingKyle Evans2023-04-101-2/+12
| | | | | | | | | | | | Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Kyle Evans <[email protected]> Closes #14715
* Miscellaneous FreBSD compilation bugfixesMartin Matuška2023-04-064-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | Add missing machine/md_var.h to spl/sys/simd_aarch64.h and spl/sys/simd_arm.h In spl/sys/simd_x86.h, PCB_FPUNOSAVE exists only on amd64, use PCB_NPXNOSAVE on i386 In FreeBSD sys/elf_common.h redefines AT_UID and AT_GID on FreeBSD, we need a hack in vnode.h similar to Linux. sys/simd.h needs to be included early. In zfs_freebsd_copy_file_range() we pass a (size_t *)lenp to zfs_clone_range() that expects a (uint64_t *) Allow compiling armv6 world by limiting ARM macros in sha256_impl.c and sha512_impl.c to __ARM_ARCH > 6 Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Signed-off-by: WHR <[email protected]> Signed-off-by: Martin Matuska <[email protected]> Closes #14674
* linux 6.3 compat: needs REQ_PREFLUSH | REQ_OP_WRITEyouzhongyang2023-03-311-1/+1
| | | | | | | | | Modify bio_set_flush() so if kernel version is >= 4.10, flags REQ_PREFLUSH and REQ_OP_WRITE are set together. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14695
* linux 6.3 compat: add another bdev_io_acct caseRich Ercolani2023-03-271-2/+8
| | | | | | | | | Linux 6.3+, and backports from it (6.2.8+), changed the signatures on bdev_io_{start,end}_acct. Add a case for it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #14658 Closes #14668
* spl: cmn_err_once() should be usable in brace-less if else statementsAttila Fülöp2023-03-152-12/+12
| | | | | | | | | | | Commit 11913870 (#14567) added cmn_err_once() by #define'ing a compound statement but failed to consider usage in a single statement brace-less if else. Fix the problem by using the common "do {} while (0)" construct. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14629
* Refactor CONFIG_SPE check on Linux/powerpcWHR2023-03-151-18/+10
| | | | | | | | | | | | Commit 5401472 adds a check to call enable_kernel_spe and disable_kernel_spe only if CONFIG_SPE is defined. Refactor this check in a way similar to what CONFIG_ALTIVEC and CONFIG_VSX are checked, in order to remove redundant kfpu_begin() and kfpu_end() implementations. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14623
* Fix missing semicolons in commit 1f196e3WHR2023-03-151-4/+4
| | | | | | | Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14623
* Silence clang static analyzer warnings about stored stack addressesRichard Yao2023-03-141-3/+0
| | | | | | | | | | | | | | Clang's static analyzer complains that nvs_xdr() and nvs_native() functions return pointers to stack memory. That is technically true, but the pointers are stored in stack memory from the caller's stack frame, are not read by the caller and are deallocated when the caller returns, so this is harmless. We set the pointers to NULL to silence the warnings. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14612
* Implementation of block cloning for ZFSPawel Jakub Dawidek2023-03-103-2/+7
| | | | | | | | | | | | | | | Block Cloning allows to manually clone a file (or a subset of its blocks) into another (or the same) file by just creating additional references to the data blocks without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Christian Schwarz <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rich Ercolani <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #13392
* Workaround for Linux PowerPC GPL-only cpu_has_feature()Low-power2023-03-102-0/+26
| | | | | | | | | | | | | | | | | | | Linux since 4.7 makes interface 'cpu_has_feature' to use jump labels on powerpc if CONFIG_JUMP_LABEL_FEATURE_CHECKS is enabled, in this case however the inline function references GPL-only symbol 'cpu_feature_keys'. ZFS currently uses 'cpu_has_feature' either directly or indirectly from several places; while it is unknown how this issue didn't break ZFS on 64-bit little-endian powerpc, it is known to break ZFS with many Linux versions on both 32-bit and 64-bit big-endian powerpc. Until this issue is fixed in Linux, we have to workaround it by overriding affected inline functions without depending on 'cpu_feature_keys'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14590
* Fix build for Linux/powerpc without CONFIG_ALTIVEC or CONFIG_VSXLow-power2023-03-071-8/+22
| | | | | | | | | This fixes building ZFS for Linux 4.7+ powerpc* architecture, where Linux was configured without CONFIG_ALTIVEC or CONFIG_VSX. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: WHR <[email protected]> Closes #14591
* spl: Add cmn_err_once() to log a message only on the first callAttila Fülöp2023-03-072-0/+50
| | | | | | | Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14567
* Add SHA2 SIMD feature tests for LinuxTino Reichardt2023-03-026-13/+169
| | | | | | | | | | | | | | | These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Co-Authored-By: Sebastian Gottschall <[email protected]> Closes #13741
* Add SHA2 SIMD feature tests for FreeBSDTino Reichardt2023-03-027-28/+205
| | | | | | | | | | | | | | | | | These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Changes: - simd_powerpc.h: change license from CDDL to BSD Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Remove old or redundant SHA2 filesTino Reichardt2023-03-024-347/+0
| | | | | | | | | | | | | | | | | We had three sha2.h headers in different places. The FreeBSD version, the Linux version and the generic solaris version. The only assembly used for acceleration was some old x86-64 openssl implementation for sha256 within the icp module. For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS, these files got removed also. Tested-by: Rich Ercolani <[email protected]> Tested-by: Sebastian Gottschall <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes #13741
* Linux: Assert mutex is held in mutex_exit()Richard Yao2023-02-281-0/+1
| | | | | | | | | | | | | | A spurious mutex_exit() in a development branch caused weird issues until I identified it. An assertion prior to mutex_exit() would have caught it. Rather than adding assertions before invocations of mutex_exit() in the code, let us simply add an assertion to mutex_exit(). It is cheap and will likely improve developer productivity. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Richard Yao <[email protected]> Sponsored-By: Wasabi Technology, Inc. Closes #14541
* Use .section .rodata instead of .rodata on FreeBSDDimitry Andric2023-02-241-1/+1
| | | | | | | | | | | | | | | In commit 0a5b942d4 the FreeBSD SECTION_STATIC macro was set to ".rodata". This assembler directive is supported by LLVM (as a convenience alias for ".section .rodata") by not by GNU as. This caused the FreeBSD builds that are done with gcc to fail. Therefore, use ".section .rodata" instead, similar to the other asm_linkage.h headers. Reviewed-by: Mateusz Guzik <[email protected]> Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Dimitry Andric <[email protected]> Closes #14526
* Fix buffered/direct/mmap I/O raceBrian Behlendorf2023-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a page is faulted in for memory mapped I/O the page lock may be dropped before it has been read and marked up to date. If a buffered read encounters such a page in mappedread() it must wait until the page has been updated. Failure to do so will result in a panic on debug builds and incorrect data on production builds. The critical part of this change is in mappedread() where pages which are not up to date are now handled. Additionally, it includes the following simplifications. - zfs_getpage() and zfs_fillpage() could be passed an array of pages. This could be more efficient if it was used but in practice only a single page was ever provided. These interfaces were simplified to acknowledge that. - update_pages() was modified to correctly set the PG_error bit on a page when it cannot be read by dmu_read(). - Setting PG_error and PG_uptodate was moved to zfs_fillpage() from zpl_readpage_common(). This is consistent with the handling in update_pages() and mappedread(). - Minor additional refactoring to comments and variable declarations to improve readability. - Add a test case to exercise concurrent buffered, direct, and mmap IO to the same file. - Reduce the mmap_sync test case default run time. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #13608 Closes #14498
* Linux: use filemap_range_has_page()Brian Behlendorf2023-02-143-8/+20
| | | | | | | | | | As of the 4.13 kernel filemap_range_has_page() can be used to check if there is a page mapped in a given file range. When available this interface should be used which eliminates the need for the zp->z_is_mapped boolean. Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #14493
* Restore FreeBSD to use .rodataJorgen Lundman2023-02-061-1/+1
| | | | | | | | | | | In https://github.com/openzfs/zfs/pull/14228 the FreeBSD SECTION_STATIC was set to ".data" instead of ".rodata". This commit just restores it back to .rodata. Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #14460
* linux 6.2 compat: zpl_set_acl arg2 is now struct dentryColeman Kane2023-01-241-0/+3
| | | | | | | | | | | | | | | Linux 6.2 changes the second argument of the set_acl operation to be a "struct dentry *" rather than a "struct inode *". The inode* parameter is still available as dentry->d_inode, so adjust the call to the _impl function call to dereference and pass that pointer to it. Also document that the get_acl -> get_inode_acl member name change from commit 884a693 was an API change also introduced in Linux 6.2. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #14415
* x86 asm: Replace .align with .balignAttila Fülöp2023-01-242-10/+10
| | | | | | | | | | | | | | | | The .align directive used to align storage locations is ambiguous. On some platforms and assemblers it takes a byte count, on others the argument is interpreted as a shift value. The current usage expects the first interpretation. Replace it with the unambiguous .balign directive which always expects a byte count, regardless of platform and assembler. Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Closes #14422
* Unify Assembler files between Linux and WindowsJorgen Lundman2023-01-174-0/+397
| | | | | | | | | | Add new macro ASMABI used by Windows to change calling API to "sysv_abi". Reviewed-by: Attila Fülöp <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #14228
* Cleanup of dead code suggested by Clang Static Analyzer (#14380)Richard Yao2023-01-172-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I recently gained the ability to run Clang's static analyzer on the linux kernel modules via a few hacks. This extended coverage to code that was previously missed since Clang's static analyzer only looked at code that we built in userspace. Running it against the Linux kernel modules built from my local branch produced a total of 72 reports against my local branch. Of those, 50 were reports of logic errors and 22 were reports of dead code. Since we already had cleaned up all of the previous dead code reports, I felt it would be a good next step to clean up these dead code reports. Clang did a further breakdown of the dead code reports into: Dead assignment 15 Dead increment 2 Dead nested assignment 5 The benefit of cleaning these up, especially in the case of dead nested assignment, is that they can expose places where our error handling is incorrect. A number of them were fairly straight forward. However several were not: In vdev_disk_physio_completion(), not only were we not using the return value from the static function vdev_disk_dio_put(), but nothing used it, so I changed it to return void and removed the existing (void) cast in the other area where we call it in addition to no longer storing it to a stack value. In FSE_createDTable(), the function is dead code. Its helper function FSE_freeDTable() is also dead code, as are the CPP definitions in `module/zstd/include/zstd_compat_wrapper.h`. We just delete it all. In zfs_zevent_wait(), we have an optimization opportunity. cv_wait_sig() returns 0 if there are waiting signals and 1 if there are none. The Linux SPL version literally returns `signal_pending(current) ? 0 : 1)` and FreeBSD implements the same semantics, we can just do `!cv_wait_sig()` in place of `signal_pending(current)` to avoid unnecessarily calling it again. zfs_setattr() on FreeBSD version did not have error handling issue because the code was removed entirely from FreeBSD version. The error is from updating the attribute directory's files. After some thought, I decided to propapage errors on it to userspace. In zfs_secpolicy_tmp_snapshot(), we ignore a lack of permission from the first check in favor of checking three other permissions. I assume this is intentional. In zfs_create_fs(), the return value of zap_update() was not checked despite setting an important version number. I see no backward compatibility reason to permit failures, so we add an assertion to catch failures. Interestingly, Linux is still using ASSERT(error == 0) from OpenSolaris while FreeBSD has switched to the improved ASSERT0(error) from illumos, although illumos has yet to adopt it here. ASSERT(error == 0) was used on Linux while ASSERT0(error) was used on FreeBSD since the entire file needs conversion and that should be the subject of another patch. dnode_move()'s issue was caused by us not having implemented POINTER_IS_VALID() on Linux. We have a stub in `include/os/linux/spl/sys/kmem_cache.h` for it, when it really should be in `include/os/linux/spl/sys/kmem.h` to be consistent with Illumos/OpenSolaris. FreeBSD put both `POINTER_IS_VALID()` and `POINTER_INVALIDATE()` in `include/os/freebsd/spl/sys/kmem.h`, so we copy what it did. Whenever a report was in platform-specific code, I checked the FreeBSD version to see if it also applied to FreeBSD, but it was only relevant a few times. Lastly, the patch that enabled Clang's static analyzer to be run on the Linux kernel modules needs more work before it can be put into a PR. I plan to do that in the future as part of the on-going static analysis work that I am doing. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14380
* change how d_alias is replaced by du.d_aliasGian-Carlo DeFazio2023-01-121-4/+4
| | | | | | | | | | | | | | | | | | | | | d_alias may need to be converted to du.d_alias depending on the kernel version. d_alias is currently in only one place in the code which changes "hlist_for_each_entry(dentry, &inode->i_dentry, d_alias)" to "hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias)" as neccesary. This effectively results in a double macro expansion for code that uses the zfs headers but already has its own macro for just d_alias (lustre in this case). Remove the conditional code for hlist_for_each_entry and have a macro for "d_alias -> du.d_alias" instead. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gian-Carlo DeFazio <[email protected]> Closes #14377
* linux 6.2 compat: bio->bi_rw was renamed bio->bi_opfColeman Kane2023-01-061-0/+4
| | | | | | | | | | | | | | | | | | The bi_rw member of struct bio was renamed to bi_opf in Linux 6.2. As well, Linux's implementation of bio_set_op_attrs(...) has been removed. The HAVE_BIO_BI_OPF macro already appears to be defined, but the removal of the bio_set_op_attrs(...) implementation makes the build fall back on the locally-defined implementation, which isn't updated for the bio->bi_opf change. This commit adds that update. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #14324 Closes #14331
* linux 6.2 compat: get_acl() got moved to get_inode_acl() in 6.2Coleman Kane2023-01-061-1/+1
| | | | | | | | | | | | Linux 6.2 renamed the get_acl() operation to get_inode_acl() in the inode_operations struct. This should fix Issue #14323. Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #14323 Closes #14331
* Implement uncached prefetchAlexander Motin2023-01-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously the primarycache property was handled only in the dbuf layer. Since the speculative prefetcher is implemented in the ARC, it had to be disabled for uncacheable buffers. This change gives the ARC knowledge about uncacheable buffers via arc_read() and arc_write(). So when remove_reference() drops the last reference on the ARC header, it can either immediately destroy it, or if it is marked as prefetch, put it into a new arc_uncached state. That state is scanned every second, evicting stale buffers that were not demand read. This change also tracks dbufs that were read from the beginning, but not to the end. It is assumed that such buffers may receive further reads, and so they are stored in dbuf cache. If a following reads reaches the end of the buffer, it is immediately evicted. Otherwise it will follow regular dbuf cache eviction. Since the dbuf layer does not know actual file sizes, this logic is not applied to the final buffer of a dnode. Since uncacheable buffers should no longer stay in the ARC for long, this patch also tries to optimize I/O by allocating ARC physical buffers as linear to allow buffer sharing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14243
* arc_read()/arc_access() refactoring and cleanupAlexander Motin2022-12-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARC code was many times significantly modified over the years, that created significant amount of tangled and potentially broken code. This should make arc_access()/arc_read() code some more readable. - Decouple prefetch status tracking from b_refcnt. It made sense originally, but became highly cryptic over the years. Move all the logic into arc_access(). While there, clean up and comment state transitions in arc_access(). Some transitions were weird IMO. - Unify arc_access() calls to arc_read() instead of sometimes calling it from arc_read_done(). To avoid extra state changes and checks add one more b_refcnt for ARC_FLAG_IO_IN_PROGRESS. - Reimplement ARC_FLAG_WAIT in case of ARC_FLAG_IO_IN_PROGRESS with the same callback mechanism to not falsely account them as hits. Count those as "iohits", an intermediate between "hits" and "misses". While there, call read callbacks in original request order, that should be good for fairness and random speculations/allocations/aggregations. - Introduce additional statistic counters for prefetch, accounting predictive vs prescient and hits vs iohits vs misses. - Remove hash_lock argument from functions not needing it. - Remove ARC_FLAG_PREDICTIVE_PREFETCH, since it should be opposite to ARC_FLAG_PRESCIENT_PREFETCH if ARC_FLAG_PREFETCH is set. We may wish to add ARC_FLAG_PRESCIENT_PREFETCH to few more places. - Fix few false positive tests found in the process. Reviewed-by: George Wilson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes #14123
* Linux PPC: Fix build failures on kernels built without CONFIG_SPERichard Yao2022-12-091-0/+15
| | | | | | | | | | | | We do a simple ifdef to avoid calling enable_kernel_spe()/ disable_kernel_spe() on PowerPC. Reported-by: Rich Ercolani <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Tested-by: Georgy Yakovlev <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #14233 Closes #14244
* Allow to control failfastMariusz Zaborski2022-11-101-2/+8
| | | | | | | | | | | | | | | | | | | | | Linux defaults to setting "failfast" on BIOs, so that the OS will not retry IOs that fail, and instead report the error to ZFS. In some cases, such as errors reported by the HBA driver, not the device itself, we would wish to retry rather than generating vdev errors in ZFS. This new property allows that. This introduces a per vdev option to disable the failfast option. This also introduces a global module parameter to define the failfast mask value. Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Sponsored-by: Seagate Technology LLC Submitted-by: Klara, Inc. Closes #14056
* Support idmapped mount in user namespaceyouzhongyang2022-11-083-23/+76
| | | | | | | | | | | | | | | | | | Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping". Update the OpenZFS accordingly to improve the idmapped mount support. This pull request contains the following changes: - xattr setter functions are fixed to take mnt_ns argument. Without this, cp -p would fail for an idmapped mount in a user namespace. - idmap_util is enhanced/fixed for its use in a user ns context. - One test case added to test idmapped mount in a user ns. Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Youzhong Yang <[email protected]> Closes #14097
* freebsd: simplify MD isa_defs.hBrooks Davis2022-11-072-455/+0
| | | | | | | | | | | | | | | | Most of this file was a pile of defines, apparently from Solaris that controlled nothing in the source tree. A few things controlled the definition of unused types or macros which I have removed. Considerable further cleanup is possible including removal of architectures FreeBSD never supported. This file should likely converge with the Linux version to the extent possible. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127
* freebsd: trim dkio.h to the minimumBrooks Davis2022-11-071-460/+0
| | | | | | | | | | Only DKIOCFLUSHWRITECACHE is required. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127
* freebsd: add ifdefs around legacy ioctl supportBrooks Davis2022-11-071-0/+13
| | | | | | | | | | | | | | | | Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to be built. For now, define it in zfs_ioctl_compat.h so support is always built. This will allow systems that need never support pre-openzfs tools a mechanism to remove support at build time. This code should be removed once the need for tool compatability is gone. No functional change at this time. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127
* freebsd: remove no-op vn_renamepath()Brooks Davis2022-11-071-1/+0
| | | | | | | | | | | | vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD port. Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it entierly. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127
* freebsd: remove unused vn_rename()Brooks Davis2022-11-071-9/+0
| | | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Brooks Davis <[email protected]> Closes #14127