aboutsummaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Consolidate zfs_holey and zfs_accessMatthew Macy2021-06-234-182/+98
| | | | | | | | | The zfs_holey() and zfs_access() functions can be made common to both FreeBSD and Linux. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Matt Macy <[email protected]> Closes #11125
* Fix error code on __zpl_ioctl_setflags()Luis Henriques2021-06-231-1/+1
| | | | | | | | | | Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the CAP_LINUX_IMMUTABLE capability. This was detected by generic/545 test in the fstest suite. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Luis Henriques <[email protected]> Closes #11791
* Linux: always check or verify return of igrab()Adam D. Moss2021-06-233-3/+9
| | | | | | | | | | | zhold() wraps igrab() on Linux, and igrab() may fail when the inode is in the process of being deleted. This means zhold() must only be called when a reference exists and therefore it cannot be deleted. This is the case for all existing consumers so add a VERIFY and a comment explaining this requirement. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Adam Moss <[email protected]> Closes #11704
* Linux: Set spl_kmem_cache_slab_limit when page size !4KBrian Behlendorf2021-06-231-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For small objects the kernel's slab implementation is very fast and space efficient. However, as the allocation size increases to require multiple pages performance suffers. The SPL kmem cache allocator was designed to better handle these large allocation sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers prefer to use the kernel's slab allocator for small objects and the custom SPL kmem cache allocator for larger objects. This logic was effectively disabled for all architectures using a non-4K page size which caused all kmem caches to only use the SPL implementation. Functionally this is fine, but the SPL code which calculates the target number of objects per-slab does not take in to account that __vmalloc() always returns page-aligned memory. This can result in a massive amount of wasted space when allocating tiny objects on a platform using large pages (64k). To resolve this issue we set the spl_kmem_cache_slab_limit cutoff to 16K for all architectures. This particular change does not attempt to update the logic used to calculate the optimal number of pages per slab. This remains an issue which should be addressed in a future change. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #12152 Closes #11429 Closes #11574 Closes #12150
* Fix zfs_get_data access to files with wrong generationChunwei Chen2021-06-234-3/+22
| | | | | | | | | | | | | | | If TX_WRITE is create on a file, and the file is later deleted and a new directory is created on the same object id, it is possible that when zil_commit happens, zfs_get_data will be called on the new directory. This may result in panic as it tries to do range lock. This patch fixes this issue by record the generation number during zfs_log_write, so zfs_get_data can check if the object is valid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #10593 Closes #11682
* Fix dmu_recv_stream test for resumablePaul Zuchowski2021-06-231-2/+2
| | | | | | | | | | | | Use dsl_dataset_has_resume_receive_state() not dsl_dataset_is_zapified() to check if stream is resumable. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #12034
* Remove iov_iter_advance() for iter_writeRich Ercolani2021-06-231-3/+0
| | | | | | | | | | | | | | | | | | | | The additional iter advance is incorrect, as copy_from_iter() has already done the right thing. This will result in the following warning being printed to the console as of the 5.12 kernel. Attempted to advance past end of bvec iter This change should have been included with #11378 when a similar change was made on the read side. Suggested-by: @siebenmann Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Issue #11378 Closes #12041 Closes #12155 (cherry picked from commit 3f81aba7668143c6ca6fc44983d4c880606dea8f) Signed-off-by: Jonathon Fernyhough <[email protected]>
* linux 5.13 compat: bdevops->revalidate_disk() removed (#12122)Jonathon Fernyhough2021-06-231-0/+2
| | | | | | | | | | | | | | | | | | | Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the revalidate_disk() handler from struct block_device_operations. This caused a regression, and this commit eliminates the call to it and the assignment in the block_device_operations static handler assignment code, when configure identifies that the kernel doesn't support that API handler. Reviewed-by: Colin Ian King <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #11967 Closes #11977 (cherry picked from commit 48c7b0e444591ca5e0199e266e79ecff53e81ee6) Signed-off-by: Jonathon Fernyhough <[email protected]> Co-authored-by: Coleman Kane <[email protected]>
* Bend zpl_set_acl to permit the new userns* parameterRich Ercolani2021-06-231-12/+23
| | | | | | | | | | | | | Just like #12087, the set_acl signature changed with all the bolted-on *userns parameters, which disabled set_acl usage, and caused #12076. Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a new configure test for the new version. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #12076 Closes #12093
* Update tmpfile() existence detectionRich Ercolani2021-06-231-0/+5
| | | | | | | | | | | | | Linux changed the tmpfile() signature again in torvalds/linux@6521f89, which in turn broke our HAVE_TMPFILE detection in configure. Update that macro to include the new case, and change the signature of zpl_tmpfile as appropriate. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes: #12060 Closes: #12087
* Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGESColeman Kane2021-06-231-0/+5
| | | | | | | | | | | | | | The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs() function that implements the typical MIN(x,y) logic used throughout the kernel for bounding the allocation, and also the new implementation is intended to be signed-safe (which the former was not). Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #11765 (cherry picked from commit ffd6978ef59cfe2773e984bf03de2f0b93b03f5c) Signed-off-by: Jonathon Fernyhough <[email protected]>
* Linux 5.12 compat: idmapped mountsColeman Kane2021-06-236-13/+100
| | | | | | | | | | | | | | | | | | | In Linux 5.12, the filesystem API was modified to support ipmapped mounts by adding a "struct user_namespace *" parameter to a number functions and VFS handlers. This change adds the needed autoconf macros to detect the new interfaces and updates the code appropriately. This change does not add support for idmapped mounts, instead it preserves the existing behavior by passing the initial user namespace where needed. A subsequent commit will be required to add support for idmapped mounted. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #11712 (cherry picked from commit e2a8296131e94ad785f5564156ed2db1fdb2e080) Signed-off-by: Jonathon Fernyhough <[email protected]>
* FreeBSD: Initialize/destroy zp->z_lockRyan Moeller2021-06-231-0/+2
| | | | | | | | | | zp->z_lock is used in shared code for protecting projid and scantime. We don't exercise these paths much if at all on FreeBSD, so have been lucky enough not to have issues with the uninitialized locks so far. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #12003
* FreeBSD: damage control racing .. lookups in face of mkdir/rmdirMateusz Guzik2021-06-231-0/+27
| | | | | | | External-issue: https://reviews.freebsd.org/D29769 Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11926
* Fix AVX512BW Fletcher code on AVX512-but-not-BW machinesRomain Dolbeau2021-06-231-1/+7
| | | | | | | | | | Introduce a specific valid function for avx512f+avx512bw (instead of checking only for avx512f). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #11937 Closes #11938
* Combine zio caches if possibleMateusz Guzik2021-06-231-24/+50
| | | | | | | | | | | This deduplicates 2 sets of caches which use the same allocation size. Memory savings fluctuate a lot, one sample result is FreeBSD running "make buildworld" saving ~180MB RAM in reduced page count associated with zio caches. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11877
* Fix crash in zio_done error reportingPaul Zuchowski2021-06-231-2/+3
| | | | | | | | | Fix NULL pointer dereference when reporting checksum error for gang block in zio_done. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #11872 Closes #11896
* ZTS: fix removal_condense_export test caseBrian Behlendorf2021-06-231-2/+5
| | | | | | | | | | | | | It's been observed in the CI that the required 25% of obsolete bytes in the mapping can be to high a threshold for this test resulting in condensing never being triggered and a test failure. To prevent these failures make the existing zfs_condense_indirect_obsolete_pct tuning available so the obsolete percentage can be reduced from 25% to 5% during this test. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11869
* Ratelimit deadman zevents as with delay zeventsRyan Moeller2021-06-232-3/+8
| | | | | | | | | | | | | | | Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11786
* Avoid taking global lock to destroy zfsdev stateRyan Moeller2021-06-232-21/+11
| | | | | | | | | | | | | | We have exclusive access to our zfsdev state object in this section until it is invalidated by setting zs_minor to -1, so we can destroy the state without taking a lock if we do the invalidation last, after a member to ensure correct ordering. While here, strengthen the assertions that zs_minor is valid when we enter. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11751
* FreeBSD: Fix stable/12 after AT_BENEATH removalRyan Moeller2021-06-231-3/+1
| | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11827
* Allow pool names that look like Solaris disk namesRyan Moeller2021-06-231-6/+0
| | | | | | | | | | | Nothing bad happens if a prefix of your pool name matches a disk name. This is a bit of a silly restriction at this point. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11781 Closes #11813
* Don't scale zfs_zevent_len_max by CPU countRyan Moeller2021-06-231-4/+1
| | | | | | | | | | The lower bound for this scaling to too low and the upper bound is too high. Use a fixed default length of 512 instead, which is a reasonable value on any system. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11822
* Atomically check and set dropped zevent countRyan Moeller2021-06-231-2/+1
| | | | | | | | | ratelimit_dropped isn't protected by a lock and is expected to be updated atomically. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11822
* Fix regression in POSIX mode behaviorAndrew2021-06-231-4/+0
| | | | | | | | | | | | | | | | | Commit 235a85657 introduced a regression in evaluation of POSIX modes that require group DENY entries in the internal ZFS ACL. An example of such a POSX mode is 007. When write_implies_delete_child is set, then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling zfs_zaccess_common(). This occurs is zfs_zaccess_delete(). Unfortunately, when zfs_zaccess_aces_check hits this particular DENY ACE, zfs_groupmember() is checked to determine whether access should be denied, and since zfs_groupmember() always returns B_TRUE on Linux and so this check is failed, resulting ultimately in EPERM being returned. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Andrew Walker <[email protected]> Closes #11760
* Allow setting bootfs property on pools with indirect vdevsMartin Matuška2021-06-231-3/+1
| | | | | | | | | The FreeBSD boot loader relies on the bootfs property and is capable of booting from removed (indirect) vdevs. Reviewed-by Eric van Gyzen Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Martin Matuska <[email protected]> Closes #11763
* FreeBSD: make seqc asserts conditional on replayMateusz Guzik2021-06-231-3/+6
| | | | | | | Avoids tripping on asserts when doing pool recovery. Reviewed-by: Ryan Moeller <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11739
* FreeBSD: Fix memory leaks in kstatsRyan Moeller2021-06-231-7/+4
| | | | | | | | | | | Don't handle (incorrectly) kmem_zalloc() failure. With KM_SLEEP, will never return NULL. Free the data allocated for non-virtual kstats when deleting the object. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11767
* FreeBSD: bring back possibility to rewind the checkpoint from bootloaderMariusz Zaborski2021-06-231-1/+16
| | | | | | | | | | | | | | | | | | Add parsing of the rewind options. When I was upstreaming the change [1], I omitted the part where we detect that the pool should be rewind. When the FreeBSD repo has synced with the OpenZFS, this part of the code was removed. [1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c [2] OpenZFS repo: f2c027bd6a003ec5793f8716e6189c389c60f47a External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152 Originally reviewed by: tsoome, allanjude Originally reviewed by: kevans (ok from high-level overview) Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Closes #11730
* FreeBSD: Clean up zfsdev_close to match LinuxRyan Moeller2021-06-231-10/+8
| | | | | | | | | | | | | | Resolve some oddities in zfsdev_close() which could result in a panic and were not present in the equivalent function for Linux. - Remove unused definition ZFS_MIN_MINOR - FreeBSD: Simplify zfsdev state destruction - Assert zs_minor is valid in zfsdev_close - Make locking around zfsdev state match Linux Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11720
* Macroify teardown lock handlingMateusz Guzik2021-06-233-30/+28
| | | | | | | | | | | This will allow platforms to implement it as they see fit, in particular in a different manner than rrm locks. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Macy <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11153
* FreeBSD: rename teardown inactive macros to mimick rrm conventionMateusz Guzik2021-06-233-18/+18
| | | | | | | | Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Macy <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11153
* FreeBSD: remove 2 assertions that teardown lock is not heldMateusz Guzik2021-06-231-45/+0
| | | | | | | | | | | They are not very useful and hard to implement in the rms routine the code is about to start using. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Macy <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11153
* FreeBSD: rework asserts in zfs_dd_lookupMateusz Guzik2021-06-231-3/+2
| | | | | | | | | | | | 1. even up ifdefs 2. drop the arguably useless teardown lock asserts -- nothing else checks for it Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matt Macy <[email protected]> Signed-off-by: Mateusz Guzik <[email protected]> Closes #11153
* FreeBSD: Fix scope of deadman tunablesRyan Moeller2021-06-231-2/+2
| | | | | | | | | | A few deadman tunables ended up in the wrong sysctl node. Move them to vfs.zfs.deadman.* Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11715
* zvol: call zil_replaying() during replayChristian Schwarz2021-06-233-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zil_replaying(zil, tx) has the side-effect of informing the ZIL that an entry has been replayed in the (still open) tx. The ZIL uses that information to record the replay progress in the ZIL header when that tx's txg syncs. ZPL log entries are not idempotent and logically dependent and thus calling zil_replaying() is necessary for correctness. For ZVOLs the question of correctness is more nuanced: ZVOL logs only TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical dependencies between two records exist only if the write or discard request had sync semantics or if the ranges affected by the records overlap. Thus, at a first glance, it would be correct to restart replay from the beginning if we crash before replay completes. But this does not address the following scenario: Assume one log record per LWB. The chain on disk is HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z") where N:W(O, C) represents log entry number N which is a TX_WRITE of C to offset A. We replay 1, 2 and 3 in one txg, sync that txg, then crash. Bit flips corrupt 2, 3, and 4. We come up again and restart replay from the beginning because we did not call zil_replaying() during replay. We replay 1 again, then interpret 2's invalid checksum as the end of the ZIL chain and call replay done. The replayed zvol content is "AX". If we had called zil_replaying() the HDR would have pointed to 3 and our resumed replay would not have replayed anything because 3 was corrupted, resulting in zvol content "BX". If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's contents. This patch adds the zil_replaying() calls to the replay functions. Since the callbacks in the replay function need the zilog_t* pointer so that they can call zil_replaying() we open the ZIL while replaying in zvol_create_minor(). We also verify that replay has been done when on-demand-opening the ZIL on the first modifying bio. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Christian Schwarz <[email protected]> Closes #11667
* Cancel TRIM / initialize on FAULTED non-writeable vdevsnssrikanth2021-06-232-6/+19
| | | | | | | | | | | | | | When a device which is actively trimming or initializing becomes FAULTED, and therefore no longer writable, cancel the active TRIM or initialization. When the device is merely taken offline with `zpool offline` then stop the operation but do not cancel it. When the device is brought back online the operation will be resumed if possible. Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Co-authored-by: Vipin Kumar Verma <[email protected]> Signed-off-by: Srikanth N S <[email protected]> Closes #11588
* Fix overly broad locking in spa_vdev_config_exit()Brian Behlendorf2021-06-231-2/+2
| | | | | | | | | | | | | Calling vdev_free() only requires the we acquire the spa config SCL_STATE_ALL locks, not the SCL_ALL locks. In particular, we need need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a writer since this can lead to a deadlock. The txg_sync_thread() may block in spa_txg_history_init_io() when taking the SCL_CONFIG lock as a reading when it detects there's a pending writer. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Mark Maybee <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11585
* Wrap bare EINVAL returns with SET_ERRORRyan Moeller2021-06-231-2/+2
| | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11636
* Intentionally allow ZFS_READONLY in zfs_writeRyan Moeller2021-03-082-7/+25
| | | | | | | | | | | | | | | | ZFS_READONLY represents the "DOS R/O" attribute. When that flag is set, we should behave as if write access were not granted by anything in the ACL. In particular: We _must_ allow writes after opening the file r/w, then setting the DOS R/O attribute, and writing some more. (Similar to how you can write after fchmod(fd, 0444).) Restore these semantics which were lost on FreeBSD when refactoring zfs_write. To my knowledge Linux does not actually expose this flag, but we'll need it to eventually so I've added the supporting checks. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11693
* Initialize ZIL buffersBrian Behlendorf2021-03-081-0/+1
| | | | | | | | | When populating a ZIL destination buffer ensure it is always zeroed before its contents are constructed. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #11687
* Add "zstd-fast" to help options for "compression" propertyJake Howard2021-03-051-1/+1
| | | | | | | This value does work as expected, and is documented in the manpage. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jake Howard <[email protected]> Closes #11670
* Fix assert in FreeBSD-specific dmu_read_pagesAndriy Gapon2021-03-051-1/+1
| | | | | | | | | | | | | The function has three similar pieces of code: for read-behind pages, requested pages and read-ahead pages. All three pieces had an assert to ensure that the page is not mapped. Later the assert was relaxed to require that the page is not mapped for writing. But that was done in two places out of three. This change fixes the third piece, read-ahead. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Andriy Gapon <[email protected]> Closes #11654
* Linux 5.12 compat: bio->bi_disk member movedColeman Kane2021-03-052-0/+8
| | | | | | | | | | The struct bio member bi_disk was moved underneath a new member named bi_bdev. So all attempts to reference bio->bi_disk need to now become bio->bi_bdev->bd_disk. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Coleman Kane <[email protected]> Closes #11639
* Linux: increase max nvlist_src sizeBrian Behlendorf2021-03-051-1/+1
| | | | | | | | | | | | On Linux increase the maximum allowed size of the src nvlist which can be passed to the /dev/zfs ioctl. Originally, this was set to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd. Since that time it's been converted to a vmalloc so that's no longer a hard limit, and it's desirable for `zfs send/recv` to allow larger nvlists so more snapshots can be sent at once. Signed-off-by: Brian Behlendorf <[email protected]> Closes #6572 Closes #11638
* vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULLfbynite2021-03-051-2/+2
| | | | | | | | | | | | | | | | | | | | This prevents a panic after a SLOG add/removal on the root pool followed by a zpool scrub. When a SLOG is removed, a hole takes its place - the vdev_ops for a hole is vdev_hole_ops, which defines the handler functions of vdev_op_hold and vdev_op_rele as NULL. This bug has been reported in illumos and FreeBSD, a different trigger in the FreeBSD report though. Credit for this patch goes to Patrick Mooney <[email protected]> Obtained from: illumos-gate commit: c65bd18728f34725 External-issue: https://www.illumos.org/issues/12981 External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Wing <[email protected]> Closes #11623
* Restore FreeBSD resource usage accountingRyan Moeller2021-03-056-0/+100
| | | | | | | Add zfs_racct_* interfaces for platform-dependent read/write accounting. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ryan Moeller <[email protected]> Closes #11613
* FreeBSD: disable the use of hardware crypto offload drivers for nowMark Johnston2021-03-051-2/+13
| | | | | | | | | | | | | | | | | | | | | First, the crypto request completion handler contains a bug in that it fails to reset fs_done correctly after the request is completed. This is only a problem for asynchronous drivers. Second, some hardware drivers have input constraints which ZFS does not satisfy. For instance, ccp(4) apparently requires the AAD length for AES-GCM to be a multiple of the cipher block size, and with qat(4) the AES-GCM AAD length may not be longer than 240 bytes. FreeBSD's generic crypto framework doesn't have a mechanism to automatically fall back to a software implementation if a hardware driver cannot process a request, and ZFS does not tolerate such errors. The plan is to implement such a fallback mechanism, but with FreeBSD 13.0 approaching we should simply disable the use hardware drivers for now. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes #11612
* Set file mode during zfs_writeAntonio Russo2021-02-081-0/+1
| | | | | | | | | | | | | | | | | | | | 3d40b65 refactored zfs_vnops.c, which shared much code verbatim between Linux and BSD. After a successful write, the suid/sgid bits are reset, and the mode to be written is stored in newmode. On Linux, this was propagated to both the in-memory inode and znode, which is then updated with sa_update. 3d40b65 accidentally removed the initialization of newmode, which happened to occur on the same line as the inode update (which has been moved out of the function). The uninitialized newmode can be saved to disk, leading to a crash on stat() of that file, in addition to a merely incorrect file mode. Reviewed-by: Ryan Moeller <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Antonio Russo <[email protected]> Closes #11474 Closes #11576
* Avoid updating the L2ARC device header unnecessarilyGeorge Amanakis2021-01-281-1/+3
| | | | | | | | | If we do not write any buffers to the cache device and the evict hand has not advanced do not update the cache device header. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #11522 Closes #11537