summaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Fix coverity defects: CID 147553cao2016-11-011-1/+2
| | | | | | | CID 147553: Type:Dereference null return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5305
* Fix coverity defects: CID 152975cao2016-10-311-2/+7
| | | | | | | CID 152975: Type:Dereference null return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5322
* Fix coverity defects: CID 147509GeLiXin2016-10-311-2/+12
| | | | | | | | CID 147509: Explicit null dereferenced - l2arc_sublist_lock is fragile as relied on caller too much. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: GeLiXin <[email protected]> Closes #5319
* Fix lookup_bdev() on UbuntuHajo Möller2016-10-261-1/+1
| | | | | | | | | | | | Ubuntu added support for checking inode permissions to lookup_bdev() in kernel commit 193fb6a2c94fab8eb8ce70a5da4d21c7d4023bee (merged in 4.4.0-6.21). Upstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636517 This patch adds a test for Ubuntu's variant of lookup_bdev() to configure and calls the function in the correct way. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Hajo Möller <[email protected]> Closes #5336
* Disable zio_dva_throttle_enabled by defaultBrian Behlendorf2016-10-261-1/+1
| | | | | | | | | | Until it can be determined definitively that a performance regression wasn't introduced accidentally by 3dfb57a this functionality is being disabled by default. It can be re- enabled by setting zio_dva_throttle_enabled=1. Signed-off-by: Brian Behlendorf <[email protected]> Closes #5335 Issue #5289
* Fix statechange-led.sh & unnecessary libdevmapper warningTony Hutter2016-10-251-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fix autoreplace behaviour on statechange-led.sh script. ZED sends the following events on an auto-replace: 1. statechange: Disk goes UNAVAIL->ONLINE 2. statechange: Disk goes ONLINE->UNAVAIL 3. vdev_attach: Disk goes ONLINE Events 1-2 happen when ZED first attempts to do an auto-online. When that fails, ZED then tries an auto-replace, generating the vdev_attach event in #3. In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE transition to turn off the LED. It ignored the #2 ONLINE->UNAVAIL transition, assuming it was just the "old" VDEV going offline. This is problematic, as a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want to ignore that. This new patch correctly turns on the fault LED every time a drive becomes UNAVAIL. It also monitors vdev_attach events to trigger turning off the LED when an auto-replaced disk comes online. - Remove unnecessary libdevmapper warning with --with-config=kernel This fixes an unnecessary libdevmapper warning when building --with-config=kernel. Kernel code does not use libdevmapper, so the warning is not needed. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #2375 Closes #5312 Closes #5331
* icp: mark asm files with noexec stackJason Zaman2016-10-251-0/+4
| | | | | | | | Similar to commit a3600a106. Asm files need an explicit note that they do not require an executable stack. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jason Zaman <[email protected]> Closes #5332
* Fix cred leak in zpl_fallocate_commontuxoko2016-10-241-2/+1
| | | | | | | | | This is caught by kmemleak when running compress_004_pos Reviewed-by: Tim Chase <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5244 Closes #5330
* Fix taskq creation failure in vdev_open_children()Brian Behlendorf2016-10-241-0/+3
| | | | | | | | | | | | | | When creating and destroying pools in tight loop it's possible to exhaust the number of allowed threads on a system. This results in taskq_create() failling and a NULL dereference. Resolve the issue by falling back to opening the vdevs all synchronously. Reviewed-by: Denys Rtveliashvili <[email protected]> Reviewed-by: Håkan Johansson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes zfsonlinux/spl#521 Closes #4637
* Turn on/off enclosure slot fault LED even when disk isn't presentTony Hutter2016-10-243-0/+19
| | | | | | | | | | | | | | | | | Previously when a drive faulted, the statechange-led.sh script would lookup the drive's LED sysfs entry in /sys/block/sd*/device/enclosure_device, and turn it on. During testing we noticed that if you pulled out a drive, or if the drive was so badly broken that it no longer appeared to Linux, that the /sys/block/sd* path would be removed, and the script could not lookup the LED entry. To fix this, this patch looks up the disks's more persistent "/sys/class/enclosure/X:X:X:X/Slot N" LED sysfs path at pool import. It then passes that path to the statechange-led script to use, rather than having the script look it up on the fly. This allows the script to turn on/off the slot LEDs even when the drive is missing. Closes #5309 Closes #2375
* Fletcher4 algorithm implemented in pure NEON for Aarch64 / ARMv8 64 bitsRomain Dolbeau2016-10-213-0/+219
| | | | | | | | | | | | | | | | | | | | | | | | | This is not useful on micro-architecture with a weak NEON implementation (only 64 bits); the native version is slower & the byteswap barely faster than scalar. On A53 or A57, it's a small improvement on scalar but OK for byteswap. Results from an A53 system: 0 0 0x01 -1 0 1499068294333000 1499101101878000 implementation native byteswap scalar 1008227510 755880264 aarch64_neon 1198098720 1044818671 fastest aarch64_neon aarch64_neon Results from a A57 system: 0 0 0x01 -1 0 4407214734807033 4407233933777404 implementation native byteswap scalar 2302071241 1124873346 aarch64_neon 2542214946 2245570352 fastest aarch64_neon aarch64_neon Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #5248
* Fix userquota_compare() functionBrian Behlendorf2016-10-211-1/+4
| | | | | | | | | | | | | | | | | The AVL tree compare function requires that either -1, 0, or 1 be returned. However the strcmp() function only guarantees that a negative, zero, or positive value is returned. Therefore, the return value of strcmp() needs to be sanitized with AVL_ISIGN. This was initially overlooked because the x86_64 implementation of strcmp() happens to only returns the allowed values. This was observed on an aarch64 platform which behaves correctly but differently as described above. Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5311 Closes #5313
* Fix coverity defects: CID 153459luozhengzheng2016-10-201-1/+1
| | | | | | | | CID 153459: Null pointer dereferences (FORWARD_NULL) Accidentally introduced by #5159. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5310
* Fix coverity defects: CID 147551, 147552cao2016-10-201-0/+4
| | | | | | | | CID 147551: Type:dereference null return value CID 147552: Type:dereference null return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5279
* Fix coverity defects: CID 147472cao2016-10-202-4/+13
| | | | | | | CID 147472: Type: 'Constant' variable guards dead code Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5288
* Fix coverity defects: CID 150919, 150923luozhengzheng2016-10-202-2/+2
| | | | | | | | | CID 150919: Buffer not null terminated (BUFFER_SIZE_WARNING) CID 150923: Buffer not null terminated (BUFFER_SIZE_WARNING) Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5298
* Linux 4.9 compat: inode_change_ok() renamed setattr_prepare()Brian Behlendorf2016-10-201-1/+1
| | | | | | | | | | | | In torvalds/linux@31051c8 the inode_change_ok() function was renamed setattr_prepare() and updated to take a dentry ratheri than an inode. Update the code to call the setattr_prepare() and add a wrapper function which call inode_change_ok() for older kernels. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Requires-spl: refs/pull/581/head
* Linux 4.9 compat: remove iops->{set,get,remove}xattrChunwei Chen2016-10-201-0/+8
| | | | | | | | In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and generic_{set,get,remove}xattr are removed. xattr operations will directly go through sb->s_xattr. Signed-off-by: Chunwei Chen <[email protected]>
* Linux 4.9 compat: iops->rename() wants flagsChunwei Chen2016-10-202-5/+39
| | | | | | | In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are merged together into iops->rename(), it now wants flags. Signed-off-by: Chunwei Chen <[email protected]>
* Remove dir inode operations from zpl_inode_operationsChunwei Chen2016-10-201-8/+0
| | | | | | | These operations are dir specific, there's no point putting them in zpl_inode_operations which is for regular files. Signed-off-by: Chunwei Chen <[email protected]>
* Multipath autoreplace, control enclosure LEDs, event rate limitingTony Hutter2016-10-194-10/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | 1. Enable multipath autoreplace support for FMA. This extends FMA autoreplace to work with multipath disks. This requires libdevmapper to be installed at build time. 2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure LED for a drive when a drive becomes FAULTED/DEGRADED. Your enclosure must be supported by the Linux SES driver for this to work. The enclosure LED scripts work for multipath devices as well. The scripts will clear the LED when the fault is cleared. 3. Rate limit ZIO delay and checksum events so as not to flood ZED ZIO delay and checksum events are rate limited to 5/sec in the zfs module. Reviewed-by: Richard Laager <[email protected]> Reviewed by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #2449 Closes #3017 Closes #5159
* Fix coverity defects: CID 150926luozhengzheng2016-10-181-2/+8
| | | | | | | | | | CID 150926: Unchecked return value (CHECKED_RETURN) - This case cannot occur given the existing taskq implementation and flags passed to task_dispatch(). Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5272
* Fix unused variableBrian Behlendorf2016-10-181-2/+2
| | | | | | | | | Accidentally introduced by 3dfb57a, when building with debugging disabled several variables are unused. Resolve this by wrapping them in ASSERTV to remove them for non-debug builds. Reviewed by: Don Brady <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5284
* Fix coverity defects: CID 49339, 153393cao2016-10-181-1/+1
| | | | | | | | CID 49339: Type:Buffer not null terminated CID 153393: Type:Buffer not null terminated Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: <cao.xuewen [email protected]> Closes #5296
* Fix coverity defects: CID 150924luozhengzheng2016-10-171-1/+5
| | | | | | | | | | CID 150924: Unchecked return value (CHECKED_RETURN) - On taskq_dispatch failure the reference must be dropped and this entry can be safely skipped. This case should be impossible in the existing implementation but should be handled regardless. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5278
* Fix coverity defects: CID 147488, 147490cao2016-10-141-1/+1
| | | | | | | | | CID 147488, Type:explicit null dereferenced CID 147490, Type:dereference null return value Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5237
* OpenZFS 7090 - zfs should throttle allocationsDon Brady2016-10-139-160/+888
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OpenZFS 7090 - zfs should throttle allocations Authored by: George Wilson <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Approved by: Matthew Ahrens <[email protected]> Ported-by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> When write I/Os are issued, they are issued in block order but the ZIO pipeline will drive them asynchronously through the allocation stage which can result in blocks being allocated out-of-order. It would be nice to preserve as much of the logical order as possible. In addition, the allocations are equally scattered across all top-level VDEVs but not all top-level VDEVs are created equally. The pipeline should be able to detect devices that are more capable of handling allocations and should allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation of work to each top-level vdev. The allocation logic utilizes a reservation system to reserve allocations that will be performed by the allocator. Once an allocation is successfully completed it's scheduled on a given top-level vdev. Each top-level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab groups and round robin across all eligible metaslab groups to distribute the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the allocation queue is emptied. OpenZFS-issue: https://www.illumos.org/issues/7090 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7 Closes #5258 Porting Notes: - Maintained minimal stack in zio_done - Preserve linux-specific io sizes in zio_write_compress - Added module params and documentation - Updated to use optimize AVL cmp macros
* Fix coverity defects: CID 150943, 150938cao2016-10-132-4/+6
| | | | | | | | | CID:150943, Type:Unintentional integer overflow CID:150938, Type:Explicit null dereferenced Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5255
* Fix coverity defects: CID 147571, 147574luozhengzheng2016-10-132-2/+2
| | | | | | | | CID 147571: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN) CID 147574: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5268
* Fix coverity defects: CID 153394luozhengzheng2016-10-121-1/+1
| | | | | | | coverity scan CID 153394, Type:String overflow Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5263
* Fix ICP memleak introduced in #4760Tom Caputi2016-10-121-0/+13
| | | | | | | | | | | | | The ICP requires destructors to for each crypto module that is added. These do not necessarily exist in Illumos because they assume that these modules can never be unloaded from the kernel. Some of this cleanup code was missed when #4760 was merged, resulting in leaks. This patch simply fixes that. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Issue #4760 Closes #5265
* Fix zfsctl_snapshot_{,un}mount() issuesBrian Behlendorf2016-10-111-19/+10
|\ | | | | | | | | | | | | | | | | | | | | | | Fix use after free in zfsctl_snapshot_unmount(). Use /usr/bin/env instead of /bin/sh to fix a shell code injection flaw and allow use with grsecurity. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected] Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Stian Ellingsen <[email protected]> Closes #5250 Closes #4377
| * Use env, not sh in zfsctl_snapshot_{,un}mount()Stian Ellingsen2016-10-081-18/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Call mount and umount via /usr/bin/env instead of /bin/sh in zfsctl_snapshot_mount() and zfsctl_snapshot_unmount(). This change fixes a shell code injection flaw. The call to /bin/sh passed the mountpoint unescaped, only surrounded by single quotes. A mountpoint containing one or more single quotes would cause the command to fail or potentially execute arbitrary shell code. This change also provides compatibility with grsecurity patches. Grsecurity only allows call_usermodehelper() to use helper binaries in certain paths. /usr/bin/* is allowed, /bin/* is not.
| * Fix use after free in zfsctl_snapshot_unmount()Stian Ellingsen2016-10-081-1/+1
| |
* | Write issue taskq shouldn't be dynamicTim Chase2016-10-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is as much an upstream compatibility as it's a bit of a performance gain. The illumos taskq implemention doesn't allow a TASKQ_THREADS_CPU_PCT type to be dynamic and in fact enforces as much with an ASSERT. As to performance, if this taskq is dynamic, it can cause excessive contention on tq_lock as the threads are created and destroyed because it can see bursts of many thousands of tasks in a short time, particularly in heavy high-concurrency zvol write workloads. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #5236
* | Porting over some ICP code that was missed in #4760Tom Caputi2016-10-101-16/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | When #4760 was merged tests were added to ensure that the new checksums were working properly. However, some of the functionality for sha2 functions were not ported over, resulting in some Coverity defects and code that would be unstable when needed in the future. This patch simply ports over the missing code and fixes the defects in the process. Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Issue #4760 Closes #5251
* | Fix file permissionsBrian Behlendorf2016-10-084-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following new test cases need to have execute permissions set: userquota/groupspace_003_pos.ksh userquota/userquota_013_pos.ksh userquota/userspace_003_pos.ksh upgrade/upgrade_userobj_001_pos.ksh upgrade/setup.ksh upgrade/cleanup.ksh The following source files accidentally were marked executable: lib/libzpool/kernel.c lib/libshare/nfs.c lib/libzfs/libzfs_dataset.c lib/libzfs/libzfs_util.c tests/zfs-tests/cmd/rm_lnkcnt_zero_file/rm_lnkcnt_zero_file.c tests/zfs-tests/cmd/dir_rd_update/dir_rd_update.c cmd/zed/zed_exec.c module/icp/core/kcf_sched.c module/zfs/dsl_pool.c module/zfs/arc.c module/nvpair/nvpair.c man/man5/zfs-module-parameters.5 Reviewed-by: GeLiXin <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5241
* | Rename hole_birth tunable to match OpenZFSBrian Behlendorf2016-10-071-6/+12
| | | | | | | | | | | | | | | | | | | | | | OpenZFS decided that ignore_hole_birth was too imprecise and incorrect a name (and went with send_holes_without_birth_time). Rename it in ZoL too, while keeping the name "ignore_hole_birth" pointing to the same variable for existing consumers. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes #5239
* | Fix vdev_open_child() race on updating vdev_parent->vdev_nonrotHåkan Johansson2016-10-071-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updating vd->vdev_parent->vdev_nonrot in vdev_open_child() is a race when vdev_open_child is called for many children from a task queue. vdev_open_child() is only called by vdev_open_children(), let the latter update the parent vdev_nonrot member. The update was already there, so done twice previously. Thus using the same logic at the end in vdev_open_children() to update vdev_nonrot, either we are vdev_uses_zvols() or not. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Haakan T Johansson <[email protected]> Closes #5162
* | Fix coverity defects: CID 147565-147567cao2016-10-072-1/+3
| | | | | | | | | | | | | | | | | | | | coverity scan CID:147567, Type:dereference null return value coverity scan CID:147566, Type:dereference null return value coverity scan CID:147565, Type:dereference null return value Reviewed by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5166
* | Fletcher4: Incremental updates and ctx calculationBrian Behlendorf2016-10-074-202/+319
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Fixes ABI issues with fletcher4 code, adds support for incremental updates, and adds ztest method for testing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5164
| * | Fletcher4: save/reload implementation contextGvozden Neskovic2016-10-054-191/+268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Init, compute, and fini methods are changed to work on internal context object. This is necessary because ABI does not guarantee that SIMD registers will be preserved on function calls. This is technically the case in Linux kernel in between `kfpu_begin()/kfpu_end()`, but it breaks user-space tests and some kernels that don't require disabling preemption for using SIMD (osx). Use scalar compute methods in-place for small buffers, and when the buffer size does not meet SIMD size alignment. Signed-off-by: Gvozden Neskovic <[email protected]>
| * | Fletcher4: Incremental using SIMDGvozden Neskovic2016-10-051-18/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Combine incrementally computed fletcher4 checksums. Checksums are combined a posteriori, allowing for parallel computation on chunks to be implemented if required. The algorithm is general, and does not add changes in each SIMD implementation. New test in ztest verifies incremental fletcher computations. Checksum combining matrix for two buffers `a` and `b`, where `Ca` and `Cb` are respective fletcher4 checksums, `Cab` is combined checksum, `s` is size of buffer `b` (divided by sizeof(uint32_t)) is: Cab[A] = Cb[A] + Ca[A] Cab[B] = Cb[B] + Ca[B] + s * Ca[A] Cab[C] = Cb[C] + Ca[C] + s * Ca[B] + s(s+1)/2 * Ca[A] Cab[D] = Cb[D] + Ca[D] + s * Ca[C] + s(s+1)/2 * Ca[B] + s(s+1)(s+2)/6 * Ca[A] NOTE: this calculation overflows for larger buffers. Thus, internally, the calculation is performed on 8MiB chunks. Signed-off-by: Gvozden Neskovic <[email protected]>
* | | OpenZFS 6988 spa_sync() spends half its time in dmu_objset_do_userquota_updatesJinshan Xiong2016-10-072-36/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using a benchmark which creates 2 million files in one TXG, I observe that the thread running spa_sync() is on CPU almost the entire time we are syncing, and therefore can be a performance bottleneck. About 50% of the time in spa_sync() is in dmu_objset_do_userquota_updates(). The problem is that dmu_objset_do_userquota_updates() calls zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was modified (or created). In this benchmark, all the files are owned by the same user/group, so all 2 million calls to zap_increment_int() are modifying the same entry in the zap. The same issue exists for the DMU_GROUPUSED_OBJECT. We should keep an in-memory map from user to space delta while we are syncing, and when we finish, iterate over the in-memory map and modify the ZAP once per entry. This reduces the number of calls to zap_increment_int() from "number of objects modified" to "number of owners/groups of modified files". This reduced the time spent in spa_sync() in the file create benchmark by ~33%, from 11 seconds to 7 seconds. Upstream bugs: DLPX-44799 Ported by: Ned Bass <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/6988 ZFSonLinux-issue: https://github.com/zfsonlinux/zfs/issues/4642 OpenZFS-commit: unmerged Porting notes: - Added curly braces around declaration of userquota_cache_t cache to quiet compiler warning; - Handled the userobj accounting the same way it proposed in this path. Signed-off-by: Jinshan Xiong <[email protected]>
* | | Add support for user/group dnode accounting & quotaJinshan Xiong2016-10-079-25/+371
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tracks dnode usage for each user/group in the DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode accounting have the key prefixed with "obj-" followed by the UID/GID in string format (as done for the block accounting). A new SPA feature has been added for dnode accounting as well as a new ZPL version. The SPA feature must be enabled in the pool before upgrading the zfs filesystem. During the zfs version upgrade, a "quotacheck" will be executed by marking all dnode as dirty. ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500 Signed-off-by: Jinshan Xiong <[email protected]> Signed-off-by: Johann Lombardi <[email protected]>
* | Refactor updating of immutable/appendonly flagslorddoskias2016-10-051-20/+13
|/ | | | | | | | | | | | Move the synchronization of inode/znode i_flgas/pflags into the respective internal zfs function. This is mostly mechanical work and shouldn't introduce any functional changes. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Issue #227 Closes #5223
* Fix coverity defects: CID 150953, 147603, 147610luozhengzheng2016-10-041-1/+1
| | | | | | | | | coverity scan CID:150953,type: uninitialized scalar variable coverity scan CID:147603,type: Resource leak coverity scan CID:147610,type: Resource leak Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5209
* OpenZFS 6585 - sha512, skein, and edonr have an unenforced dependency on ↵ilovezfs2016-10-031-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | extensible dataset Authored by: ilovezfs <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Laager <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported by: Tony Hutter <[email protected]> In any pool without the extensible dataset feature flag already enabled, creating a dataset with dedup set to use one of the new checksums would result in the following panic as soon as any data was added: panic[cpu0]/thread=ffffff0006761c40: feature_get_refcount(spa, feature, &refcount) != 48 (0x30 != 0x30), file: ../../common/fs/zfs/zfeature.c line 390 Inpsection showed that feature->fi_feature was 7, which is the value of SPA_FEATURE_EXTENSIBLE_DATASET in the spa_feature enum. This commit adds extensible dataset as a dependency for the sha512, edonr, and skein feature flags, which prevents the panic. OpenZFS-issue: https://www.illumos.org/issues/6585 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/892586e8a147c02d7f4053cc405229a13e796928 Porting Notes: This code was originally from Illumos, but I actually ported it from: openzfsonosx/zfs@b62a652
* OpenZFS 6541 - Pool feature-flag check defeated if "verify" is included in ↵ilovezfs2016-10-032-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the dedup property value Authored by: ilovezfs <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Laager <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Tony Hutter <[email protected]> zio_checksum_to_feature() expects a zio_checksum enum not a raw property intval, so the new checksums weren't being detected when the ZIO_CHECKSUM_VERIFY flag got in the way. Given a pool without feature@sha512, zfs create -o dedup=sha512 naughty/fivetwelve_noverify_ds would fail as expected since the raw intval would indeed be equal to SPA_FEATURE_SHA512. However, zfs create -o dedup=sha512,verify naughty/fivetwelve_verify_ds would incorrectly succeed because ZIO_CHECKSUM_VERIFY would be in the way, the raw intval would not be a member of the enum, and zio_checksum_to_feature() would return SPA_FEATURE_NONE, with the result that spa_feature_is_enabled() would never be called. This was first detected with edonr, since in that case verify is required. This commit clears the ZIO_CHECKSUM_VERIFY flag before calling zio_checksum_to_feature() using the ZIO_CHECKSUM_MASK and verifies in zio_checksum_to_feature() that ZIO_CHECKSUM_MASK has been applied by the caller to attempt to prevent the same bug from occurring again in the future. OpenZFS-issue: https://www.illumos.org/issues/6541 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/971640e6aa954c91b0706543741aa4570299f4d7 Porting notes: This code was originally from Illumos, but I actually ported it from: openzfsonosx/zfs@bef06e1
* OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-RTony Hutter2016-10-0338-281/+7220
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed by: George Wilson <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported by: Tony Hutter <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Porting Notes: This code is ported on top of the Illumos Crypto Framework code: https://github.com/zfsonlinux/zfs/pull/4329/commits/b5e030c8dbb9cd393d313571dee4756fbba8c22d The list of porting changes includes: - Copied module/icp/include/sha2/sha2.h directly from illumos - Removed from module/icp/algs/sha2/sha2.c: #pragma inline(SHA256Init, SHA384Init, SHA512Init) - Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since it now takes in an extra parameter. - Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c - Added skein & edonr to libicp/Makefile.am - Added sha512.S. It was generated from sha512-x86_64.pl in Illumos. - Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument. - In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section to not #include the non-existant endian.h. - In skein_test.c, renane NULL to 0 in "no test vector" array entries to get around a compiler warning. - Fixup test files: - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>, - Remove <note.h> and define NOTE() as NOP. - Define u_longlong_t - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p" - Rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Remove "for isa in $($ISAINFO); do" stuff - Add/update Makefiles - Add some userspace headers like stdio.h/stdlib.h in places of sys/types.h. - EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules. - Update scripts/zfs2zol-patch.sed - include <sys/sha2.h> in sha2_impl.h - Add sha2.h to include/sys/Makefile.am - Add skein and edonr dirs to icp Makefile - Add new checksums to zpool_get.cfg - Move checksum switch block from zfs_secpolicy_setprop() to zfs_check_settable() - Fix -Wuninitialized error in edonr_byteorder.h on PPC - Fix stack frame size errors on ARM32 - Don't unroll loops in Skein on 32-bit to save stack space - Add memory barriers in sha2.c on 32-bit to save stack space - Add filetest_001_pos.ksh checksum sanity test - Add option to write psudorandom data in file_write utility