aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* zvol_remove_minors do parallel zvol_freeChunwei Chen2016-12-021-6/+29
| | | | | | | | | On some kernel version, blk_cleanup_queue and put_disk will wait for more then 10ms. So a pool with a lot of zvols will easily wait for more then 1 min if we do zvol_free sequentially. Signed-off-by: Chunwei Chen <[email protected]> Requires-spl: refs/pull/588/head
* zpool_create_minors parallel prefetchChunwei Chen2016-12-021-10/+89
| | | | | | | | Do parallel prefetch all zvol dnodes before actually creating each individual. This will greatly reduce the import time when having a lot of zvols and disk is slow. Signed-off-by: Chunwei Chen <[email protected]>
* zvol: reduce linear list searchChunwei Chen2016-12-011-49/+65
| | | | | | | Use kernel ida to generate minor number, and use hash table to find zvol with name. Signed-off-by: Chunwei Chen <[email protected]>
* Use system_delay_taskq for long delay tasksChunwei Chen2016-12-017-21/+18
| | | | | | | | | Use it for spa_deadman, zpl_posix_acl_free, snapentry_expire. This free system_taskq from the above long delay tasks, and allow us to do taskq_wait_outstanding on system_taskq without being blocked forever, making system_taskq more generic and useful. Signed-off-by: Chunwei Chen <[email protected]>
* zstreamdump needs to initialize fletcher 4 supportTim Chase2016-11-291-0/+2
| | | | | | | Otherwise, the checksum function pointer isn't initialized. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #5411
* Add -c to zpool iostat & status to run commandTony Hutter2016-11-2916-135/+485
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a command (-c) option to zpool status and zpool iostat. The -c option allows you to run an arbitrary command on each vdev and display the first line of output in zpool status/iostat. The environment vars VDEV_PATH and VDEV_UPATH are set to the vdev's path and "underlying path" before running the command. For device mapper, multipath, or partitioned vdevs, VDEV_UPATH is the actual underlying /dev/sd* disk. This can be useful if the command you're running requires a /dev/sd* device. The patch also uses /sys/block/<dev>/slaves/ to lookup the underlying device instead of using libdevmapper. This not only removes the libdevmapper requirement at build time, but also allows you to resolve device mapper devices without being root. This means that UDEV_UPATH get set correctly when running zpool status/iostat as an unprivileged user. Example: $ zpool status -c 'echo I am $VDEV_PATH, $VDEV_UPATH' NAME STATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 mpatha ONLINE 0 0 0 I am /dev/mapper/mpatha, /dev/sdc sdb ONLINE 0 0 0 I am /dev/sdb1, /dev/sdb Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #5368
* Allow zfs unshare <protocol> -aLOLi2016-11-2917-46/+287
| | | | | | | | | | | | | | | | | | Allow `zfs unshare <protocol> -a` command to share or unshare all datasets of a given protocol, nfs or smb. Additionally, enable most of ZFS Test Suite zfs_share/zfs_unshare test cases. To work around some Illumos-specific functionalities ($SHARE/$UNSHARE) some function wrappers were added around them. Finally, fix and issue in smb_is_share_active() that would leave SMB shares exported when invoking 'zfs unshare -a' Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #3238 Closes #5367
* Ensure that perf regression tests cleanup properlyGiuseppe Di Natale2016-11-287-7/+49
| | | | | | | | | | | | | | | Each test in the performance regression test suite creates a pool and a dataset for use. Unfortunately, these tests do not cleanup the pool and dataset correctly once they complete. Each test now kills fio and iostat, destroys the dataset, and finally destroys the pool. Each test also now traps the SIGTERM signal to handle cases where test-runner kills a test. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Requires-builders: all Closes #5407
* Enable user_property_002_posChaoyuZhang2016-11-181-2/+1
| | | | | | | The user_property_002_pos passes as expected. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: ChaoyuZhang <[email protected]> Closes #5406
* Kernel 4.9 compat: file_operations->aio_fsync removalDeHackEd2016-11-153-0/+33
| | | | | | | Linux kernel commit 723c038475b78 removed this field. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #5393
* Fix man page formatting in zfs-module-parametersDeHackEd2016-11-141-5/+5
| | | | | | | | Bold and Normal codes were mixed up in a few places resulting in bad highlighting. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #5397
* Repair indent of zpool.8 man pageHåkan Johansson2016-11-141-0/+3
| | | | | | | | Repair indent of zpool.8 man page, just before zpool labelclear details. Accidentally introduced by 193a37cb2 (git bisect). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Haakan T Johansson <[email protected]> Closes #5394
* Fix 'zpool import' detection issueBrian Behlendorf2016-11-141-3/+16
| | | | | | | | | | | | | | | | | | | | | Before adding the entry to the configuration verify that the device can be opened exclusively. This ensures that as long as multipathd is running the underlying multipath devices, which otherwise appear identical to their /dev/mapper counterpart, are pruned from the configuration. Failure to do so can result in a result in the vdev appearing as UNAVAIL when the vdev path provided to the kernel can't be opened exclusively. This check would normally be performed in zpool_open_func() but placing it there would result in false positives because it is called concurrently for many devices. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5387
* Add a statechange notify zedletDon Brady2016-11-106-82/+125
| | | | | | | | | | Now that ZED has internal fault diagnosis and the statechange event is generated for faulted states, we can replace the io-notify and checksum-notify zedlets with one based on statechange. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5383
* Fix coverity defects: CID 147503luozhengzheng2016-11-101-0/+21
| | | | | | | CID 147503: Dereference after null check (FORWARD_NULL) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5326
* Fix coverity defects: CID 147540, 147542cao2016-11-092-3/+2
| | | | | | | | | | | CID 147540: unsigned_compare - Cast nsec to a int32_t to properly detect the expected overflow. CID 147542: unsigned_compare - intval can never be less than ZIO_FAILURE_MODE_WAIT which is defined to be zero. Remove this useless check. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5379
* Fix ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE checkGvozden Neskovic2016-11-091-2/+2
| | | | | | | | | | | | | Pass `ACL_TYPE_ACCESS` for type parameter of `set_cached_acl()` and `forget_cached_acl()` to avoid removal of dead code after BUG() in compile time. Tested on 3.2.0 kernel. Introduced in 3779913 Reviewed-by: Massimo Maggi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5378
* Export symbol dmu_objset_userobjspace_upgradablejxiong2016-11-092-8/+11
| | | | | | | | | It's used by Lustre to determine if the objset can be upgraded. The inline version doesn't work because dmu_objset_is_snapshot() is not exported. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jinshan Xiong <[email protected]> Closes #5385
* Linux 3.14 compat: assign inode->set_acltuxoko2016-11-095-7/+42
| | | | | | | | | | | | | Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come from setxattr, which will handle by the acl xattr_handler, and we already handles that well. However, nfsd will directly calls inode->set_acl or return error if it doesn't exists. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Massimo Maggi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5371 Closes #5375
* Fix symlinks for {vdev_clear,statechange}-led.shOlaf Faaland2016-11-091-2/+2
| | | | | | | | | These were named in the zed/Makefile.am as vdev_clear-blinkled.sh and statechange-blinkled.sh causing bad symlinks to be created. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #5384
* Fix coverity defects: CID 147586cao2016-11-081-3/+3
| | | | | | | | CID 147586: function:allow_usage Type:out-of-bounds read Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5364
* Fix coverity defects: CID 147629cao2016-11-081-1/+2
| | | | | | | CID 147629: Type:Dereference before null check Reviewed-by: Brian Behlendorf <[email protected] Signed-off-by: <cao.xuewen [email protected]> Closes #5376
* Fix coverity defects: 154021luozhengzheng2016-11-081-0/+3
| | | | | | | | CID 154021: Null pointer dereference Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5380
* Fix coverity defects: CID 147626, 147628cao2016-11-082-11/+8
| | | | | | | | | CID 147626: Type:Dereference before null check CID 147628: Type:Dereference before null check Reviewed-by: Brian Behlendorf <[email protected] Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5304
* Skip test suites on 32-bit TEST buildersBrian Behlendorf2016-11-081-0/+11
| | | | | | | | The ztest, filebench, xfstests, and zfsstress test suites should be skipped when testing on 32-bit platforms until they pass reliably. Signed-off-by: Brian Behlendorf <[email protected]> Closes #5381
* Add illumos FMD ZFS logic to ZED -- phase 2Don Brady2016-11-0717-338/+3592
| | | | | | | | | | | | | | | | | | | | | | | The phase 2 work primarily entails the Diagnosis Engine and the Retire Agent modules. It also includes infrastructure to support a crude FMD environment to host these modules. The Diagnosis Engine consumes I/O and checksum ereports and feeds them into a SERD engine which will generate a corres- ponding fault diagnosis when the SERD engine fires. All the diagnosis state data is collected into cases, one case per vdev being tracked. The Retire Agent responds to diagnosed faults by isolating the faulty VDEV. It will notify the ZFS kernel module of the new VDEV state (degraded or faulted). This agent is also responsible for managing hot spares across pools. When it encounters a device fault or a device removal it replaces the device with an appropriate spare if available. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5343
* Fix coverity defects: CID 147575, 147577, 147578, 147579cao2016-11-074-4/+4
| | | | | | | | | | CID 147575, Type:Unintentional integer overflow CID 147577, Type:Unintentional integer overflow CID 147578, Type:Unintentional integer overflow CID 147579, Type:Unintentional integer overflow Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5365
* Use set_cached_acl and forget_cached_acl when possibleChunwei Chen2016-11-073-8/+38
| | | | | | | | | Originally, these two function are inline, so their usability is tied to posix_acl_release. However, since Linux 3.14, they became EXPORT_SYMBOL, so we can always use them. In this patch, we create an independent test for these two functions so we can use them when possible. Signed-off-by: Chunwei Chen <[email protected]>
* Batch free zpl_posix_acl_releaseChunwei Chen2016-11-073-9/+107
| | | | | | | | | | | | | | | | | | | | | | | Currently every calls to zpl_posix_acl_release will schedule a delayed task, and each delayed task will add a timer. This used to be fine except for possibly bad performance impact. However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In this new implementation, the larger the delay, the less accuracy the timer is. So when we have a flood of timer from zpl_posix_acl_release, they will expire at the same time. Couple with the fact that task_expire will do linear search with lock held. This causes an extreme amount of contention inside interrupt and would actually lockup the system. We fix this by doing batch free to prevent a flood of delayed task. Every call to zpl_posix_acl_release will put the posix_acl to be freed on a lockless list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free every posix_acl that passed the grace period on the list. This way, we only have one delayed task every second. [1] https://lwn.net/Articles/646950/ Signed-off-by: Chunwei Chen <[email protected]>
* Fix 'zpool import' detection issuesBrian Behlendorf2016-11-073-35/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses multiple 'zpool import' block device indentification problems which are most likely to occur on a system configured to use blkid, by_vdev paths, multipath and failover. The symptom most commonly observed is the import uses different path names to import the pool than would normally be expected. * When using blkid to identify vdevs the listed devices may be added to the cache in any order. In order to apply the preferred search order heuristic a zfs_path_order() function was added to calculate the order given full path names. * Since it's possible to have multiple block devices with different vdev guids which refer to the same ZPOOL_CONFIG_PATH the slice cache must be indexed by guid and name. By avoiding collisions the preferred ordering can be maintaining even when multiple block devices claim the same ZPOOL_CONFIG_PATH. The preferred sorting by partition was never benefitial for a Linux system and was removed as part of this change. * When adding entries to the blkid cache avl_find/avl_insert are used instead of avl_add because collisions are possible and must be handled gracefully. * For pools using multipath devices there are, at a minimum, three devices where a vdev label may be read. They are the dm-* device and each underlying /dev/sd* device. Due to the way the block cache is implemented each of these devices may have a different cached copy of the vdev label. This can result in "ghost pools" which appear to persist even after a 'zpool labelclear' has been done to the dm-* device. In order to prevent this the vdev label is read with O_DIRECT in order to bypass any caching to get the on-disk version. * When opening a block device verify that vdev guid read from the disk matches the expected vdev guid. This allows for bad labels to be filtered out. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5359
* Allow 16M zio buffers in user spaceBrian Behlendorf2016-11-071-1/+1
| | | | | | | | | Only restrict the maximum zio alloc size to 32-bit kernel space. The same virtual address space limitations don't apply to user space. This resolves a memory allocation failure in raidz_test where it expects to be able to exercises all valid zio sizes. Signed-off-by: Brian Behlendorf <[email protected]>
* Replace ISAINFO with is_32bit functionBrian Behlendorf2016-11-074-4/+2
| | | | | | | | | The isainfo(1) utility was used by the ZFS Test Suite to determine when running on a 32-bit platform. This non-portable check has been replaced with an is_32bit helper function which uses getconf(1). The getconf(1) utility is available for Linux, FreeBSD, and Illumos. Signed-off-by: Brian Behlendorf <[email protected]>
* Allow autoreplace even when enclosure LED sysfs entries don't existTony Hutter2016-11-041-2/+2
| | | | | | | | | | | | The previous autoreplace code assumed that if you were using autoreplace, then you also had the enclosure SES driver loaded. This could lead to autoreplace not working if the SES driver wasn't loaded, or if it wasn't creating the proper enclosure_device symlinks (which has happened). This patch removes that assumption. Reviewed by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #5363
* Add superscalar fletcher4Romain Dolbeau2016-11-048-2/+405
| | | | | | | | | | | | This is the Fletcher4 algorithm implemented in pure C, but using multiple counters using algorithms identical to those used for SSE/NEON and AVX2. This allows for faster execution on core with strong superscalar capabilities but weak SIMD capabilities. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #5317
* Add support for O_TMPFILEChunwei Chen2016-11-0418-8/+644
| | | | | | | | | | | | | | | Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on supported filesystem. It's basically doing open(2) and unlink(2) atomically. The filesystem support is added through i_op->tmpfile. We basically copy the create operation except we get rid of the link and name related stuff and add the new node to unlinked set. We also add support for linkat(2) to link tmpfile. However, since all previous file operation will skip ZIL, we force a txg_wait_synced to make sure we are sync safe. Signed-off-by: Chunwei Chen <[email protected]>
* Fix unlinked file cannot do xattr operationsChunwei Chen2016-11-045-42/+68
| | | | | | | | | | | | | | | | | Currently, doing things like fsetxattr(2) on an unlinked file will result in ENODATA. There's two places that cause this: zfs_dirent_lock and zfs_zget. The fix in zfs_dirent_lock is pretty straightforward. In zfs_zget though, we need it to not return error when the zp is unlinked. This is a pretty big change in behavior, but skimming through all the callers, I don't think this change would cause any problem. Also there's nothing preventing z_unlinked from being set after the z_lock mutex is dropped before but before zfs_zget returns anyway. The rest of the stuff is to make sure we don't log xattr stuff when owner is unlinked. Signed-off-by: Chunwei Chen <[email protected]>
* Add parity generation/rebuild using AVX-512 for x86-64Romain Dolbeau2016-11-028-0/+971
| | | | | | | | | | | | | | | avx512f should work on all AVX512 hardware, since it only uses Foundation instructions. avx512bw should be faster on hardware supporting the AVW512BW extension. We can use full-width pshufb (instead of relying on the 256 bits AVX2 pshufb). As a side-effect, the code is also unrolled more. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #5219
* Fix dsl_prop_get_all_dsl() memory leakBearBabyLiu2016-11-022-1/+7
| | | | | | | | | | | On error dsl_prop_get_all_ds() does not free the nvlist it allocates. This behavior may have been intentional when originally written but is atypical and often confusing. Since no callers rely on this behavior the function has been updated to always free the nvlist on error. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: BearBabyLiu <[email protected]> Closes #5320
* Skip async_destroy_001_pos on 32-bit systemsBrian Behlendorf2016-11-022-0/+17
| | | | | | | | | | | | The async_destroy_001_pos test case currently hangs when testing on a 32-bit system. Conditionally skip this test case on 32-bit systems until the root cause is identified and resolved. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5352 Issue #5347
* Use vmem_size() for 32-bit systemsBrian Behlendorf2016-11-021-20/+41
| | | | | | | | | | | | | On 32-bit Linux systems use vmem_size() to correctly size the ARC and better determine when IO should be throttle due to low memory. On 64-bit systems this change has no effect since the virtual address space available far exceeds the physical memory available. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5347
* Fix 32-bit maximum volume sizeBrian Behlendorf2016-11-022-1/+4
| | | | | | | | | | | A limit of 1TB exists for zvols on 32-bit systems. Update the code to correctly reflect this limitation in a similar manor as the OpenZFS implementation. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5347
* Enable .zfs/snapshot for 32-bit systemsBrian Behlendorf2016-11-021-4/+0
| | | | | | | | | | | | Originally the .zfs/snapshot directory was disabled for 32-bit systems because 64-bit inode numbers were not supported. This is no longer the case and this functionality can be enabled by default. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5347 Closes #2002
* Add TASKQID_INVALIDBrian Behlendorf2016-11-0212-19/+22
| | | | | | | | | | | | Add the TASKQID_INVALID macros and update callers to use the macro instead of testing against 0. There is no functional change even though the functions in zfs_ctldir.c incorrectly used -1 instead of 0. Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5347
* Process all systemd services through the systemd scriptletsNeal Gompa (ニール・ゴンパ)2016-11-021-3/+4
| | | | | | | | | | | | | This patch ensures that all systemd services are processed through the systemd scriptlets, so that services are properly configured per the preset file installed by the package. Without this, zfs.target is set, but none of the services are enabled per the preset file, meaning automounting filesystems and such won't work out of the box. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Neal Gompa <[email protected]> Closes #5356
* Fix sa_legacy_attr_count to use ARRAY_SIZEcao2016-11-021-1/+1
| | | | | | | | Replace magic value 16 with ARRAY_SIZE() to correctly handle when the sa_legacy_attrs array size changes. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5354
* Fix coverity defects: CID 147553cao2016-11-011-1/+2
| | | | | | | CID 147553: Type:Dereference null return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5305
* Fix coverity defects: CID 147548cao2016-10-313-5/+7
| | | | | | | CID 147548: Type:Dereference null return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5321
* Fix coverity defects: CID 152975cao2016-10-311-2/+7
| | | | | | | CID 152975: Type:Dereference null return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5322
* Fix coverity defects: CID 147509GeLiXin2016-10-311-2/+12
| | | | | | | | CID 147509: Explicit null dereferenced - l2arc_sublist_lock is fragile as relied on caller too much. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: GeLiXin <[email protected]> Closes #5319
* Update migration testslegend-hua2016-10-311-10/+3
| | | | | | | | | | | Due to the instability of the migration tests, the test will skip. The migration tests focus on migrating test file from fs to ZFS fs. We can create zpool and ext2 directly by loop device, rather than by set_partition Reviewed-by: Sydney Vanda <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: legend-hua <[email protected]> Closes #5315