summaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Fix spellingka72017-01-0333-41/+41
| | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected] Reviewed-by: Giuseppe Di Natale <[email protected]>> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Haakan T Johansson <[email protected]> Closes #5547 Closes #5543
* 4.10 compat - BIO flag changes and othersTim Chase2016-12-302-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [bio] The req_op enum was changed to req_opf. Update the "Linux 4.8 API" autotools checks to use an int to determine whether the various REQ_OP values are defined. This should work properly on kernels >= 4.8. [bio] bio_set_op_attrs() is now an inline function and can't be detected with #ifdef. Add a configure check to determine whether bio_set_op_attrs() is defined. Move the local definition of it from vdev_disk.c to blkdev_compat.h for consistency with other related compability shims. [bio] The read/write flags and their modifiers, including WRITE_FLUSH, WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h. Add the new bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA and set the flags appropriately for each supported kernel version. [vfs] The generic_readlink() function has been made static. If .readlink in inode_operations is NULL, generic_readlink() is used. [zol typo] Completely unrelated to 4.10 compat, fix a typo in the check for REQ_OP_SECURE_ERASE so that the proper macro is defined: s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #5499
* Don't persist temporary pool name on devicesLOLi2016-12-221-1/+1
| | | | | | | | | | | Fix a regression accidentally introduced by e0ab3ab. Additionally, add a new script zpool_import_014_pos.ksh to the ZFS test suite to exercise 'zpool import -t' functionality. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #5466 Closes #5515
* Use a dedicated taskq for vdev_fileChunwei Chen2016-12-212-2/+21
| | | | | | | | | | | | | | | | | The introduction of parallel zvol prefetch causes deadlock when using vdev_file. spa_async->(spa_namespace_lock)->txg_wait_synced->(wait for txg_sync) txg_sync->zio_wait->(wait for vdev_file_io_fsync on system_taskq) zvol_prefetch_minors_impl (on system_taskq)->spa_open_common->(wait for spa_namespace_lock) We fix this by using dedicated taskq for vdev_file. This same change was originally made in commit bc25c93 but reverted in commit aa9af22 when dynamic taskqs were added. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5506 Closes #5495
* Fix dsl_props_set_sync_impl to work with nested nvlistLOLi2016-12-201-5/+9
| | | | | | | | | | | | | | | When iterating over the input nvlist in dsl_props_set_sync_impl() when we don't preserve the nvpair name before looking up ZPROP_VALUE, so when we later go to process it nvpair_name() is always "value" and not the actual property name. This fixes a couple of bugs in zfs_ioc_recv(): * Received properties were not restored correctly when failing to receive an incremental send stream * Received properties were not completely replaced by the new ones when successfully receiving an incremental send stream Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #5497
* Fix file attributesBrian Behlendorf2016-12-192-13/+37
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This branch contains the following fixes/improvements. * Fix setting i_flags * Fix wrong operator in xvattr.h * Fix fchange macro in zpl_ioctl_setflags() * Added configure check to use inode_set_flags() * Added a test case for chattr for better test coverage Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5486 Closes #5470 Closes #5469
| * Use inode_set_flags when availableChunwei Chen2016-12-161-0/+9
| | | | | | | | Signed-off-by: Chunwei Chen <[email protected]>
| * Fix fchange in zpl_ioctl_setflagsChunwei Chen2016-12-161-2/+1
| | | | | | | | | | | | | | | | The fchange in zpl_ioctl_setflags was for detecting flag change. However it was incorrect and would always fail to detect a flag change from set to unset, causing users without CAP_LINUX_IMMUTABLE to be able to unset flags. Signed-off-by: Chunwei Chen <[email protected]>
| * Fix i_flags issue caused by 64c688dChunwei Chen2016-12-141-11/+27
| | | | | | | | | | | | | | | | | | Fix zfs_xvattr_set to set S_IMMUTABLE and S_APPEND flags correctly. Reinstate zfs_set_inode_flags and use it when zfs_xvatter_set and also when setting up inode in zfs_znode_alloc and zfs_rezget. Signed-off-by: Chunwei Chen <[email protected]>
* | Fix zmo leak when zfs_sb_create failsChunwei Chen2016-12-191-10/+9
| | | | | | | | | | | | | | | | | | zfs_sb_create would normally takes ownership of zmo, and it will be freed in zfs_sb_free. However, when zfs_sb_create fails we need to explicit free it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5490 Closes #5496
* | ABD: Adapt avx512bw raidz assemblyGvozden Neskovic2016-12-152-113/+79
|/ | | | | | | | | Adapt avx512bw implementation for use with abd buffers. Mul2 implementation is rewritten to take advantage of the BW instruction set. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Romain Dolbeau <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5477
* Add ida_destroy in zvol_fini to fix memleakChunwei Chen2016-12-141-0/+2
| | | | | | | | User of ida needs to call ida_destroy after using it. Otherwise ida->free_bitmap and/or other stuff may leak. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5484
* Fix typos in dbuf.cbunder20152016-12-131-5/+5
| | | | | | | | This removes two large whitespaces in "modinfo zfs" as well as correcting a couple typos. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #5475
* Use cstyle -cpP in `make cstyle` checkBrian Behlendorf2016-12-1241-214/+231
| | | | | | | | | | | | | | | | | | | | | | | Enable picky cstyle checks and resolve the new warnings. The vast majority of the changes needed were to handle minor issues with whitespace formatting. This patch contains no functional changes. Non-whitespace changes are as follows: * 8 times ; to { } in for/while loop * fix missing ; in cmd/zed/agents/zfs_diagnosis.c * comment (confim -> confirm) * change endline , to ; in cmd/zpool/zpool_main.c * a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks * /* CSTYLED */ markers * change == 0 to ! * ulong to unsigned long in module/zfs/dsl_scan.c * rearrangement of module_param lines in module/zfs/metaslab.c * add { } block around statement after for_each_online_node Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: HÃ¥kan Johansson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5465
* Don't count '@' for dataset namelen if not a snapshotChunwei Chen2016-12-091-1/+5
| | | | | | | | | | | | | | Don't count '@' for dataset namelen if not a snapshot. This fixes making a pool unimportable when the dataset namelen is 255. Add test file for zfs create name length 255. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5432 Closes #5456
* Fix coverity defects: CID 154617luozhengzheng2016-12-081-1/+1
| | | | | | | | | | | | | | CID 154617: Memory - illegal accesses (UNINIT) The value here just needs to be initialized to make Coverity happy. When dsize == 0, then value of daiter.iter_mapaddr is irrelevant. That address won't be accessed, it's only used for some arithmetic. dsize can be zero either if dabd is null, or if code column is longer than the current data column. Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5437
* Speed up zvol import and export speedBrian Behlendorf2016-12-086-86/+195
|\ | | | | | | | | | | | | | | | | | | | | | | | | Speed up import and export speed by: * Add system delay taskq * Parallel prefetch zvol dnodes during zvol_create_minors * Parallel zvol_free during zvol_remove_minors * Reduce list linear search using ida and hash Reviewed-by: Boris Protopopov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5433
| * zvol_remove_minors do parallel zvol_freeChunwei Chen2016-12-021-6/+29
| | | | | | | | | | | | | | | | | | On some kernel version, blk_cleanup_queue and put_disk will wait for more then 10ms. So a pool with a lot of zvols will easily wait for more then 1 min if we do zvol_free sequentially. Signed-off-by: Chunwei Chen <[email protected]> Requires-spl: refs/pull/588/head
| * zpool_create_minors parallel prefetchChunwei Chen2016-12-021-10/+89
| | | | | | | | | | | | | | | | Do parallel prefetch all zvol dnodes before actually creating each individual. This will greatly reduce the import time when having a lot of zvols and disk is slow. Signed-off-by: Chunwei Chen <[email protected]>
| * zvol: reduce linear list searchChunwei Chen2016-12-011-49/+65
| | | | | | | | | | | | | | Use kernel ida to generate minor number, and use hash table to find zvol with name. Signed-off-by: Chunwei Chen <[email protected]>
| * Use system_delay_taskq for long delay tasksChunwei Chen2016-12-015-21/+12
| | | | | | | | | | | | | | | | | | Use it for spa_deadman, zpl_posix_acl_free, snapentry_expire. This free system_taskq from the above long delay tasks, and allow us to do taskq_wait_outstanding on system_taskq without being blocked forever, making system_taskq more generic and useful. Signed-off-by: Chunwei Chen <[email protected]>
* | Revert "Disable zio_dva_throttle_enabled by default"Brian Behlendorf2016-12-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | Enable zio_dva_throttle_enabled=1 by default. Subsequent testing has been unable to reproduce the suspected regression. Tested-by: kernelOfTruth [email protected] Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf [email protected] Reverts #5335 Closes #5289 Closes #5457
* | Cache ddt_get_dedup_dspace() value if there was no ddt changesGvozden Neskovic2016-12-022-2/+14
| | | | | | | | | | | | | | | | Save and reuse ddt dspace calculation when there have been no ddt changes. This avoids unnecessary traversal of 168KiB of ddt histograms. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5425
* | Refactor txg history kstatBrian Behlendorf2016-12-022-32/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It was observed that even when the txg history is disabled by setting `zfs_txg_history=0` the txg_sync thread still fetches the vdev stats unnecessarily. This patch refactors the code such that vdev_get_stats() is no longer called when `zfs_txg_history=0`. And it further reduces the differences between upstream and the ZoL txg_sync_thread() function. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5412
* | OpenZFS 7143 - dbuf_read() creates unnecessary zio_root() for bonus bufBrian Behlendorf2016-12-011-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dbuf_read() creates a zio_root() to track and wait for all the zio's that may happen as part of this call. However, if the blkptr_t for this buffer is NULL or a hole, we will not create any more zio's, so this zio_root() is unnecessary. This is always the case when calling dbuf_read() on a bonus buffer, because it has no blkptr (it's part of the containing dnode). For workloads that read a lot of bonus buffers (e.g. file creation and removal), creating and destroying these unnecessary zio's can decrease performance by around 3%. The fix is to only create/destroy the zio_root() in dbuf_read() if the blkptr is not NULL and not a hole. Changes sponsored by Intel Corp. Authored by: Matthew Ahrens <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Issue openzfs/openzfs#137 Closes #4803 Closes #5382
* | Fix incorrect operator in abd_alloc_sametype()luozhengzheng2016-12-011-1/+1
| | | | | | | | | | | | | | | | This should be & and not | so is_metadata is set correctly. Reviewed-by: Dan Kimmel <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5438
* | Remove unused sa_update_from_cb()cao2016-12-011-21/+0
| | | | | | | | | | | | | | | | | | | | It looks like this was functionality which was added in the original SA implementation and then never needed. It can be safely removed now and easily added back if we find a use for it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5440
* | Compile zio.h and zio_impl.h mutual includecao2016-12-011-1/+1
| | | | | | | | | | | | | | | | | | zio.h includes zio_impl.h but zio_impl.h also includes zio.h, so the header files to contain each other. Get rid of the zio_impl.h include in zio.h and update zio_inject.c to include zio.h instead of zio_impl.h. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5439
* | Convert zio_buf_alloc() consumersBrian Behlendorf2016-11-305-18/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In multiple cases zio_buf_alloc() was used instead of kmem_alloc() or vmem_alloc(). This was often done because the allocations could be large and it was easy to use zfs_buf_alloc() for them. But this isn't ideal for allocations which are small or short lived. In these cases it is better to use kmem_alloc() or vmem_alloc(). If possible we want to avoid the case where we have slabs allocated for kmem caches which are rarely used. Note for small allocations vmem_alloc() will be internally converted to kmem_alloc(). Therefore as long as large allocations are infrequent and short lived the penalty for using vmem_alloc() is small. Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5409
* | ABD optimized page allocation codeChunwei Chen2016-11-291-131/+412
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Convert ABD to use the Linux Kernel scatterlist implementation instead of the hand rolled one from illumos. * Scatter ABDs are preferentially populated with higher order compound pages from a single zone. Allocation size is progressively decreased until it can be satisfied without performing reclaim or compaction. * An alternate page allocator is provided for kernels older than 3.6 and for CONFIG_HIGHMEM systems. This allocator is designed as a fallback for maximum compatibility. * Extended abdstats to provide visibility in the the allocator. * Add cached value for PAGESIZE in userspace. Contributions-by: Chunwei Chen <[email protected]> Gvozden Neskovic <[email protected]> Jinshan Xiong <[email protected]> Isaac Huang <[email protected]> David Quigley <[email protected]> Brian Behlendorf <[email protected]>
* | ABD kmap to kmap_atomicChunwei Chen2016-11-291-35/+49
| | | | | | | | | | Convert usage of kmap to kmap_atomic while correctly saving off irq state.
* | ABD raidz NEON supportRomain Dolbeau2016-11-294-94/+229
| | | | | | | | | | | | Port NEON implementation of RAID-Z functions to ABD. Signed-off-by: Roomain Dolbeau <[email protected]>
* | ABD raidz avx512f supportGvozden Neskovic2016-11-296-391/+258
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement shift based multiplication for 512f. Higher IPC over lookup based methods yields up to 40% better performance on the current hardware. Results on Xeon Phi(TM) CPU 7210: implementation gen_p gen_pq gen_pqr rec_p rec_q rec_r rec_pq rec_pr rec_qr rec_pqr original 142232671 24411492 12948205 283053705 22348167 4215911 9171609 2265548 2378370 1648495 scalar 295711162 49851491 33253815 293198109 88179448 61866752 27941684 25764416 17384442 12138153 sse2 410055998 199642658 117973654 406240463 152688682 121092250 84968180 79291076 47473657 20779719 ssse3 411641595 199669571 117937647 406211024 137638508 117050346 81263322 76120405 46281559 32696722 avx2 616485806 311515332 188595628 605455115 260602390 230554476 148198817 138800254 92273356 62937819 avx512f 832191523 408509425 253599522 810094481 404325734 317590971 218235687 197204920 133101937 94001219 fastest avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f avx512f Signed-off-by: Gvozden Neskovic <[email protected]>
* | ABD Vectorized raidzGvozden Neskovic2016-11-2910-979/+1363
| | | | | | | | | | | | | | | | | | Enable vectorized raidz code on ABD buffers. The avx512f, avx512bw, neon and aarch64_neonx2 are disabled in this commit. With the exception of avx512bw these implementations are updated for ABD in the subsequent commits. Signed-off-by: Gvozden Neskovic <[email protected]>
* | ABD changes for vectorized RAIDZGvozden Neskovic2016-11-292-14/+200
| | | | | | | | | | | | | | | | | | | | * userspace: aligned buffers. Minimum of 32B alignment is needed for AVX2. Kernel buffers are aligned 512B or more. * add abd_get_offset_size() interface * abd_iter_map(): fix calculation of iter_mapsize * add abd_raidz_gen_iterate() and abd_raidz_rec_iterate() Signed-off-by: Gvozden Neskovic <[email protected]>
* | ABD page support to vdev_disk.cIsaac Huang2016-11-292-56/+83
| | | | | | | | Signed-off-by: Isaac Huang <[email protected]>
* | DLPX-44812 integrate EP-220 large memory scalabilityDavid Quigley2016-11-2931-688/+2294
|/
* Kernel 4.9 compat: file_operations->aio_fsync removalDeHackEd2016-11-151-0/+11
| | | | | | | Linux kernel commit 723c038475b78 removed this field. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #5393
* Fix coverity defects: CID 147503luozhengzheng2016-11-101-0/+21
| | | | | | | CID 147503: Dereference after null check (FORWARD_NULL) Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5326
* Fix coverity defects: CID 147540, 147542cao2016-11-091-2/+1
| | | | | | | | | | | CID 147540: unsigned_compare - Cast nsec to a int32_t to properly detect the expected overflow. CID 147542: unsigned_compare - intval can never be less than ZIO_FAILURE_MODE_WAIT which is defined to be zero. Remove this useless check. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5379
* Export symbol dmu_objset_userobjspace_upgradablejxiong2016-11-091-0/+10
| | | | | | | | | It's used by Lustre to determine if the objset can be upgraded. The inline version doesn't work because dmu_objset_is_snapshot() is not exported. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Jinshan Xiong <[email protected]> Closes #5385
* Linux 3.14 compat: assign inode->set_acltuxoko2016-11-092-6/+15
| | | | | | | | | | | | | Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come from setxattr, which will handle by the acl xattr_handler, and we already handles that well. However, nfsd will directly calls inode->set_acl or return error if it doesn't exists. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Massimo Maggi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5371 Closes #5375
* Fix coverity defects: CID 147626, 147628cao2016-11-082-11/+8
| | | | | | | | | CID 147626: Type:Dereference before null check CID 147628: Type:Dereference before null check Reviewed-by: Brian Behlendorf <[email protected] Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5304
* Add illumos FMD ZFS logic to ZED -- phase 2Don Brady2016-11-071-0/+11
| | | | | | | | | | | | | | | | | | | | | | | The phase 2 work primarily entails the Diagnosis Engine and the Retire Agent modules. It also includes infrastructure to support a crude FMD environment to host these modules. The Diagnosis Engine consumes I/O and checksum ereports and feeds them into a SERD engine which will generate a corres- ponding fault diagnosis when the SERD engine fires. All the diagnosis state data is collected into cases, one case per vdev being tracked. The Retire Agent responds to diagnosed faults by isolating the faulty VDEV. It will notify the ZFS kernel module of the new VDEV state (degraded or faulted). This agent is also responsible for managing hot spares across pools. When it encounters a device fault or a device removal it replaces the device with an appropriate spare if available. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #5343
* Fix coverity defects: CID 147575, 147577, 147578, 147579cao2016-11-073-3/+3
| | | | | | | | | | CID 147575, Type:Unintentional integer overflow CID 147577, Type:Unintentional integer overflow CID 147578, Type:Unintentional integer overflow CID 147579, Type:Unintentional integer overflow Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5365
* Batch free zpl_posix_acl_releaseChunwei Chen2016-11-072-1/+104
| | | | | | | | | | | | | | | | | | | | | | | Currently every calls to zpl_posix_acl_release will schedule a delayed task, and each delayed task will add a timer. This used to be fine except for possibly bad performance impact. However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In this new implementation, the larger the delay, the less accuracy the timer is. So when we have a flood of timer from zpl_posix_acl_release, they will expire at the same time. Couple with the fact that task_expire will do linear search with lock held. This causes an extreme amount of contention inside interrupt and would actually lockup the system. We fix this by doing batch free to prevent a flood of delayed task. Every call to zpl_posix_acl_release will put the posix_acl to be freed on a lockless list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free every posix_acl that passed the grace period on the list. This way, we only have one delayed task every second. [1] https://lwn.net/Articles/646950/ Signed-off-by: Chunwei Chen <[email protected]>
* Allow 16M zio buffers in user spaceBrian Behlendorf2016-11-071-1/+1
| | | | | | | | | Only restrict the maximum zio alloc size to 32-bit kernel space. The same virtual address space limitations don't apply to user space. This resolves a memory allocation failure in raidz_test where it expects to be able to exercises all valid zio sizes. Signed-off-by: Brian Behlendorf <[email protected]>
* Add superscalar fletcher4Romain Dolbeau2016-11-046-2/+396
| | | | | | | | | | | | This is the Fletcher4 algorithm implemented in pure C, but using multiple counters using algorithms identical to those used for SSE/NEON and AVX2. This allows for faster execution on core with strong superscalar capabilities but weak SIMD capabilities. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #5317
* Add support for O_TMPFILEChunwei Chen2016-11-043-8/+194
| | | | | | | | | | | | | | | Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on supported filesystem. It's basically doing open(2) and unlink(2) atomically. The filesystem support is added through i_op->tmpfile. We basically copy the create operation except we get rid of the link and name related stuff and add the new node to unlinked set. We also add support for linkat(2) to link tmpfile. However, since all previous file operation will skip ZIL, we force a txg_wait_synced to make sure we are sync safe. Signed-off-by: Chunwei Chen <[email protected]>
* Fix unlinked file cannot do xattr operationsChunwei Chen2016-11-044-42/+67
| | | | | | | | | | | | | | | | | Currently, doing things like fsetxattr(2) on an unlinked file will result in ENODATA. There's two places that cause this: zfs_dirent_lock and zfs_zget. The fix in zfs_dirent_lock is pretty straightforward. In zfs_zget though, we need it to not return error when the zp is unlinked. This is a pretty big change in behavior, but skimming through all the callers, I don't think this change would cause any problem. Also there's nothing preventing z_unlinked from being set after the z_lock mutex is dropped before but before zfs_zget returns anyway. The rest of the stuff is to make sure we don't log xattr stuff when owner is unlinked. Signed-off-by: Chunwei Chen <[email protected]>