path: root/include/sys/zfs_vnops.h
Commit message | Author | Age | Files | Lines
* Add pn_alloc()/pn_free() functions | Brian Behlendorf | 2016-04-21 | 1 | -2/+2
In order to remove the HAVE_PN_UTILS wrappers, the pn_alloc() and pn_free() functions must be implemented. The existing illumos implementation was used for this purpose. The `flags` argument, which was used in places wrapped by the HAVE_PN_UTILS condition, has been added back to the zfs_remove() and zfs_link() functions. This removes a small point of divergence between the ZoL code and upstream.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4522
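For readers unfamiliar with the illumos pathname helpers, the following is a minimal userspace sketch of what a pn_alloc()/pn_free() pair does: hand out an empty, fixed-size MAXPATHLEN buffer and later release it. It assumes a simplified `struct pathname` with `pn_buf`, `pn_path`, `pn_pathlen` and `pn_bufsize` fields modeled on illumos, and uses calloc()/free() in place of the kernel allocator. Because userspace allocation can fail, this sketch returns an error code, which a sleeping kernel allocation would not need; it is not the ZoL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative value; illumos defines MAXPATHLEN as 1024 in sys/param.h. */
#ifndef MAXPATHLEN
#define MAXPATHLEN 1024
#endif

/* Simplified pathname structure modeled on illumos (assumption). */
typedef struct pathname {
	char   *pn_buf;      /* start of the allocated buffer */
	char   *pn_path;     /* remaining path still to be processed */
	size_t  pn_pathlen;  /* length of the remaining path */
	size_t  pn_bufsize;  /* size of pn_buf in bytes */
} pathname_t;

/* Allocate an empty MAXPATHLEN-sized path buffer. Returns 0 on success. */
static int
pn_alloc(pathname_t *pnp)
{
	pnp->pn_buf = calloc(1, MAXPATHLEN);
	if (pnp->pn_buf == NULL)
		return (-1);
	pnp->pn_path = pnp->pn_buf;
	pnp->pn_pathlen = 0;
	pnp->pn_bufsize = MAXPATHLEN;
	return (0);
}

/* Release the buffer and reset the structure so stale pointers are not reused. */
static void
pn_free(pathname_t *pnp)
{
	assert(pnp->pn_buf != NULL);
	free(pnp->pn_buf);
	pnp->pn_buf = pnp->pn_path = NULL;
	pnp->pn_pathlen = pnp->pn_bufsize = 0;
}
```

A caller would pair the two around a lookup: `pn_alloc(&pn)`, fill `pn.pn_buf`, use it, then `pn_free(&pn)`.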
* Add zfs_iput_async() interface | Brian Behlendorf | 2014-08-11 | 1 | -0/+1
Handle all iputs in zfs_purgedir() and zfs_inode_destroy() asynchronously to prevent deadlocks. When the iputs are allowed to run synchronously in the destroy call path, deadlocks between xattr directory inodes and their parent file inodes are possible.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #457
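The deadlock is avoided by never dropping a reference inline on the destroy path; the release is queued and performed later from a different context. The sketch below shows that idea in a single-threaded userspace form. Everything in it is hypothetical: `obj_t` stands in for an inode, the fixed-size array stands in for a taskq, and `drain_releases()` stands in for the worker thread. It is not the zfs_iput_async() implementation.

```c
#include <assert.h>

/* Hypothetical reference-counted object standing in for an inode. */
typedef struct obj {
	int refcount;
	int freed;     /* set once the last reference is dropped */
} obj_t;

/* Hypothetical fixed-size queue standing in for a taskq. */
#define QUEUE_MAX 64
static obj_t *release_queue[QUEUE_MAX];
static int release_count = 0;

/* Synchronous release: teardown (and any locking it needs) runs immediately. */
static void
obj_release(obj_t *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->freed = 1;   /* real code would tear the object down here */
}

/*
 * Asynchronous release: only record the request, so the caller never runs
 * teardown while still holding locks from the destroy path.  Returns -1 if
 * the queue is full; a real implementation would need a fallback.
 */
static int
obj_release_async(obj_t *o)
{
	if (release_count == QUEUE_MAX)
		return (-1);
	release_queue[release_count++] = o;
	return (0);
}

/* Worker body: perform the queued releases outside the caller's context. */
static int
drain_releases(void)
{
	int n = release_count;

	for (int i = 0; i < n; i++)
		obj_release(release_queue[i]);
	release_count = 0;
	return (n);
}
```

Queuing each reference separately (rather than linking the object into an intrusive list) keeps the sketch correct when the same object is released asynchronously more than once.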
* Linux 3.11 compat: fops->iterate() | Richard Yao | 2013-08-15 | 1 | -2/+2
Commit torvalds/linux@2233f31aade393641f0eaed43a71110e629bb900 replaced ->readdir() with ->iterate() in struct file_operations. All filesystems must now use the new ->iterate method. To handle this, the code was reworked to use the new ->iterate interface. Care was taken to keep the majority of changes confined to the ZPL layer, which is already Linux specific. However, minor changes were required to the common zfs_readdir() function.
Compatibility with older kernels was accomplished by adding versions of the trivial dir_emit* helper functions. Also, the various *_readdir() functions were reworked into wrappers which create a dir_context structure to pass to the new *_iterate() functions.
Unfortunately, the new dir_emit* functions prevent us from passing a private pointer to the filldir function. The xattr directory code leveraged this ability through zfs_readdir() to generate the list of xattr names. Since we can no longer use zfs_readdir(), a simplified zpl_xattr_readdir() function was added to perform the same task.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #1653
Issue #1591
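The compatibility approach (an old-style readdir() that builds a dir_context and delegates to an iterate() function) can be illustrated with a userspace mock. The types and signatures here are simplified stand-ins: the kernel's `struct dir_context`, its actor type and `dir_emit()` also carry an inode number and file type, and `demo_iterate()`, `demo_readdir()` and `count_actor()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct dir_context;
typedef bool (*filldir_actor_t)(struct dir_context *ctx, const char *name,
    int namelen, long long pos);

/* Simplified stand-in for the kernel's struct dir_context. */
struct dir_context {
	filldir_actor_t actor;
	long long       pos;
};

/* Simplified dir_emit(): hand one entry to the actor at the current position. */
static bool
dir_emit(struct dir_context *ctx, const char *name, int namelen)
{
	return (ctx->actor(ctx, name, namelen, ctx->pos));
}

/* Hypothetical directory contents. */
static const char *entries[] = { ".", "..", "a", "b" };

/* New-style iterate(): resume at ctx->pos and stop when the actor is full. */
static int
demo_iterate(struct dir_context *ctx)
{
	while (ctx->pos < (long long)(sizeof (entries) / sizeof (entries[0]))) {
		const char *name = entries[ctx->pos];
		if (!dir_emit(ctx, name, (int)strlen(name)))
			break;
		ctx->pos++;
	}
	return (0);
}

/* Old-style readdir() reworked into a wrapper around iterate(). */
static int
demo_readdir(long long *f_pos, filldir_actor_t filldir)
{
	struct dir_context ctx = { .actor = filldir, .pos = *f_pos };
	int error = demo_iterate(&ctx);

	*f_pos = ctx.pos;   /* carry the position back to the caller */
	return (error);
}

/* Example actor that just counts entries, standing in for the VFS filldir. */
static int emitted;
static bool
count_actor(struct dir_context *ctx, const char *name, int namelen,
    long long pos)
{
	(void)ctx; (void)name; (void)namelen; (void)pos;
	emitted++;
	return (true);
}
```

Note how the actor only receives the `dir_context`; any private state has to travel inside or alongside that structure, which is the limitation the commit message describes for the xattr listing code.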
* Add SEEK_DATA/SEEK_HOLE to lseek()/llseek() | Li Dongyang | 2013-07-02 | 1 | -0/+1
The approach taken was to rework zfs_holey() as little as possible and then just wrap the code as needed to ensure correct locking and error handling.
Tested with xfstests 285 and 286. All tests pass except for 7-9 of 285, which try to reserve blocks first via fallocate(2) and fail because fallocate(2) is not yet supported.
Note that the filp->f_lock spinlock did not exist prior to Linux 2.6.30, but we avoid the need for an autotools check by virtue of the fact that SEEK_DATA/SEEK_HOLE support was not added until Linux 3.1.
An autoconf check was added for lseek_execute(), which is currently a private function, but the expectation is that it will be exported, perhaps as early as Linux 3.11.
Reviewed-by: Richard Laager <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #1384
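From userspace, the new lseek() whence values are typically used to walk the data regions of a sparse file, as in the sketch below. It relies only on the standard Linux lseek(2) interface. The fallback values 3 and 4 are the Linux definitions, included here only in case the C library headers do not expose them. `print_data_regions()` is a hypothetical helper name.

```c
#define _GNU_SOURCE   /* exposes SEEK_DATA / SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3   /* Linux value */
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4   /* Linux value */
#endif

/*
 * Print each data region of an open file as [start, end).
 * Returns the number of regions found, or -1 on an unexpected error.
 */
static int
print_data_regions(int fd)
{
	off_t pos = 0;
	int regions = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data == -1)   /* ENXIO: no data at or beyond pos */
			return (errno == ENXIO ? regions : -1);

		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1)
			return (-1);

		printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
		regions++;
		pos = hole;
	}
}
```

On a filesystem without hole tracking the whole file is reported as one data region, since the end of file always counts as a hole.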
* Update SAs when an inode is dirtied | Brian Behlendorf | 2012-12-14 | 1 | -0/+1
Revert the portion of commit d3aa3ea which always resulted in the SAs being updated when an mmap()'ed file was closed. That change accidentally resulted in unexpected ctime updates, which upset tools like git. That was always a horrible hack and I'm happy it will never make it into a tagged release.
The right fix is something I initially resisted doing because I was worried about the additional overhead. However, in hindsight the overhead isn't as bad as I feared.
This patch implements the sops->dirty_inode() callback, which is unsurprisingly called when an inode is dirtied. We leverage this callback to keep the znode SAs strictly in sync with the inode.
However, for now we're going to go slowly to avoid introducing any new unexpected issues by only updating the atime, mtime, and ctime. This will cover the callpath of most concern to us:
->filemap_page_mkwrite->file_update_time->update_time->mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #764
Closes #1140
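The deliberately narrow scope of this change (copy only the three timestamps when the inode is dirtied) can be pictured with a minimal sketch. Everything here is hypothetical: `sketch_inode`, `sketch_znode` and `sketch_dirty_inode()` are simplified stand-ins, not the kernel's `super_operations` signature or the ZoL znode layout, and the DMU transaction and SA write are not modeled.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, heavily simplified stand-in for struct inode. */
struct sketch_inode {
	struct timespec i_atime, i_mtime, i_ctime;
};

/* Hypothetical znode holding the values that would be persisted as SAs. */
struct sketch_znode {
	struct timespec     z_atime, z_mtime, z_ctime;
	int                 z_sa_updates;  /* number of SA updates issued */
	struct sketch_inode z_inode;       /* embedded VFS inode */
};

/*
 * Sketch of a ->dirty_inode() hook: copy only atime, mtime and ctime from
 * the VFS inode into the znode.  A real implementation would do this inside
 * a transaction and write the values out as system attributes.
 */
static void
sketch_dirty_inode(struct sketch_znode *zp)
{
	zp->z_atime = zp->z_inode.i_atime;
	zp->z_mtime = zp->z_inode.i_mtime;
	zp->z_ctime = zp->z_inode.i_ctime;
	zp->z_sa_updates++;
}
```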
* Cleanup mmap(2) writes | Brian Behlendorf | 2011-08-02 | 1 | -2/+2
While the existing implementation of .writepage()/zpl_putpage() was functional, it was not entirely correct. In particular, it would move dirty pages into a clean state simply after copying them into the ARC cache. This would result in the pages being lost if the system were to crash, even though the Linux VFS believed them to be safe on stable storage.
Since at the moment virtually all I/O, except mmap(2), bypasses the page cache, this isn't as bad as it sounds. However, as we hopefully start using the page cache more, getting this right becomes more important, so it's good to improve this now.
This patch takes a big step in that direction by updating the code to correctly move dirty pages through a writeback phase before they are marked clean. When a dirty page is copied into the ARC it will now be set in writeback and a completion callback is registered with the transaction. The page will stay in writeback until the dmu runs the completion callback, indicating the page is on stable storage. At this point the page can be safely marked clean.
This process is normally entirely asynchronous and will be repeated for every dirty page. This may initially sound inefficient, but most of these pages will end up in a few txgs. That means when they are eventually written to disk they should be nicely batched. However, there is room for improvement. It may still be desirable to batch up the pages into larger writes for the dmu. This would reduce the number of callbacks and small 4k buffers required by the ARC.
Finally, if the caller requires that the I/O be done synchronously by setting WB_SYNC_ALL, or if ZFS_SYNC_ALWAYS is set, then the I/O will trigger a zil_commit() to flush the data to stable storage. At that point the registered callbacks will be run, leaving the data safe on disk and marked clean before returning from .writepage.
Signed-off-by: Brian Behlendorf <[email protected]>
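The dirty, writeback, clean progression described above can be modeled as a small state machine. This is a hypothetical sketch: `sketch_page`, `sketch_tx`, `sketch_putpage()` and `sketch_tx_committed()` are illustrative names, the ARC copy is not modeled, and the fixed callback array stands in for the commit callbacks registered with a real transaction.

```c
#include <assert.h>

/* Hypothetical page states mirroring the dirty -> writeback -> clean flow. */
enum page_state { PAGE_CLEAN, PAGE_DIRTY, PAGE_WRITEBACK };

struct sketch_page {
	enum page_state state;
};

#define MAX_CALLBACKS 16

/* Hypothetical transaction holding one commit callback per page. */
struct sketch_tx {
	struct sketch_page *pending[MAX_CALLBACKS];
	int                 npending;
};

/*
 * ->writepage() sketch: once the page has been copied into the (unmodeled)
 * ARC buffer, put it in writeback and register a callback rather than
 * marking it clean immediately.
 */
static int
sketch_putpage(struct sketch_tx *tx, struct sketch_page *pp)
{
	assert(pp->state == PAGE_DIRTY);
	if (tx->npending == MAX_CALLBACKS)
		return (-1);
	pp->state = PAGE_WRITEBACK;
	tx->pending[tx->npending++] = pp;
	return (0);
}

/* Commit-callback sketch: runs once the txg is on stable storage. */
static void
sketch_tx_committed(struct sketch_tx *tx)
{
	for (int i = 0; i < tx->npending; i++) {
		assert(tx->pending[i]->state == PAGE_WRITEBACK);
		tx->pending[i]->state = PAGE_CLEAN;
	}
	tx->npending = 0;
}
```

The key property is that no page reaches `PAGE_CLEAN` before the callback for its transaction has run.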
* Improve fstat(2) performance | Brian Behlendorf | 2011-07-11 | 1 | -0/+1
There is at most a factor of 3x performance improvement to be had by using the Linux generic_fillattr() helper. However, to use it safely we need to ensure the values in a cached inode are kept rigorously up to date. Unfortunately, this isn't the case for the blksize, blocks, and atime fields. At the moment the authoritative values are still stored in the znode.
This patch introduces an optimized zfs_getattr_fast() call. The idea is to use the up-to-date values from the inode and the blksize, blocks, and atime fields from the znode. At some later date we should be able to strictly use the inode values and further improve performance.
The remaining overhead in the zfs_getattr_fast() call can be attributed to having to take the znode mutex. This overhead is unavoidable until the inode is kept strictly up to date. The careful reader will notice that we do not use the customary ZFS_ENTER()/ZFS_EXIT() macros. These macros are designed to ensure the filesystem is not torn down in the middle of an operation. However, in this case the VFS is holding a reference on the active inode, so we know this is impossible.
=================== Performance Tests ========================
This test calls the fstat(2) system call 10,000,000 times on an open file description in a tight loop. The test results show that zfs stat(2) performance is now only 22% slower than ext4. This is a 2.5x improvement, and there is a clear long-term plan to get to parity with ext4.
filesystem    |  test-1  test-2  test-3 | average | times-ext4
--------------+-------------------------+---------+-----------
ext4          |  7.785s  7.899s  7.284s |  7.656s | 1.000x
zfs-0.6.0-rc4 | 24.052s 22.531s 23.857s | 23.480s | 3.066x
zfs-faststat  |  9.224s  9.398s  9.485s |  9.369s | 1.223x
The second test is to run 'du' on a copy of the /usr tree, which contains 110514 files.
The test is run multiple times using both a cold cache (/proc/sys/vm/drop_caches) and a hot cache. As expected, this change significantly improved the zfs hot cache performance, but it doesn't quite bring zfs to parity with ext4.
A little surprisingly, the zfs cold cache performance is better than ext4. This can probably be attributed to the zfs allocation policy of co-locating all the metadata on disk, which minimizes seek times. By default the ext4 allocator will spread the data over the entire disk, only co-locating each directory.
filesystem    | cold    | hot
--------------+---------+--------
ext4          | 13.318s | 1.040s
zfs-0.6.0-rc4 |  4.982s | 1.762s
zfs-faststat  |  4.933s | 1.345s
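A loop in the spirit of the fstat(2) test could look like the following. This is an assumed reconstruction, since the harness behind the numbers in the tables is not part of the commit; it uses only POSIX fstat() and clock_gettime(), and `time_fstat_loop()` and `report()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* clock_gettime(), fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Call fstat(2) `iterations` times on an already-open descriptor and return
 * the elapsed wall-clock time in seconds, or -1.0 on error.
 */
static double
time_fstat_loop(int fd, long iterations)
{
	struct stat st;
	struct timespec start, end;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return (-1.0);
	for (long i = 0; i < iterations; i++) {
		if (fstat(fd, &st) != 0)
			return (-1.0);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return (-1.0);
	return ((double)(end.tv_sec - start.tv_sec) +
	    (double)(end.tv_nsec - start.tv_nsec) / 1e9);
}

/* Print one result line in a layout similar to the commit's table. */
static void
report(const char *label, double seconds)
{
	printf("%-14s| %.3fs\n", label, seconds);
}
```

Calling `time_fstat_loop(fd, 10000000)` matches the iteration count quoted in the commit; absolute timings will differ by machine, kernel and filesystem version.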
* Add ZFS specific mmap() checks | Brian Behlendorf | 2011-07-01 | 1 | -0/+2
Under Linux the VFS handles virtually all of the mmap() access checks. Filesystem specific checks are left to be handled in the .mmap() hook, and normally there aren't any. However, ZFS provides a few attributes which can influence the mmap behavior and should be honored. Note that the code to modify these attributes has not yet been implemented under Linux.
  * ZFS_IMMUTABLE | ZFS_READONLY | ZFS_APPENDONLY: when any of these attributes are set, a file may not be mmap'ed with write access.
  * ZFS_AV_QUARANTINED: when set, a file may not be mmap'ed with read or exec access.
Signed-off-by: Brian Behlendorf <[email protected]>
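The two rules above reduce to a pair of mask tests against the requested protection bits. In this sketch the `SK_*` flag values are hypothetical stand-ins for the ZFS_* attributes (the real constants differ), and returning EPERM for write access and EACCES for read/exec access is an assumption about the error codes, not a statement of what ZoL returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical flag values standing in for the ZFS_* attributes. */
#define SK_IMMUTABLE      (1ULL << 0)
#define SK_READONLY       (1ULL << 1)
#define SK_APPENDONLY     (1ULL << 2)
#define SK_AV_QUARANTINED (1ULL << 3)

/* Return 0 if the mapping is allowed, or an errno value if it is refused. */
static int
sketch_mmap_check(uint64_t pflags, int prot)
{
	/* Immutable, read-only and append-only files refuse writable maps. */
	if ((prot & PROT_WRITE) &&
	    (pflags & (SK_IMMUTABLE | SK_READONLY | SK_APPENDONLY)))
		return (EPERM);
	/* Quarantined files refuse readable or executable maps. */
	if ((prot & (PROT_READ | PROT_EXEC)) && (pflags & SK_AV_QUARANTINED))
		return (EACCES);
	return (0);
}
```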
* MMAP Optimization | Prasad Joshi | 2011-07-01 | 1 | -0/+3
Enable the zfs_getpage, zfs_fillpage, zfs_putpage, and zfs_putapage functions. The functions have been modified to make them Linux friendly. ZFS uses these functions to read/write the mmapped pages. Using them from readpage/writepage results in cleaner code. The patch also adds readpages and writepages interface functions to read/write a list of pages in one function call.
The code change handles the first mmap optimization mentioned on https://github.com/behlendorf/zfs/issues/225
Signed-off-by: Prasad Joshi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #255
* Add zfs_open()/zfs_close() | Brian Behlendorf | 2011-03-08 | 1 | -0/+2
In the original implementation the zfs_open()/zfs_close() hooks were dropped for simplicity. This was functional but not 100% correct with the expected ZFS semantics. Updating and re-adding the zfs_open()/zfs_close() hooks resolves the following issues.
1) The ZFS_APPENDONLY file attribute is once again honored. While there are still no Linux tools to set/clear these attributes, once there are it should behave correctly.
2) Minimal virus scan file attribute hooks were added. Once again this support is disabled, but the infrastructure is back in place.
3) Most importantly, files which were opened synchronously are correctly assigned to the intent log. Without this change O_SYNC modifications could be lost during a system crash even though they were marked synchronous.
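Issues 1 and 3 above come down to two bookkeeping steps at open and close time: refuse non-appending writers of an append-only file, and count synchronous opens so later writes know to go through the intent log. The sketch below models only that bookkeeping. `SK_APPENDONLY`, `struct sketch_file`, `sketch_open()` and `sketch_close()` are hypothetical, and returning EPERM is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>    /* O_ACCMODE, O_RDONLY, O_APPEND, O_SYNC */
#include <stdint.h>

#define SK_APPENDONLY (1ULL << 0)   /* hypothetical stand-in for ZFS_APPENDONLY */

/* Hypothetical per-file state. */
struct sketch_file {
	uint64_t pflags;
	int      sync_cnt;   /* number of outstanding O_SYNC opens */
};

static int
sketch_open(struct sketch_file *fp, int oflags)
{
	/* Writers of an append-only file must open it with O_APPEND. */
	if ((oflags & O_ACCMODE) != O_RDONLY &&
	    (fp->pflags & SK_APPENDONLY) && !(oflags & O_APPEND))
		return (EPERM);
	/* Track synchronous opens so their writes are committed to the log. */
	if (oflags & O_SYNC)
		fp->sync_cnt++;
	return (0);
}

static void
sketch_close(struct sketch_file *fp, int oflags)
{
	if (oflags & O_SYNC) {
		assert(fp->sync_cnt > 0);
		fp->sync_cnt--;
	}
}
```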
* Drop HAVE_XVATTR macros | Brian Behlendorf | 2011-03-02 | 1 | -4/+2
When I began work on the Posix layer it immediately became clear to me that to integrate cleanly with the Linux VFS certain Solaris specific things would have to go. One of these things was to eliminate as many Solaris specific types from the ZPL layer as possible. They would be replaced with their Linux equivalents. This would be good not only for performance, but also for the general readability and health of the code. The Solaris and Linux VFS are different beasts and should be treated as such. Most of the code remains common for constructing transactions and such, but there are subtle and important differences which need to be respected.
This policy went quite far for certain types such as the vnode_t, and it initially seemed to be working out well for the vattr_t. There was a relatively small amount of related xvattr_t code I was forced to comment out with HAVE_XVATTR. But it didn't look that hard to come back soon and replace it all with a native Linux type.
However, after going down this path with xvattr some distance, it became clear that this code was woven into the ZPL more deeply than I thought. In particular, its hooks went very deep into the ZPL replay code, and replacing it would not be as easy as I originally thought.
Rather than continue pursuing replacing and removing this code, I've taken a step back and reevaluated things. This commit reverts many of my previous commits which removed xvattr related code. It restores much of the code to its original upstream state and now relies on improved xvattr_t support in the zfs package itself.
The result of this is that much of the code which I had commented out, which accidentally broke things like replay, is now back in place and working. However, there may be a small performance impact for getattr/setattr operations because they now require a translation from native Linux to Solaris types. For now that's a price I'm willing to pay.
Once everything is completely functional we can revisit the issue of removing the vattr_t/xvattr_t types.
Closes #111
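The per-call translation cost mentioned above is essentially a mask-and-copy step from a Linux-style attribute request to a Solaris-style one. The sketch below illustrates that step with heavily simplified, hypothetical structures: `sk_iattr` loosely resembles Linux `struct iattr`, `sk_vattr` loosely resembles `vattr_t`, and the `SK_*` mask bits are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Linux-side request, loosely modeled on struct iattr. */
#define SK_ATTR_MODE (1u << 0)
#define SK_ATTR_UID  (1u << 1)
#define SK_ATTR_SIZE (1u << 2)

struct sk_iattr {
	unsigned ia_valid;   /* which fields below are meaningful */
	unsigned ia_mode;
	unsigned ia_uid;
	int64_t  ia_size;
};

/* Hypothetical Solaris-side structure, loosely modeled on vattr_t. */
#define SK_AT_MODE (1u << 0)
#define SK_AT_UID  (1u << 1)
#define SK_AT_SIZE (1u << 2)

struct sk_vattr {
	unsigned va_mask;    /* which fields below are meaningful */
	unsigned va_mode;
	unsigned va_uid;
	int64_t  va_size;
};

/* Translate the requested fields one by one; this runs on every setattr. */
static void
sk_iattr_to_vattr(const struct sk_iattr *ia, struct sk_vattr *va)
{
	va->va_mask = 0;
	if (ia->ia_valid & SK_ATTR_MODE) {
		va->va_mask |= SK_AT_MODE;
		va->va_mode = ia->ia_mode;
	}
	if (ia->ia_valid & SK_ATTR_UID) {
		va->va_mask |= SK_AT_UID;
		va->va_uid = ia->ia_uid;
	}
	if (ia->ia_valid & SK_ATTR_SIZE) {
		va->va_mask |= SK_AT_SIZE;
		va->va_size = ia->ia_size;
	}
}
```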
* Add xvattr support | Brian Behlendorf | 2011-03-02 | 1 | -0/+1
With the removal of the minimal xvattr support from the spl, this support needs to be replaced in the zfs package. This is fairly easily accomplished by directly adding portions of the sys/vnode.h header from OpenSolaris. These xvattr additions have been placed in the sys/xvattr.h header file and included as needed where simply a sys/vnode.h was included before.
In addition to the xvattr types and helper macros, two functions were also included. The xva_init() and xva_getxoptattr() functions were included as static inline functions in xvattr.h. They are simple enough, and it was simpler to place them here rather than in their own .c file.
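To show why these helpers fit comfortably as static inline functions, here is a simplified sketch of what an xva_init()/xva_getxoptattr() pair does: zero the structure, stamp it with a map size and magic value, flag the embedded vattr as extended, and hand back the optional-attribute area only when that flag is set. The `sk_*` types, field set and constant values are illustrative assumptions, not the contents of sys/xvattr.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants; the real header defines its own values. */
#define SK_AT_XVATTR   0x10000u
#define SK_XVA_MAPSIZE 3
#define SK_XVA_MAGIC   0x78766174u   /* "xvat" */

typedef struct { unsigned va_mask; } sk_vattr_t;
typedef struct { uint64_t xoa_flags; } sk_xoptattr_t;

typedef struct {
	sk_vattr_t     xva_vattr;                      /* embedded plain vattr */
	uint32_t       xva_mapsize;                    /* words per bitmap */
	uint32_t       xva_magic;                      /* sanity check value */
	uint32_t       xva_reqattrmap[SK_XVA_MAPSIZE]; /* requested attributes */
	uint32_t       xva_rtnattrmap[SK_XVA_MAPSIZE]; /* returned attributes */
	uint32_t      *xva_rtnattrmapp;                /* points at xva_rtnattrmap */
	sk_xoptattr_t  xva_xoptattrs;                  /* optional attribute values */
} sk_xvattr_t;

/* Prepare an extended attribute structure for use. */
static inline void
sk_xva_init(sk_xvattr_t *xvap)
{
	memset(xvap, 0, sizeof (*xvap));
	xvap->xva_mapsize = SK_XVA_MAPSIZE;
	xvap->xva_magic = SK_XVA_MAGIC;
	xvap->xva_vattr.va_mask = SK_AT_XVATTR;
	xvap->xva_rtnattrmapp = &xvap->xva_rtnattrmap[0];
}

/* Return the optional-attribute area, or NULL for a plain vattr. */
static inline sk_xoptattr_t *
sk_xva_getxoptattr(sk_xvattr_t *xvap)
{
	if (xvap->xva_vattr.va_mask & SK_AT_XVATTR)
		return (&xvap->xva_xoptattrs);
	return (NULL);
}
```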
* Prototype/structure update for Linux | Brian Behlendorf | 2011-02-10 | 1 | -39/+37
I apologize in advance that so many things ended up in this commit. While it could be separated into a whole series of commits, teasing it all apart now would take considerable time and I'm not sure there's much merit in it. As such, I'll just summarize the intent of the changes which are all (or partly) in this commit. Broadly, the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivalents. More specifically:
1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same, calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non-critical, it makes the code easier to read when you're thinking in Linux friendly VFS terms.
2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It would just add overhead on Linux if we were to manage one, it complicates the code, and it will likely lead to bugs since there's a good chance it will be out of date. The code has been updated to remove all need for this type.
3) Replace all vtype_t's with umode types. Along with this, shift all uses of types to mode bits. The Solaris code would pass a vtype, which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy.
4) Remove use of the vn_* helpers and replace them where needed with inode helpers. The big example here is creating iput_async to replace vn_rele_async. Other vn helpers will be addressed as needed, but they should not be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivalents.
5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation.
In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly.
6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added. This code will need to be refined, but functionally it's all there.
This will be the first and last of these too-large-to-review commits.
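Item 5 relies on the common Linux pattern of embedding the VFS inode inside the filesystem's private structure and recovering the outer structure with container_of(). The sketch below shows that pattern in userspace. `sk_inode`, `sk_znode`, `sk_alloc_inode()` and `sk_destroy_inode()` are simplified, hypothetical stand-ins; the ZTOI()/ITOZ() macros here only mimic the naming used in the ZoL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the Linux struct inode. */
struct sk_inode {
	unsigned long i_ino;
};

/* Simplified znode that embeds its inode, so one allocation covers both. */
struct sk_znode {
	unsigned long   z_id;
	struct sk_inode z_inode;
};

/* Recover the containing structure from a pointer to one of its members. */
#define sk_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define ZTOI(zp) (&(zp)->z_inode)
#define ITOZ(ip) sk_container_of((ip), struct sk_znode, z_inode)

/* A single allocation yields both the znode and the inode the VFS sees. */
static struct sk_inode *
sk_alloc_inode(unsigned long id)
{
	struct sk_znode *zp = calloc(1, sizeof (*zp));

	if (zp == NULL)
		return (NULL);
	zp->z_id = id;
	zp->z_inode.i_ino = id;
	return (ZTOI(zp));
}

/* Freeing goes through the outer structure, not the embedded inode. */
static void
sk_destroy_inode(struct sk_inode *ip)
{
	free(ITOZ(ip));
}
```

Because the inode lives inside the znode, converting between the two is pointer arithmetic with no lookup or second allocation.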
* Export required vfs/vn symbols | Brian Behlendorf | 2011-02-10 | 1 | -0/+80