path: root/module/zfs
Commit message (Author, Age, Files, Lines)
* Add Hooks for Linux Xattr Operations (Brian Behlendorf, 2011-02-10, 1 file, -0/+432)
  The Linux specific xattr operations have all been located in the file zpl_xattr.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
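The error-code conversion that each zpl_* hook performs can be sketched in user-space C. Both functions below are hypothetical stand-ins, not the actual zfs_*/zpl_* code: the inner routine follows the Solaris convention (0 or a positive errno) and the wrapper follows the Linux VFS convention (a non-negative result or a negative errno).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical zfs_*-style routine (Solaris convention):
 * returns 0 on success or a positive errno on failure. */
static int zfs_getxattr_sketch(const char *name, size_t *sizep)
{
    if (name == NULL || strcmp(name, "user.example") != 0)
        return ENODATA;         /* attribute not found: positive errno */
    *sizep = 12;                /* pretend the stored value is 12 bytes */
    return 0;
}

/* Hypothetical zpl_*-style hook (Linux VFS convention):
 * returns the value size on success or a negative errno on failure. */
static long zpl_getxattr_sketch(const char *name)
{
    size_t size = 0;
    int error = zfs_getxattr_sketch(name, &size);

    if (error)
        return -error;          /* flip the sign for the Linux VFS */
    return (long)size;
}
```

Keeping the sign flip in one thin wrapper layer means the shared zfs_* code does not need to know which error convention its caller expects.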
* Add Hooks for Linux Super Block Operations (Brian Behlendorf, 2011-02-10, 1 file, -0/+168)
  The Linux specific super block operations have all been located in the file zpl_super.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
* Add Hooks for Linux Inode Operations (Brian Behlendorf, 2011-02-10, 1 file, -0/+331)
  The Linux specific inode operations have all been located in the file zpl_inode.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
* Add Hooks for Linux File Operations (Brian Behlendorf, 2011-02-10, 2 files, -0/+177)
  The Linux specific file operations have all been located in the file zpl_file.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
  This first zpl_* commit also includes a common zpl.h header with minimal entries to register the Linux specific hooks. It also adds all the new zpl_* files to Makefile.in. This is not a standalone commit; it requires the following zpl_* commits.
* Wrap with HAVE_XVATTR (Brian Behlendorf, 2011-02-10, 2 files, -167/+213)
  For the moment exactly how to handle xvattr is not clear. This change largely consists of the code to comment out the offending bits until something reasonable can be done.
* Add zp->z_is_zvol flag (Brian Behlendorf, 2011-02-10, 2 files, -2/+4)
  A new flag is required for the zfs_rlock code to determine whether it is operating on a zvol or a zpl dataset. This used to be keyed off zp->z_vnode, which was a hack to begin with, but with the removal of vnodes we needed a dedicated flag.
* Prototype/structure update for Linux (Brian Behlendorf, 2011-02-10, 15 files, -3230/+1976)
  I apologize in advance that so many things ended up in this commit. While it could be separated into a whole series of commits, teasing that all apart now would take considerable time and I'm not sure there's much merit in it. As such I'll just summarize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivalents. More specifically:
  1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same, calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non-critical, it makes the code easier to read when you're thinking in Linux friendly VFS terms.
  2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it complicates the code, and it would likely lead to bugs since there's a good chance it would get out of date. The code has been updated to remove all need for this type.
  3) Replace all vtype_t's with umode types. Along with this, shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy.
  4) Remove use of the vn_* helpers and replace them where needed with inode helpers. The big example here is creating iput_async to replace vn_rele_async. Other vn helpers will be addressed as needed, but they should not be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivalents.
  5) Update the znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode within it. Allocators have been updated accordingly.
  6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added. This code will need to be refined, but functionally it's all there.
  This will be the first and last of these too-large-to-review commits.
* Remove dmu_write_pages() support (Brian Behlendorf, 2011-02-10, 1 file, -57/+0)
  For the moment we do not use dmu_write_pages() to write pages directly into a dmu object. It may be required at some point in the future, but for now it is simplest and cleanest to drop it. It can easily be re-added if/when needed.
* Create a root znode without VFS dependencies (Brian Behlendorf, 2011-02-10, 1 file, -85/+4)
  For portability reasons it's handy to be able to create a root znode and basic filesystem components without requiring the full cooperation of the VFS. We are committing to this to simplify the filesystem creation code.
* Remove zfs_ctldir.[ch] (Brian Behlendorf, 2011-02-10, 6 files, -1348/+1)
  This code is used for snapshots and heavily leverages Solaris functionality we do not want to reimplement. These files have been removed, including references to them, and will be replaced by a zfs_snap.c/zpl_snap.c implementation which handles snapshots.
* Disable fuid features (Brian Behlendorf, 2011-02-10, 1 file, -0/+6)
  These features should probably be enabled in the Linux zpl code. For now I'm disabling them until it's clear what needs to be done.
* Disable zfs_sync during oops/panic (Brian Behlendorf, 2011-02-10, 1 file, -1/+1)
  Minor update to ensure zfs_sync() is disabled if a kernel oops/panic is triggered. As the comment says, 'data integrity is job one'. This change could have been done by defining panicstr to oops_in_progress in the SPL, but I felt it was better to use the native Linux API here to be clear.
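The guard amounts to an early return at the top of the sync routine. In the kernel the check reads the global oops_in_progress; the user-space sketch below substitutes a hypothetical flag and a hypothetical sync routine so the control flow can be exercised outside the kernel.

```c
/* Hypothetical stand-in for the kernel's oops_in_progress counter. */
static int oops_in_progress_sketch;
/* Counts how many times the (hypothetical) sync work actually ran. */
static int sync_calls_sketch;

static int zfs_sync_sketch(void)
{
    /*
     * Data integrity is job one: if the kernel is in the middle of an
     * oops or panic, do not attempt to write anything out.
     */
    if (oops_in_progress_sketch)
        return 0;

    sync_calls_sketch++;        /* placeholder for the real sync work */
    return 0;
}
```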
* Disable Shutdown/Reboot (Brian Behlendorf, 2011-02-10, 1 file, -1/+5)
  This support has been disabled with HAVE_SHUTDOWN. We can support this at some point by adding the needed reboot notifiers.
* Remove SYNC_ATTR check (Brian Behlendorf, 2011-02-10, 1 file, -9/+0)
  This flag does not need to be supported under Linux. As the comment says, it was only there to support fsflush() for old filesystems like UFS. This is not needed under Linux.
* Remove mount options (Brian Behlendorf, 2011-02-10, 1 file, -21/+0)
  Mount option parsing is still very Linux specific and will be handled above this zfs filesystem layer. Honoring those mount options once set is of course the responsibility of the lower layers.
* Remove zfs_active_fs_count (Brian Behlendorf, 2011-02-10, 1 file, -11/+0)
  This variable was used to ensure that the ZFS module is never removed while the filesystem is mounted. Once again the generic Linux VFS handles this case for us so it can be removed.
* Remove unused mount functions (Brian Behlendorf, 2011-02-10, 1 file, -333/+0)
  The functions zfs_mount_label_policy(), zfs_mountroot() and zfs_mount() will not be needed because most of what they do is already handled by the generic Linux VFS layer. They all call zfs_domount(), which creates the actual dataset; the caller of this library function, which will be in the zpl layer, is responsible for what's left.
* Remove zfs_major/zfs_minor/zfsfstype (Brian Behlendorf, 2011-02-10, 1 file, -105/+2)
  Under Linux we don't need to reserve a major or minor number for the filesystem. We can rely on the VFS to handle collisions without this being handled by the lower ZFS layers. Additionally, there is no need to keep a zfsfstype around. We are not limited on Linux by the OpenSolaris infrastructure which needed this. The upper zpl layer can specify the filesystem type.
* Remove Solaris VFS Hooks (Brian Behlendorf, 2011-02-10, 3 files, -159/+0)
  The ZFS code is being restructured to act as a library and a stand alone module. This allows us to leverage most of the existing code with minimal modification. It also means we need to drop the Solaris vfs/vnode functions; they will be replaced by Linux equivalents and updated to be Linux friendly.
* VFS: Add zfs_inode_update() helper (Brian Behlendorf, 2011-02-10, 3 files, -2/+76)
  For the moment we have left ZFS unchanged and it updates many values as part of the znode. However, some of these values should be set in the inode. For the moment this is handled by adding a function called zfs_inode_update() which updates the inode based on the znode. This is considered a workaround until we can systematically go through the ZFS code and have it directly update the inode, at which point zfs_inode_update() can be dropped entirely. Keeping two copies of the same data isn't only inefficient, it's also a breeding ground for bugs.
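The workaround can be pictured as a one-way copy from the znode into its embedded inode. This is a minimal sketch with hypothetical, heavily trimmed structures; the field set and the helper name are assumptions, not the real znode_t or struct inode layout.

```c
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-in for struct inode. */
struct inode_sketch {
    uint64_t     i_size;
    unsigned int i_mode;
    unsigned int i_nlink;
};

/* Hypothetical trimmed znode: ZFS still maintains its own copies. */
struct znode_sketch {
    struct inode_sketch z_inode;   /* embedded VFS inode */
    uint64_t            z_size;
    unsigned int        z_mode;
    uint64_t            z_links;
};

/* Hypothetical zfs_inode_update(): copy the authoritative znode values
 * into the embedded inode so the Linux VFS sees current data. */
static void zfs_inode_update_sketch(struct znode_sketch *zp)
{
    struct inode_sketch *ip = &zp->z_inode;

    ip->i_size  = zp->z_size;
    ip->i_mode  = zp->z_mode;
    ip->i_nlink = (unsigned int)zp->z_links;
}
```

Every call site that changes a znode field must remember to call the update helper, which is exactly the "two copies of the same data" hazard the commit message warns about.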
* VFS: Integrate zfs_znode_alloc() (Brian Behlendorf, 2011-02-10, 1 file, -59/+27)
  Under Linux the convention for a filesystem specific data structure is to embed it along with the generic vfs data structure. This differs significantly from Solaris. Since we want to integrate as cleanly with the Linux VFS as possible, this change modifies zfs_znode_alloc() to allocate a znode with an embedded inode for use with the generic VFS. This is done by calling iget_locked(), which will allocate a new inode if needed by calling sb->alloc_inode(). This function allocates enough memory for a znode_t but returns a pointer to the inode structure for Linux's VFS. This function is also responsible for setting the callback znode->z_set_ops_inodes() which is used to register the correct handlers for the inode.
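The embedding described above rests on the container_of() idiom: allocate the larger znode, hand the VFS a pointer to the inode inside it, and recover the znode from that pointer later. A user-space sketch follows; the macro is a local re-implementation of the kernel idea and the structures and hook names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local version of the kernel's container_of(): recover the enclosing
 * structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct linux_inode_sketch {
    unsigned long i_ino;
};

/* Hypothetical trimmed znode with the VFS inode embedded inside it. */
struct znode_embed_sketch {
    unsigned long             z_id;
    struct linux_inode_sketch z_inode;
};

#define ITOZ_SKETCH(ip) \
    container_of_sketch(ip, struct znode_embed_sketch, z_inode)

/* Hypothetical alloc_inode() hook: allocate a whole znode but give the
 * VFS only a pointer to the embedded inode. */
static struct linux_inode_sketch *zpl_alloc_inode_sketch(void)
{
    struct znode_embed_sketch *zp = calloc(1, sizeof(*zp));

    return zp ? &zp->z_inode : NULL;
}

/* Matching destroy hook: map back to the znode and free the whole object. */
static void zpl_destroy_inode_sketch(struct linux_inode_sketch *ip)
{
    free(ITOZ_SKETCH(ip));
}
```

One allocation serves both layers, so there is no separate vnode or znode object whose lifetime has to be tracked independently.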
* Enable zfs_znode compilation (Brian Behlendorf, 2011-02-10, 2 files, -18/+21)
  Basic compilation of the bulk of zfs_znode.c has been enabled. After much consideration it was decided to convert the existing vnode based interfaces to more friendly Linux interfaces. The following commits will systematically update the required interfaces. There are of course pros and cons to this decision.
  Pros:
  - This simplifies integration with Linux in the long term. There is no longer any need to manage vnodes, which are a foreign concept to the Linux VFS.
  - Improved long term maintainability.
  - Minor performance improvements by removing vnode overhead.
  Cons:
  - Added work in the short term to modify multiple ZFS interfaces.
  - Harder to pull in changes if we ever see any new code from Solaris.
  - Mixed Solaris and Linux interfaces in some ZFS code.
* ACL related changes (Brian Behlendorf, 2011-02-10, 2 files, -5/+27)
  A small collection of ACL changes related to not supporting fuid mapping. This whole area will need to be closely investigated.
* Init/destroy tsd (Brian Behlendorf, 2011-02-10, 1 file, -2/+1)
  Add missing tsd_destroy() call for rrw_tsd_key to avoid a leak.
* Replace VOP_* calls with direct zfs_* calls (Brian Behlendorf, 2011-02-10, 1 file, -15/+16)
  These generic Solaris wrappers are no longer required. Simply directly call the correct zfs functions for clarity.
* Add trivial acl helpers (Brian Behlendorf, 2011-02-10, 1 file, -1/+120)
  The zfs acl code makes use of the two OpenSolaris helper functions acl_trivial_access_masks() and ace_trivial_common(). Since they are only called from zfs_acl.c I've brought them over from OpenSolaris and added them as static functions to this file. This way I don't need to reimplement this functionality from scratch in the SPL.
  Long term, once I take a more careful look at the acl implementation, it may be the case that these functions really aren't needed. If that turns out to be the case they can then be removed.
* Remove dead ACL code (Brian Behlendorf, 2011-02-10, 1 file, -75/+1)
  The following code was unused, which caused gcc to complain. Since it was dead code it has simply been removed.
* Remove zfs_parse_bootfs() support (Brian Behlendorf, 2011-02-10, 1 file, -55/+0)
  Remove unneeded bootfs functions. This support shouldn't be required for the Linux port, and even if it is it would need to be reworked to integrate cleanly with Linux.
* VFS: Wrap with HAVE_SHARE (Brian Behlendorf, 2011-02-10, 2 files, -18/+22)
  Certain NFS/SMB share functionality is not yet in place. These functions used to be wrapped with the generic HAVE_ZPL to prevent them from being compiled. I still don't want them compiled but I'm working toward eliminating the use of HAVE_ZPL. So I'm just renaming the wrapper here to HAVE_SHARE. They still won't be compiled until all the share issues are worked through. Share support is the last missing piece from zfs_ioctl.c.
* Wrap with HAVE_MLSLABEL (Brian Behlendorf, 2011-02-10, 1 file, -0/+2)
  The zfs_check_global_label() function is part of the HAVE_MLSLABEL support which was previously commented out by a HAVE_ZPL check. Since we're still deciding what to do about mls labels, wrap it with the preexisting macro to keep it compiled out.
* Remove znode move functionality (Brian Behlendorf, 2011-02-10, 1 file, -184/+0)
  Unlike Solaris, the Linux implementation embeds the inode in the znode and has no use for a vnode. So while it's true that fragmentation of the znode cache may occur, it should not be worse than any of the other Linux FS inode caches. Until it is proven that this is a problem, it's just added complexity we don't need.
* Conserve stack in zfs_mkdir() (Brian Behlendorf, 2011-02-10, 1 file, -1/+3)
  Move the sa_attrs array from the stack to the heap to minimize stack space usage.
* Conserve stack in zfs_sa_upgrade() (Brian Behlendorf, 2011-02-10, 1 file, -4/+8)
  As always under Linux, stack space is at a premium. Relocate two 20 element sa_bulk_attr_t arrays in zfs_sa_upgrade() from the stack to the heap.
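The stack-saving pattern in the two commits above replaces a fixed-size local array with a heap allocation of the same size, leaving only a pointer on the stack, and frees it on every exit path. In the kernel this would go through kmem_alloc()/kmem_free(); this sketch uses calloc()/free() and a hypothetical attribute type standing in for sa_bulk_attr_t.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for sa_bulk_attr_t. */
struct bulk_attr_sketch {
    int   attr;
    void *data;
    int   length;
};

#define BULK_ATTRS_SKETCH 20

/* Before: struct bulk_attr_sketch bulk[20]; lived on the small kernel stack.
 * After: the array is on the heap and only a pointer is on the stack. */
static int sa_upgrade_sketch(int nattrs)
{
    struct bulk_attr_sketch *bulk;

    if (nattrs < 0 || nattrs > BULK_ATTRS_SKETCH)
        return EINVAL;

    bulk = calloc(BULK_ATTRS_SKETCH, sizeof(*bulk));
    if (bulk == NULL)
        return ENOMEM;          /* heap allocation can fail; stack could not */

    for (int i = 0; i < nattrs; i++)
        bulk[i].attr = i;       /* fill entries as the real code would */

    free(bulk);                 /* release before returning */
    return 0;
}
```

The trade-off is a new failure path (allocation failure) in exchange for a much smaller stack frame.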
* Export required vfs/vn symbols (Brian Behlendorf, 2011-02-10, 2 files, -29/+61)
* Add HAVE_SCANSTAMP (Brian Behlendorf, 2011-02-10, 1 file, -2/+6)
  This functionality is not supported under Linux; perhaps it will be some day if it's decided it's useful.
* Add initial rw_uio functions to the dmu (Brian Behlendorf, 2011-02-04, 1 file, -3/+109)
  These functions were dropped originally because I felt they would need to be rewritten anyway to avoid using uios. However, this patch re-adds them with the idea that they can just be reworked and the uio bits dropped.
* Documentation updates (Brian Behlendorf, 2011-02-04, 1 file, -1/+1)
  Minor Linux specific documentation updates to the comments and man pages.
* Fix ZVOL rename minor devices (Brian Behlendorf, 2011-01-07, 1 file, -3/+9)
  During a rename we need to be careful to destroy and create a new minor for the ZVOL _only_ if the rename succeeded. The previous code would destroy your minor device unconditionally, and it would also fail to create the new minor device on success.
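The ordering rule can be modelled in a few lines: attempt the rename first, and only when it succeeds replace the old minor with one under the new name. Everything below is a hypothetical user-space model (the "minor" is just a name buffer and the rename fails only for over-long names), not the zvol code itself.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical model: the current minor device, identified by name. */
static char minor_name_sketch[64] = "tank/vol";

/* Hypothetical dataset rename that can fail. */
static int dataset_rename_sketch(const char *newname)
{
    return strlen(newname) >= sizeof(minor_name_sketch) ? ENAMETOOLONG : 0;
}

static int zvol_rename_sketch(const char *newname)
{
    int error = dataset_rename_sketch(newname);

    if (error == 0) {
        /* Only on success: drop the old minor, then create the new one. */
        minor_name_sketch[0] = '\0';
        strncpy(minor_name_sketch, newname, sizeof(minor_name_sketch) - 1);
    }
    /* On failure the original minor is left untouched. */
    return error;
}
```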
* Fix minor compiler warnings (Brian Behlendorf, 2011-01-06, 5 files, -57/+60)
  These compiler warnings were introduced when code which was previously #ifdef'ed out by HAVE_ZPL was re-added for use by the posix layer. All of the following changes should be obviously correct and will cause no semantic changes.
* Use cv_timedwait_interruptible in arc (Brian Behlendorf, 2010-12-14, 1 file, -3/+3)
  The issue is that cv_timedwait() sleeps uninterruptibly to block signals and avoid waking up early. Under Linux this counts against the load average, keeping it artificially high. This change allows the arc to sleep interruptibly, which means it may be woken up early due to a signal. Normally this means some extra care must be taken to handle a potential signal, but for the arc's usage of cv_timedwait() there is no harm in waking up before the timeout expires, so no extra handling is required.
* Enable rrwlock.c compilation (Brian Behlendorf, 2010-12-07, 1 file, -3/+0)
  With the addition of the thread specific data interfaces to the SPL it is safe to enable compilation of the re-entrant reader/writer locks.
* Fix for access beyond end of device error (Ned Bass, 2010-11-10, 2 files, -1/+3)
  This commit fixes a sign extension bug affecting l2arc devices. Extremely large offsets may be passed down to the low level block device driver on reads, generating errors similar to
    attempt to access beyond end of device
    sdbi1: rw=14, want=36028797014862705, limit=125026959
  The unwanted sign extension occurs because the function arc_read_nolock() stores the offset as a daddr_t, a 32-bit signed int type in the Linux kernel. This offset is then passed to zio_read_phys() as a uint64_t argument, causing sign extension for values of 0x80000000 or greater. To avoid this, we store the offset in a uint64_t.
  This change also changes a few daddr_t struct members to uint64_t in the libspl headers to avoid similar bugs cropping up in the future. We also add an ASSERT to __vdev_disk_physio() to check for invalid offsets.
  Closes #66
  Signed-off-by: Brian Behlendorf <[email protected]>
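The sign extension is easy to reproduce in plain C. INT32_MIN below is the 32-bit pattern 0x80000000; converting it to uint64_t yields 0xFFFFFFFF80000000 rather than 0x80000000. The function names and the range check are hypothetical illustrations of the bug and of the kind of sanity check the new ASSERT performs, not the actual arc or vdev code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern: the offset passes through a 32-bit signed type
 * (daddr_t on Linux) before being widened to uint64_t. */
static uint64_t widen_via_int32_sketch(int32_t offset)
{
    return (uint64_t)offset;    /* negative values are sign-extended */
}

/* Fixed pattern: keep the offset in a uint64_t the whole way. */
static uint64_t widen_via_uint64_sketch(uint64_t offset)
{
    return offset;
}

/* Hypothetical sanity check: the I/O must end inside the device. */
static bool offset_is_valid_sketch(uint64_t offset, uint64_t size,
                                   uint64_t dev_size)
{
    return offset <= dev_size && size <= dev_size - offset;
}
```

Values below 0x80000000 survive the signed round trip unchanged, which is why the bug only appeared on large l2arc devices.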
* Linux 2.6.36 compat, synchronous bio flag (Brian Behlendorf, 2010-11-10, 1 file, -2/+23)
  The name of the flag used to mark a bio as synchronous has changed again in the 2.6.36 kernel due to the unification of the BIO_RW_* and REQ_* flags. The new flag is called REQ_SYNC. To simplify checking this flag I have introduced the vdev_disk_dio_is_sync() helper function. Based on the results of several new autoconf tests it uses the correct mask to check for a synchronous bio.
  Preferred interface for flagging a synchronous bio:
    2.6.12-2.6.29: BIO_RW_SYNC
    2.6.30-2.6.35: BIO_RW_SYNCIO
    2.6.36-2.6.xx: REQ_SYNC
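The version table above can be restated as a lookup, which makes the boundaries explicit. This is purely illustrative: the real helper picks its mask at build time from autoconf results rather than at run time, and the enum and function names here are hypothetical.

```c
/* Hypothetical labels for the three flag generations in the table. */
enum sync_flag_sketch {
    SYNC_FLAG_UNKNOWN,
    SYNC_FLAG_BIO_RW_SYNC,      /* 2.6.12 - 2.6.29 */
    SYNC_FLAG_BIO_RW_SYNCIO,    /* 2.6.30 - 2.6.35 */
    SYNC_FLAG_REQ_SYNC          /* 2.6.36 and later */
};

/* Map the sublevel of a 2.6.x kernel to its preferred synchronous bio flag. */
static enum sync_flag_sketch sync_bio_flag_sketch(int sublevel)
{
    if (sublevel >= 36)
        return SYNC_FLAG_REQ_SYNC;
    if (sublevel >= 30)
        return SYNC_FLAG_BIO_RW_SYNCIO;
    if (sublevel >= 12)
        return SYNC_FLAG_BIO_RW_SYNC;
    return SYNC_FLAG_UNKNOWN;   /* older than the table covers */
}
```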
* Remove inconsistent use of EOPNOTSUPP (Ned Bass, 2010-11-10, 1 file, -1/+1)
  Commit 3ee56c292bbcd7e6b26e3c2ad8f0e50eee236bcc changed an ENOTSUP return value in one location to ENOTSUPP to fix user programs seeing an invalid ioctl() error code. However, use of ENOTSUP is widespread in the zfs module. Instead of changing all of those uses, we fixed the ENOTSUP definition in the SPL to be consistent with user space. The changed return value in the above commit is therefore no longer needed, so this commit reverses it to maintain consistency.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Make rollbacks fail gracefully (Ned Bass, 2010-11-08, 1 file, -1/+1)
  Support for rolling back datasets requires a functional ZPL, which we currently do not have. The zfs command does not check for ZPL support before attempting a rollback, and in preparation for rolling back a zvol it removes the minor node of the device. To prevent the zvol device node from disappearing after a failed rollback operation, this change wraps the zfs_do_rollback() function in an #ifdef HAVE_ZPL and returns ENOSYS in the absence of a ZPL. This is consistent with the behavior of other ZPL dependent commands such as mount.
  The original error message observed with this bug was rather confusing:
    internal error: Unknown error 524
    Aborted
  This was because zfs_ioc_rollback() returns ENOTSUP if we don't HAVE_ZPL, but Linux actually has no such error code. It should instead return EOPNOTSUPP, as that is how ENOTSUP is defined in user space. With that we would have gotten the somewhat more helpful message
    cannot rollback 'tank/fish': unsupported version
  This is rather a moot point with the above changes since we will no longer make that ioctl call without a ZPL. But this change updates the error code just in case.
  Signed-off-by: Brian Behlendorf <[email protected]>
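The guard amounts to compiling the rollback path only when HAVE_ZPL is defined and failing early otherwise, before anything such as the zvol minor node has been torn down. A minimal sketch; the function name and its placeholder body are hypothetical, and only the HAVE_ZPL macro and the ENOSYS return come from the commit message.

```c
#include <errno.h>

/* Hypothetical shape of the guarded command: without a ZPL, refuse the
 * rollback up front instead of tearing down the device first. */
static int zfs_do_rollback_sketch(const char *dataset)
{
    (void)dataset;              /* unused in this sketch */
#ifdef HAVE_ZPL
    return 0;                   /* placeholder for the real rollback path */
#else
    return ENOSYS;              /* no ZPL: fail before touching anything */
#endif
}
```

Built without HAVE_ZPL, as in this sketch's test, every call returns ENOSYS.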
* Increase zio write interrupt thread count (Brian Behlendorf, 2010-11-08, 1 file, -1/+1)
  Increasing the default zio_wr_int thread count from 8 to 16 improves write performance by 13% on large systems. More testing needs to be done, but I suspect the ideal tuning here is ZTI_BATCH() with a minimum of 8 threads.
* Shorten zio_* thread names (Brian Behlendorf, 2010-11-08, 2 files, -3/+2)
  Linux kernel thread names are expected to be short. This change shortens the zio thread names to 10 characters, leaving a few characters to append the /<cpuid> of the CPU to which the thread is bound. For example: z_wr_iss/0.
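Linux stores a task's name in a buffer of TASK_COMM_LEN (16) bytes including the terminating NUL, so anything longer is truncated. The sketch below builds a "<base>/<cpuid>" name and reports whether it fit; the helper and the longer example name in the test are hypothetical.

```c
#include <stdio.h>

/* Mirrors the kernel's TASK_COMM_LEN: 15 visible characters plus a NUL. */
#define TASK_COMM_LEN_SKETCH 16

/* Hypothetical helper: build "<base>/<cpuid>" and return 1 if it fit. */
static int zio_thread_name_sketch(char *buf, size_t len,
                                  const char *base, int cpuid)
{
    int n = snprintf(buf, len, "%s/%d", base, cpuid);

    return n >= 0 && (size_t)n < len;
}
```

"z_wr_iss/0" uses 10 of the 15 available characters, leaving room for multi-digit CPU ids.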
* Fix panic mounting unformatted zvol (Ned Bass, 2010-10-29, 1 file, -0/+4)
  On some older kernels, e.g. 2.6.18, zvol_ioctl_by_inode() may get passed a NULL file pointer if the user tries to mount a zvol without a filesystem on it. This change adds checks to prevent a null pointer dereference.
  Closes #73.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Fix missing 'zpool events' (Brian Behlendorf, 2010-10-12, 2 files, -6/+20)
  It turns out that 'zpool events' over 1024 bytes in size were being silently dropped. This was discovered while writing the zfault.sh tests to validate common failure modes.
  This could occur because the zfs interface for passing an arbitrary size nvlist_t over an ioctl() is to provide a buffer for the packed nvlist which is usually big enough. In this case 1024 bytes is the default. If the kernel determines the buffer is too small it returns ENOMEM and the minimum required size of the nvlist_t. This was working properly, but in the case of 'zpool events' the event stream was advanced despite the error. Thus the retry with the bigger buffer would succeed, but it would skip over the previous event.
  The fix is to pass this size to zfs_zevent_next() and determine, before removing the event from the list, whether it will fit. This was preferable to checking after the event was returned because it avoids the need to rewind the stream.
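The fix is a "check before consume" rule: compare the next event's packed size with the caller's buffer, and only advance the stream if it fits. A user-space sketch with a hypothetical queue that records only event sizes; the structure, the function name and the ENOENT end-of-stream code are assumptions, while the ENOMEM-plus-required-size behaviour follows the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical event queue: each entry records only its packed size. */
struct zevent_queue_sketch {
    const size_t *sizes;    /* packed nvlist size of each pending event */
    size_t        count;
    size_t        next;     /* index of the next event to hand out */
};

/*
 * Check whether the next event fits *before* consuming it. On ENOMEM the
 * stream is not advanced and *needed reports the required size, so the
 * retry with a bigger buffer sees the same event.
 */
static int zevent_next_sketch(struct zevent_queue_sketch *q,
                              size_t buf_size, size_t *needed)
{
    if (q->next >= q->count)
        return ENOENT;              /* no more events */

    *needed = q->sizes[q->next];
    if (*needed > buf_size)
        return ENOMEM;              /* too small: leave the event queued */

    q->next++;                      /* it fits, so consume it now */
    return 0;
}
```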
* Initial zio delay timing (Brian Behlendorf, 2010-10-12, 3 files, -3/+26)
  While there is no right maximum timeout for a disk IO, we can start laying the groundwork to measure how long they do take in practice. This change simply measures the IO time and if it exceeds 30s an event is posted for 'zpool events'.
  This value was carefully selected because for sd devices it implies that at least one timeout (SD_TIMEOUT) has occurred. Unfortunately, even with FAILFAST set we may retry the request and not get an error. This behavior is strongly dependent on the device driver and how it is hooked in to the scsi error handling stack. However, by setting the limit at 30s we can log the event even if no error was returned.
  Slightly longer term we can start recording these delays, perhaps as a simple power-of-two histogram. This histogram can then be reported as part of the 'zpool status' command when given a command line option.
  None of this code changes the internal behavior of ZFS. Currently it is simply for reporting excessively long delays.
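Both ideas above are small enough to sketch: a threshold test on the elapsed time of an I/O, and the bucket index for the proposed power-of-two histogram. The names, the nanosecond units and the bucketing convention are assumptions for illustration; only the 30 second limit comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL
/* The 30 second threshold described above, in nanoseconds. */
#define ZIO_DELAY_MAX_SKETCH (30ULL * NSEC_PER_SEC_SKETCH)

/* True when an I/O took long enough to warrant a 'zpool events' post. */
static bool zio_delay_exceeded_sketch(uint64_t start_ns, uint64_t done_ns)
{
    return done_ns - start_ns > ZIO_DELAY_MAX_SKETCH;
}

/* Power-of-two histogram bucket: bucket b holds delays in [2^b, 2^(b+1));
 * delays of 0 and 1 both land in bucket 0. */
static unsigned delay_bucket_sketch(uint64_t delay)
{
    unsigned b = 0;

    while (delay > 1) {
        delay >>= 1;
        b++;
    }
    return b;
}
```

A power-of-two histogram needs only about 64 counters to cover the full range of a 64-bit delay, which keeps the per-device cost small.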