Commit message | Author | Age | Files | Lines
* Fix symlink(2) inode reference count | Brian Behlendorf | 2011-02-17 | 1 | -1/+0
  Under Linux sys_symlink(2) should result in an inode being created with one reference for the inode itself, and a second reference on the inode which is held by the new dentry. Under Solaris this appears not to be the case. Their zfs_symlink() handler drops the inode reference before returning. The result of this under Linux is that the reference count for symlinks is always one smaller than it should have been. This results in a BUG() when the symlink is unlinked.

  To handle this the Linux port now keeps the inode reference, which differs from the Solaris behavior. This results in correct reference counts.

  Closes #96
* Use -zfs_readlink() error | Brian Behlendorf | 2011-02-17 | 1 | -1/+1
  The zfs_readlink() function returns a Solaris positive error value and that needs to be converted to a Linux negative error value. While in this case nothing would actually go wrong, it's still incorrect and should be fixed if for no other reason than clarity.
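  The sign convention described above can be sketched in a few lines of C. The function names below are hypothetical stand-ins and are not the actual zfs_readlink() or zpl code; the sketch only assumes that the Solaris-style routine returns 0 or a positive errno.

```c
#include <errno.h>

/* Hypothetical stand-in for a Solaris-style routine: returns 0 on
 * success or a POSITIVE errno value on failure. */
int solaris_style_op(int should_fail)
{
    return should_fail ? EIO : 0;
}

/* Linux VFS callbacks report failure as a NEGATIVE errno, so the
 * wrapper simply negates the result. Negating 0 still yields 0, so
 * the success case passes through unchanged. */
int linux_style_op(int should_fail)
{
    return -solaris_style_op(should_fail);
}
```

  Negating at the boundary keeps the Solaris-derived code untouched while giving Linux callers the sign they expect.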
* Improve 'zpool import' safety | Brian Behlendorf | 2011-02-17 | 1 | -9/+12
  There are three improvements here to 'zpool import' proposed by Fajar in Github issue #98. They are all good so I'm committing all three.

  1) Add descriptions for "hpet" and "core" blacklist entries.
  2) Add "core" to the blacklist. As described in the issue, accessing this device will crash Xen dom0.
  3) Refine probing behavior to use fstatat64(). This allows us to determine if a device is a block device or a regular file without having to open it. This is the safest approach when probing /dev/ because the simple act of opening a device may have unexpected consequences.

  Closes #98
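  The probing idea in item 3 can be sketched as follows. This is a minimal illustration, not the zpool import code: it uses the POSIX fstatat() call rather than fstatat64(), and the helper name is_probe_candidate() is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical helper: classify a directory entry without opening it.
 * Returns 1 for a block device or regular file, 0 for any other file
 * type, and -1 if the entry cannot be stat'ed. */
int is_probe_candidate(int dirfd, const char *name)
{
    struct stat st;

    /* fstatat() only reads inode metadata; the device is never opened,
     * so no driver open() handler is triggered. */
    if (fstatat(dirfd, name, &st, 0) != 0)
        return -1;

    return (S_ISBLK(st.st_mode) || S_ISREG(st.st_mode)) ? 1 : 0;
}
```

  A caller scanning /dev/ would pass the directory's file descriptor and each entry name, and open only the entries for which the helper returns 1.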
* Fix readlink(2) | Brian Behlendorf | 2011-02-16 | 2 | -28/+36
  This patch addresses three issues related to symlinks.

  1) Revert the zfs_follow_link() function to a modified version of the original zfs_readlink(). The only changes from the original OpenSolaris version relate to using Linux types. For the moment this means no vnodes and no zfsvfs_t. The caller zpl_follow_link() was also updated accordingly. This change was reverted because it was slightly gratuitous.
  2) Update zpl_follow_link() to use local variables for the link buffer. I'd forgotten that iov.iov_base is updated by uiomove() so after the call to zfs_readlink() it can no longer be used. We need our own private copy of the link pointer.
  3) Allocate MAXPATHLEN instead of MAXPATHLEN+1. By default MAXPATHLEN is 4096 bytes which is a full page, adding one to it pushes it slightly over a page. That means you'll likely end up allocating 2 pages which is wasteful of memory and possibly slightly slower.
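  The pitfall in item 2 can be reproduced in user space. The sketch below is an illustration under stated assumptions: fake_uiomove() is a hypothetical stand-in that mimics only the pointer-advancing behavior attributed to uiomove() above, and read_link_into() is an invented helper, not zpl_follow_link().

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for uiomove(): copies bytes into the iovec and,
 * as described above, advances iov_base and shrinks iov_len. */
void fake_uiomove(const char *src, size_t n, struct iovec *iov)
{
    if (n > iov->iov_len)
        n = iov->iov_len;
    memcpy(iov->iov_base, src, n);
    iov->iov_base = (char *)iov->iov_base + n;
    iov->iov_len -= n;
}

/* Fills buf (buflen must be at least 1) with the link target and
 * returns a pointer to the start of the text. */
const char *read_link_into(char *buf, size_t buflen, const char *target)
{
    char *link = buf;                 /* private copy of the start */
    struct iovec iov = { .iov_base = buf, .iov_len = buflen - 1 };

    memset(buf, 0, buflen);           /* guarantees NUL termination */
    fake_uiomove(target, strlen(target), &iov);

    /* iov.iov_base now points PAST the copied bytes, so returning it
     * would hand the caller an empty string. Return the saved pointer. */
    return link;
}
```

  On item 3, the arithmetic is simply that a 4096-byte request fits one 4096-byte page while a 4097-byte request needs two.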
* Update 'zfs.sh -u' to umount all zfs filesystems | Brian Behlendorf | 2011-02-16 | 1 | -0/+1
  Before it is safe to unload the zfs module stack all mounted zfs filesystems must be unmounted. If they are not unmounted, there will be references held on the modules and the stack cannot be removed. To handle this have 'zfs.sh -u', which is used by all of the test scripts, umount all zfs filesystems before attempting to unload the module stack.
* Suppress share error on mount | Brian Behlendorf | 2011-02-16 | 1 | -1/+2
  Until code is added to support automatically sharing datasets we should return success instead of failure. This prevents the command line tools from returning a non-zero error code. While a user likely won't notice this, test scripts like zconfig.sh do and correctly fail because of it.
* Add get/setattr, get/setxattr hooks | Brian Behlendorf | 2011-02-16 | 1 | -4/+11
  While the attr/xattr hooks were already in place for regular files, these hooks can also apply to directories and special files. While they aren't typically used in this way, it should be supported. This patch registers these additional callbacks for both directory and special inode types.
* Fix FIFO and socket handling | Brian Behlendorf | 2011-02-16 | 1 | -3/+4
  Under Linux when creating a fifo or socket type device in the ZFS filesystem it's critical that the rdev is stored in an SA. This was already being correctly done for character and block devices, but that logic needed to be extended to include FIFOs and sockets. This patch takes care of device creation but a follow-on patch may still be required to verify that the dev_t is being correctly packed/unpacked from the SA.
* Create minors for all zvols | Brian Behlendorf | 2011-02-16 | 1 | -1/+2
  It was noticed that when you have zvols in multiple datasets not all of the zvol devices are created at module load time. Fajarnugraha did the legwork to identify that the root cause of this bug is a non-zero return value from zvol_create_minors_cb(). Returning a non-zero value from the dmu_objset_find_spa() callback function results in aborting processing of the remaining children in a dataset.

  Since we want to ensure that the callback is run on all children regardless of error, simply unconditionally return zero from zvol_create_minors_cb(). This callback function is solely used for this purpose so suppressing the error is safe.

  Closes #96
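  The effect of the callback's return value can be sketched with a toy iterator. Everything here is hypothetical: for_each_child() only models the "stop at the first non-zero return" behavior described above for dmu_objset_find_spa(), and the two callbacks stand in for the old and new zvol_create_minors_cb() behavior.

```c
#include <stddef.h>

typedef int (*visit_cb)(int child, void *arg);

/* Hypothetical iterator: aborts the walk as soon as a callback
 * returns non-zero, leaving later children unvisited. */
int for_each_child(const int *children, size_t n, visit_cb cb, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        int err = cb(children[i], arg);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Old behavior: counts the visit and propagates a failure
 * (a negative child id simulates a minor that cannot be created). */
int create_minor_strict(int child, void *arg)
{
    int *visited = arg;
    (*visited)++;
    return (child < 0) ? 1 : 0;
}

/* New behavior: does the same work but always returns zero so the
 * walk continues on to the remaining children. */
int create_minor_lenient(int child, void *arg)
{
    (void)create_minor_strict(child, arg);
    return 0;
}
```

  With children {1, -1, 2} the strict callback stops the walk after two visits, while the lenient one reaches all three.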
* Linux 2.6.36 compat, sops->evict_inode() | Brian Behlendorf | 2011-02-11 | 50 | -11/+250
  The new preferred interface for evicting an inode from the inode cache is the ->evict_inode() callback. It replaces both the ->delete_inode() and ->clear_inode() callbacks which were previously used for this.
* Linux 2.6.33 compat, get/set xattr callbacks | Brian Behlendorf | 2011-02-11 | 6 | -7/+393
  The xattr handler prototypes were sanitized with the idea being that the same handlers could be used for multiple methods. The result of this was that the inode type was changed to a dentry, and both the get() and set() hooks had a handler_flags argument added. The list() callback was similarly affected but no autoconf check was added because we do not use the list() callback.
* Linux 2.6.35 compat, fops->fsync() | Brian Behlendorf | 2011-02-11 | 52 | -7/+257
  The fsync() callback in the file_operations structure used to take 3 arguments. The callback now only takes 2 arguments because the dentry argument was determined to be unused by all consumers. To handle this a compatibility prototype was added to ensure the right prototype is used. Our implementation never used the dentry argument either so it's just a matter of using the right prototype.
* Linux 2.6.35 compat, const struct xattr_handler | Brian Behlendorf | 2011-02-10 | 52 | -5/+284
  The const keyword was added to the 'struct xattr_handler' in the generic Linux super_block structure. To handle this we define an appropriate xattr_handler_t typedef which can be used. This was the preferred solution because it keeps the code clean and readable.
* Prefer /lib/modules/$(uname -r)/ links | Brian Behlendorf | 2011-02-10 | 2 | -9/+24
  Preferentially use the /lib/modules/$(uname -r)/source and /lib/modules/$(uname -r)/build links. Only if neither of these links exists, fall back to alternate methods for deducing which kernel to build with. This resolves the need to manually specify --with-linux= and --with-linux-obj= on Debian systems.
* MS_DIRSYNC and MS_REC compat | Brian Behlendorf | 2011-02-10 | 2 | -0/+11
  It turns out that older versions of the glibc headers do not properly define MS_DIRSYNC despite it being explicitly mentioned in the man pages. They instead call it S_WRITE, so for systems where this is not correctly defined map MS_DIRSYNC to S_WRITE. At the time of this commit both Ubuntu Lucid and Debian Squeeze use the out of date glibc headers.

  As for MS_REC, this field is also not available in the older headers. Since there is no obvious mapping in this case we simply disable the recursive mount option which used it.
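  The MS_DIRSYNC mapping can be sketched as a small preprocessor fallback. This is a guess at the shape of such a shim, not the code from the commit; dirsync_flag() is a hypothetical helper added only so the result can be inspected, and the fallback branch assumes the old headers really do provide S_WRITE as stated above.

```c
#include <sys/mount.h>

/* On headers that already define MS_DIRSYNC this block is skipped.
 * On the older glibc headers described above, the same flag is
 * assumed to be available under the name S_WRITE. */
#ifndef MS_DIRSYNC
#define MS_DIRSYNC S_WRITE
#endif

/* Hypothetical helper returning the flag value chosen at build time. */
unsigned long dirsync_flag(void)
{
    return (unsigned long)MS_DIRSYNC;
}
```

  Code elsewhere can then use MS_DIRSYNC unconditionally, regardless of which header version is installed.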
* Add missing -ldl linker option | Brian Behlendorf | 2011-02-10 | 10 | -10/+10
  The inclusion of the dlsym(), dlopen(), and dlclose() symbols requires us to link against the dl library. Be careful to add the flag to both the libzfs library and the commands which depend on the library.
* Update AUTHORS file | Brian Behlendorf | 2011-02-10 | 1 | -4/+34
  This file has gotten stale and needed to be updated. There are individuals who deserve to be recognized for their contributions to the project. I've done my best to assemble names from the commit logs of those who have submitted patches. This list may not be comprehensive. If you feel I've overlooked your contribution please let me know and we can get your name added.
* Use 'noop' IO Scheduler | Brian Behlendorf | 2011-02-10 | 2 | -0/+49
  Initial testing has shown that the right IO scheduler to use under Linux is noop. This strikes the ideal balance by allowing the zfs elevator to do all request ordering and prioritization, while allowing the Linux elevator to do the maximum front/back merging allowed by the physical device. This yields the largest possible requests for the device with the lowest total overhead.

  While 'noop' should be right for your system you can choose a different IO scheduler with the 'zfs_vdev_scheduler' option. You may set this value to any of the standard Linux schedulers: noop, cfq, deadline, anticipatory. In addition, if you choose 'none' zfs will not attempt to change the IO scheduler for the block device.
* Suppress large kmem_alloc() warning | Brian Behlendorf | 2011-02-10 | 1 | -1/+1
  The following warning was observed under normal operation. It's not fatal but it's something to be addressed long term. Flag the offending allocation with KM_NODEBUG to suppress the warning and flag the call site.

    SPL: Showing stack for process 21761
    Pid: 21761, comm: iozone Tainted: P ---------------- 2.6.32-71.14.1.el6.x86_64 #1
    Call Trace:
     [<ffffffffa05465a7>] spl_debug_dumpstack+0x27/0x40 [spl]
     [<ffffffffa054a84d>] kmem_alloc_debug+0x11d/0x130 [spl]
     [<ffffffffa05de166>] dmu_buf_hold_array_by_dnode+0xa6/0x4e0 [zfs]
     [<ffffffffa05de825>] dmu_buf_hold_array+0x65/0x90 [zfs]
     [<ffffffffa05de891>] dmu_read_uio+0x41/0xd0 [zfs]
     [<ffffffffa0654827>] zfs_read+0x147/0x470 [zfs]
     [<ffffffffa06644a2>] zpl_read_common+0x52/0x70 [zfs]
     [<ffffffffa0664503>] zpl_read+0x43/0x70 [zfs]
     [<ffffffff8116d905>] vfs_read+0xb5/0x1a0
     [<ffffffff8116da41>] sys_read+0x51/0x90
     [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
* Update META to 0.6.0 | Brian Behlendorf | 2011-02-10 | 1 | -1/+1
  Roll the version forward to 0.6.0. The addition of the Posix layer warrants updating the major version number.
* Invalidate dcache and inode cache | Brian Behlendorf | 2011-02-10 | 1 | -7/+7
  When performing a 'zfs rollback' it's critical to invalidate the previous dcache and inode cache. If we don't there will be stale cache entries which when accessed will result in EIOs.
* Remove useless libefi warnings | Brian Behlendorf | 2011-02-10 | 1 | -10/+3
  These two warnings in libefi serve no real purpose. When running without DEBUG they are already suppressed, and even when DEBUG is enabled all they indicate is the device doesn't already have an EFI label. For a Linux machine this is probably the common case.
* Move cv_destroy() outside zp->z_range_lock() | Brian Behlendorf | 2011-02-10 | 1 | -23/+36
  With the recent SPL change (d599e4fa) that forces cv_destroy() to block until all waiters have been woken, it is now unsafe to call cv_destroy() under the zp->z_range_lock() because it is used as the condition variable mutex. If there are waiters cv_destroy() will block until they wake up and acquire the mutex. However, they will never acquire the mutex because cv_destroy() will not return, which would allow its caller to drop the lock. Deadlock.

  To avoid this cv_destroy() is now run asynchronously in a taskq. This solves two problems:

  1) It is no longer run under the zp->z_range_lock so no deadlock.
  2) Since cv_destroy() may now block we don't want this slowing down zfs_range_unlock() and throttling the system.

  This was not as much of an issue under OpenSolaris because their cv_destroy() implementation does not do anything. They do however risk a bad paging request if cv_destroy() returns, the memory holding the condition variable is freed, and then the waiters wake up and try to reference it. It's a very small unlikely race, but it is possible.
* Add mmap(2) support | Brian Behlendorf | 2011-02-10 | 3 | -104/+245
  It's worth taking a moment to describe how mmap is implemented for zfs because it differs considerably from other Linux filesystems. However, this issue is handled the same way under OpenSolaris.

  The issue is that by design zfs bypasses the Linux page cache and leaves all caching up to the ARC. This has been shown to work well for the common read(2)/write(2) case. However, mmap(2) is a problem because it relies on being tightly integrated with the page cache. To handle this we cache mmap'ed files twice, once in the ARC and a second time in the page cache. The code is careful to keep both copies synchronized.

  When a file with an mmap'ed region is written to using write(2) both the data in the ARC and existing pages in the page cache are updated. For a read(2) data will be read first from the page cache then the ARC if needed. Neither a write(2) nor a read(2) will ever result in new pages being added to the page cache.

  New pages are added to the page cache only via .readpage() which is called when the vfs needs to read a page off disk to back the virtual memory region. These pages may be modified without notifying the ARC and will be written out periodically via .writepage(). This will occur due to either a sync or the usual page aging behavior. Note that because a read(2) of an mmap'ed file will always check the page cache first, correct data will still be returned even when the ARC is out of date.

  While this implementation ensures correct behavior it does have some drawbacks. The most obvious of which is that it increases the required memory footprint when accessing mmap'ed files. It also adds additional complexity to the code keeping both caches synchronized.

  Longer term it may be possible to cleanly resolve this wart by mapping page cache pages directly on to the ARC buffers. The Linux address space operations are flexible enough to allow selection of which pages back a particular index. The trick would be working out the details of which subsystem is in charge: the ARC, the page cache, or both. It may also prove helpful to move the ARC buffers to a scatter-gather list rather than a vmalloc'ed region.

  Additionally, zfs_write/read_common() were used in the readpage and writepage hooks because it was fairly easy. However, it would be better to update zfs_fillpage and zfs_putapage to be Linux friendly and use them instead.
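  The read and write ordering described above can be modeled with a toy two-level cache. This is only an illustration of the stated rules, not the zfs implementation: the structure, the sizes, and the sketch_* function names are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define NPAGES     4
#define PAGE_BYTES 8

/* Toy model of one file: an "ARC" copy of every page plus an optional
 * "page cache" copy that exists only for pages faulted in via mmap. */
struct file_sketch {
    char arc[NPAGES][PAGE_BYTES];
    char pagecache[NPAGES][PAGE_BYTES];
    int  cached[NPAGES];              /* 1 if the page cache copy exists */
};

/* read(2): prefer the page cache copy, fall back to the ARC.
 * Never adds a page to the page cache. */
void sketch_read(const struct file_sketch *f, size_t idx, char *out)
{
    const char *src = f->cached[idx] ? f->pagecache[idx] : f->arc[idx];
    memcpy(out, src, PAGE_BYTES);
}

/* write(2): update the ARC and any EXISTING page cache copy.
 * Never adds a page to the page cache. */
void sketch_write(struct file_sketch *f, size_t idx, const char *in)
{
    memcpy(f->arc[idx], in, PAGE_BYTES);
    if (f->cached[idx])
        memcpy(f->pagecache[idx], in, PAGE_BYTES);
}

/* .readpage(): the only path that populates the page cache. */
void sketch_readpage(struct file_sketch *f, size_t idx)
{
    memcpy(f->pagecache[idx], f->arc[idx], PAGE_BYTES);
    f->cached[idx] = 1;
}
```

  A store through the mapping changes only the page cache copy, which is why the read path must consult the page cache before the ARC.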
* Add Hooks for Linux Xattr Operations | Brian Behlendorf | 2011-02-10 | 1 | -0/+432
  The Linux specific xattr operations have all been located in the file zpl_xattr.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
* Add Hooks for Linux Super Block Operations | Brian Behlendorf | 2011-02-10 | 1 | -0/+168
  The Linux specific super block operations have all been located in the file zpl_super.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
* Add Hooks for Linux Inode Operations | Brian Behlendorf | 2011-02-10 | 1 | -0/+331
  The Linux specific inode operations have all been located in the file zpl_inode.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.
* Add Hooks for Linux File Operations | Brian Behlendorf | 2011-02-10 | 7 | -0/+251
  The Linux specific file operations have all been located in the file zpl_file.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.

  This first zpl_* commit also includes a common zpl.h header with minimal entries to register the Linux specific hooks. It also adds all the new zpl_* files to the Makefile.in. This is not a standalone commit; it requires the following zpl_* commits.
* Wrap with HAVE_XVATTR | Brian Behlendorf | 2011-02-10 | 2 | -167/+213
  For the moment exactly how to handle xvattr is not clear. This change largely consists of the code to comment out the offending bits until something reasonable can be done.
* Add zp->z_is_zvol flag | Brian Behlendorf | 2011-02-10 | 3 | -3/+5
  A new flag is required for the zfs_rlock code to determine if it is operating on a zvol or a zpl dataset. This used to be keyed off the zp->z_vnode, which was a hack to begin with, but with the removal of vnodes we needed a dedicated flag.
* Prototype/structure update for Linux | Brian Behlendorf | 2011-02-10 | 22 | -3344/+2096
  I apologize in advance for why too many things ended up in this commit. While it could be separated into a whole series of commits, teasing that all apart now would take considerable time and I'm not sure there's much merit in it. As such I'll just summarize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivalents. More specifically:

  1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same, calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when you're thinking in Linux friendly VFS terms.
  2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it complicates the code, and it likely will lead to bugs since there's a good chance it will be out of date. The code has been updated to remove all need for this type.
  3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy.
  4) Remove use of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_async to replace vn_rele_async. Other vn helpers will be addressed as needed but they should not be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivalents.
  5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly.
  6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added. This code will need to be refined but functionally it's all there.

  This will be the first and last of these too-large-to-review commits.
* Remove dmu_write_pages() support | Brian Behlendorf | 2011-02-10 | 2 | -61/+0
  For the moment we do not use dmu_write_pages() to write pages directly in to a dmu object. It may be required at some point in the future, but for now it is simplest and cleanest to drop it. It can be easily re-added if/when needed.
* Create a root znode without VFS dependencies | Brian Behlendorf | 2011-02-10 | 1 | -85/+4
  For portability reasons it's handy to be able to create a root znode and basic filesystem components without requiring the full cooperation of the VFS. We are committing to this to simplify the filesystem creation code.
* Remove zfs_ctldir.[ch] | Brian Behlendorf | 2011-02-10 | 12 | -1429/+30
  This code is used for snapshots and heavily leverages Solaris functionality we do not want to reimplement. These files have been removed, including references to them, and will be replaced by a zfs_snap.c/zpl_snap.c implementation which handles snapshots.
* Disable fuid features | Brian Behlendorf | 2011-02-10 | 1 | -0/+6
  These features should probably be enabled in the Linux zpl code. For now I'm disabling them until it's clear what needs to be done.
* Disable zfs_sync during oops/panic | Brian Behlendorf | 2011-02-10 | 1 | -1/+1
  Minor update to ensure zfs_sync() is disabled if a kernel oops/panic is triggered. As the comment says 'data integrity is job one'. This change could have been done by defining panicstr to oops_in_progress in the SPL. But I felt it was better to use the native Linux API here to be clear.
* Disable Shutdown/Reboot | Brian Behlendorf | 2011-02-10 | 1 | -1/+5
  This support has been disabled with HAVE_SHUTDOWN. We can support this at some point by adding the needed reboot notifiers.
* Remove SYNC_ATTR check | Brian Behlendorf | 2011-02-10 | 1 | -9/+0
  This flag does not need to be supported under Linux. As the comment says it was only there to support fsflush() for old filesystems like UFS. This is not needed under Linux.
* Remove mount options | Brian Behlendorf | 2011-02-10 | 1 | -21/+0
  Mount option parsing is still very Linux specific and will be handled above this zfs filesystem layer. Honoring those mount options once set is of course the responsibility of the lower layers.
* Remove zfs_active_fs_count | Brian Behlendorf | 2011-02-10 | 1 | -11/+0
  This variable was used to ensure that the ZFS module is never removed while the filesystem is mounted. Once again the generic Linux VFS handles this case for us so it can be removed.
* Remove unused mount functions | Brian Behlendorf | 2011-02-10 | 1 | -333/+0
  The functions zfs_mount_label_policy(), zfs_mountroot(), zfs_mount() will not be needed because most of what they do is already handled by the generic Linux VFS layer. They all call zfs_domount() which creates the actual dataset. The caller of this library call, which will be in the zpl layer, is responsible for what's left.
* Remove zfs_major/zfs_minor/zfsfstype | Brian Behlendorf | 2011-02-10 | 1 | -105/+2
  Under Linux we don't need to reserve a major or minor number for the filesystem. We can rely on the VFS to handle collisions without this being handled by the lower ZFS layers.

  Additionally, there is no need to keep a zfsfstype around. We are not limited on Linux by the OpenSolaris infrastructure which needed this. The upper zpl layer can specify the filesystem type.
* Remove Solaris VFS Hooks | Brian Behlendorf | 2011-02-10 | 3 | -159/+0
  The ZFS code is being restructured to act as a library and a stand alone module. This allows us to leverage most of the existing code with minimal modification. It also means we need to drop the Solaris vfs/vnode functions. They will be replaced by Linux equivalents and updated to be Linux friendly.
* VFS: Add zfs_inode_update() helper | Brian Behlendorf | 2011-02-10 | 4 | -2/+77
  For the moment we have left ZFS unchanged and it updates many values as part of the znode. However, some of these values should be set in the inode. For the moment this is handled by adding a function called zfs_inode_update() which updates the inode based on the znode.

  This is considered a workaround until we can systematically go through the ZFS code and have it directly update the inode, at which point zfs_inode_update() can be dropped entirely. Keeping two copies of the same data isn't only inefficient, it's a breeding ground for bugs.
* VFS: Integrate zfs_znode_alloc() | Brian Behlendorf | 2011-02-10 | 2 | -61/+36
  Under Linux the convention for filesystem specific data structures is to embed them along with the generic vfs data structure. This differs significantly from Solaris.

  Since we want to integrate as cleanly with the Linux VFS as possible, this change modifies zfs_znode_alloc() to allocate a znode with an embedded inode for use with the generic VFS. This is done by calling iget_locked() which will allocate a new inode if needed by calling sb->alloc_inode(). This function allocates enough memory for a znode_t but returns a pointer to the inode structure for Linux's VFS. This function is also responsible for setting the callback znode->z_set_ops_inodes() which is used to register the correct handlers for the inode.
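  The embedding convention can be sketched in user space as follows. The structure layouts, the ITOZ_SKETCH() macro, and the function names are hypothetical and heavily simplified; only the idea of allocating the larger structure and handing back a pointer to the embedded member comes from the description above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the generic and filesystem-specific types. */
struct inode_sketch { unsigned long i_ino; };

struct znode_sketch {
    unsigned long       z_id;      /* filesystem-private field */
    struct inode_sketch z_inode;   /* generic inode embedded in the znode */
};

/* Recover the containing znode from a pointer to its embedded inode,
 * in the spirit of the kernel's container_of() macro. */
#define ITOZ_SKETCH(ip) \
    ((struct znode_sketch *)((char *)(ip) - \
        offsetof(struct znode_sketch, z_inode)))

/* Modeled on an alloc_inode() hook: allocate room for the whole znode
 * but return only the embedded inode to the generic layer. */
struct inode_sketch *alloc_inode_sketch(void)
{
    struct znode_sketch *zp = calloc(1, sizeof(*zp));
    return zp ? &zp->z_inode : NULL;
}

/* Matching release path: free the containing znode, not the inode. */
void destroy_inode_sketch(struct inode_sketch *ip)
{
    free(ITOZ_SKETCH(ip));
}
```

  One allocation serves both layers, and the filesystem can always get back to its private data from the inode pointer the VFS passes around.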
* Enable zfs_znode compilation | Brian Behlendorf | 2011-02-10 | 2 | -18/+21
  Basic compilation of the bulk of zfs_znode.c has been enabled. After much consideration it was decided to convert the existing vnode based interfaces to more friendly Linux interfaces. The following commits will systematically update the required interfaces. There are of course pros and cons to this decision.

  Pros:
  * This simplifies integration with Linux in the long term. There is no longer any need to manage vnodes which are a foreign concept to the Linux VFS.
  * Improved long term maintainability.
  * Minor performance improvements by removing vnode overhead.

  Cons:
  * Added work in the short term to modify multiple ZFS interfaces.
  * Harder to pull in changes if we ever see any new code from Solaris.
  * Mixed Solaris and Linux interfaces in some ZFS code.
* ACL related changes | Brian Behlendorf | 2011-02-10 | 3 | -7/+27
  A small collection of ACL related changes related to not supporting fuid mapping. This whole area will need to be closely investigated.
* Init/destroy tsd | Brian Behlendorf | 2011-02-10 | 1 | -2/+1
  Add missing tsd_destroy() call for rrw_tsd_key to avoid a leak.
* Add Linux Compat Infrastructure | Brian Behlendorf | 2011-02-10 | 8 | -3/+625
  Lay the initial groundwork for an include/linux/ compatibility directory. This was less critical in the past because the bulk of the ZFS code consumes the Solaris API via the SPL. This API was stable and the bulk of the Linux API differences were handled in the SPL.

  However, with the addition of a full Posix layer written directly against the Linux APIs we are going to need more compatibility code. It makes sense that all this code should be cleanly located in one place. Subsequent patches should move the existing zvol and vdev_disk compatibility code in to this directory.
* Replace VOP_* calls with direct zfs_* calls | Brian Behlendorf | 2011-02-10 | 1 | -15/+16
  These generic Solaris wrappers are no longer required. Simply directly call the correct zfs functions for clarity.