summaryrefslogtreecommitdiffstats
path: root/lib/libspl
Commit message (Collapse)AuthorAgeFilesLines
* Add backing_device_info per-filesystemBrian Behlendorf2011-08-0412-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <[email protected]> Closes #174
* Provide a rc.d script for archlinuxzfs-0.6.0-rc5Kyle Fuller2011-07-1112-0/+12
| | | | | | | | | | | Unlike most other Linux distributions archlinux installs its init scripts in /etc/rc.d insead of /etc/init.d. This commit provides an archlinux rc.d script for zfs and extends the build infrastructure to ensure it get's installed in the correct place. Signed-off-by: Brian Behlendorf <[email protected]> Closes #322
* Add proper library versioningBrian Behlendorf2011-07-062-3/+3
| | | | | | | | The zfs libraries were never properly versioned. Since the API has remained static for quite some time this we never an issue. However, going forward they should be versioned. This commit versions all of the libraries to 1.0.0. From here on out this version must be updated to reflect changes to the library.
* Implemented sharing datasets via NFS using libshare.Gunnar Beutner2011-07-061-1/+55
| | | | | | | | The sharenfs and sharesmb properties depend on the libshare library to export datasets via NFS and SMB. This commit implements the base libshare functionality as well as support for managing NFS shares. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix implicit declaration of 'mkdirp'Brian Behlendorf2011-07-012-0/+2
| | | | | | | | | | | The lib/libspl/include/libgen.h header file was being mistakenly left out of the 'make dist' tarball. It just happens this doesn't cause a build failure when creating packages because the system libgen/h is included instead. This simply results in the following warning due to the missing forward declaration of mkdirp(). ../../lib/libzfs/libzfs_mount.c:417:3: warning: implicit declaration of function 'mkdirp' [-Wimplicit-function-declaration]
* Linux compat 2.6.39: mount_nodev()Brian Behlendorf2011-07-0113-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231
* Linux compat 2.6.39: security_inode_init_security()Brian Behlendorf2011-07-0112-0/+12
| | | | | | | | | | | The security_inode_init_security() function now takes an additional qstr argument which must be passed in from the dentry if available. Passing a NULL is safe when no qstr is available the relevant security checks will just be skipped. Closes #246 Closes #217 Closes #187
* Tear down and flush the mmap regionPrasad Joshi2011-06-2712-0/+12
| | | | | | | | | | | | | | The inode eviction should unmap the pages associated with the inode. These pages should also be flushed to disk to avoid the data loss. Therefore, use truncate_setsize() in evict_inode() to release the pagecache. The API truncate_setsize() was added in 2.6.35 kernel. To ensure compatibility with the old kernel, the patch defines its own truncate_setsize function. Signed-off-by: Prasad Joshi <[email protected]> Closes #255
* Always check -Wno-unused-but-set-variable gcc supportBrian Behlendorf2011-06-1412-12/+12
| | | | | | | | | | | The previous commit 8a7e1ceefa430988c8f888ca708ab307333b4464 wasn't quite right. This check applies to both the user and kernel space build and as such we must make sure it runs regardless of what the --with-config option is set too. For example, if --with-config=kernel then the autoconf test does not run and we generate build warnings when compiling the kernel packages.
* Check for -Wno-unused-but-set-variable gcc supportBrian Behlendorf2011-06-1412-2/+26
| | | | | | | | | | | | | Gcc versions 4.3.2 and earlier do not support the compiler flag -Wno-unused-but-set-variable. This can lead to build failures on older Linux platforms such as Debian Lenny. Since this is an optional build argument this changes add a new autoconf check for the option. If it is supported by the installed version of gcc then it is used otherwise it is omited. See commit's 12c1acde76683108441827ae9affba1872f3afe5 and 79713039a2b6e0ed223d141b4a8a8455f282d2f2 for the reason the -Wno-unused-but-set-variable options was originally added.
* Fix 'zfs set volsize=N pool/dataset'Brian Behlendorf2011-05-0212-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg. The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels. Online ext3 resize test case passes on 2.6.28+ kernels: $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023 $ zpool create tank /tmp/zvol $ zfs create -V 500M tank/zd0 $ mkfs.ext3 /dev/zd0 $ mkdir /mnt/zd0 $ mount /dev/zd0 /mnt/zd0 $ df -h /mnt/zd0 $ zfs set volsize=800M tank/zd0 $ resize2fs /dev/zd0 $ df -h /mnt/zd0 Original-patch-by: Fajar A. Nugraha <[email protected]> Closes #68 Closes #84
* Correct MAXUIDBrian Behlendorf2011-04-291-1/+1
| | | | | | | | | | The uid_t on most systems is in fact and unsigned 32-bit value. This is almost always correct, however you could compile your kernel to use an unsigned 16-bit value for uid_t. In practice I've never encountered a distribution which does this so I'm willing to overlook this corner case for now. Closes #165
* Implemented NFS export_operations.Gunnar Beutner2011-04-2912-0/+12
| | | | | Implemented the required NFS operations for exporting ZFS datasets using the in-kernel NFS daemon.
* Use gethostid in the Linux convention.Darik Horn2011-04-251-10/+0
| | | | | | | | | | Disable the gethostid() override for Solaris behavior because Linux systems implement the POSIX standard in a way that allows a negative result. Mask the gethostid() result to the lower four bytes, like coreutils does in /usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an eight byte long type. This can cause a spurious hostid mismatch that prevents zpool import on 64-bit systems.
* Fix 32-bit MAXOFFSET_T definitionBrian Behlendorf2011-04-221-7/+2
| | | | | | | | | | | Having MAXOFFSET_T defined to 0x7fffffffl was artificially limiting the maximum file size on 32-bit systems. In reality MAXOFFSET_T is used when working with 'long long' types and as such we now define it as LLONG_MAX. This resolves the 2GB file size limit for files and additionally allows zvols greater than 2GB on 32-bit systems. Closes #136 Closes #81
* Set -Wno-unused-but-set-variable globallyBrian Behlendorf2011-04-192-7/+9
| | | | | | | | | | | | | As of gcc-4.6 the option -Wunused-but-set-variable is enabled by default. While this is a useful warning there are numerous places in the ZFS code when a variable is set and then only checked in an ASSERT(). To avoid having to update every instance of this in the code we now set -Wno-unused-but-set-variable to suppress the warning. Additionally, when building with --enable-debug and -Werror set these warning also become fatal. We can reevaluate the suppression of these error at a later time if it becomes an issue. For now we are basically just reverting to the previous gcc behavior.
* Linux 2.6.28 compat, insert_inode_locked()Brian Behlendorf2011-03-2212-0/+12
| | | | | | | Added insert_inode_locked() helper function, prior to this most callers used insert_inode_hash(). The older method doesn't check for collisions in the inode_hashtable but it still acceptible for use. Fallback to using insert_inode_hash() when insert_inode_locked() is unavailable.
* Linux compat, umount2(2) flagsBrian Behlendorf2011-03-221-2/+17
| | | | | | | | Older glibc <sys/mount.h> headers did not define all the available umount2(2) flags. Both MNT_FORCE and MNT_DETACH are supported in the kernel back to 2.4.11 so we define them correctly if they are missing. Closes #95
* Add init scriptsBrian Behlendorf2011-03-1712-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | To support automatically mounting your zfs on filesystem on boot a basic init script is needed. Unfortunately, every distribution has their own idea of the _right_ way to do things. Rather than write one very complicated portable init script, which would be invariably replaced by the distributions own anyway. I have instead added support to provide multiple distribution specific init scripts. The correct init script for your distribution will be selected by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT. During 'make install' the correct script for your system will be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the usual /etc/init.d/zfs location. Currently, there is zfs.fedora and a more generic zfs.lsb init script. Hopefully, the distribution maintainers who know best how they want their init scripts to function will feedback their approved versions to be included in the project. This change does not consider upstart jobs but I'm not at all opposed to add that sort of thing.
* Fix mount helperBrian Behlendorf2011-03-091-7/+5
| | | | | | | | | | | | | | | | | Several issues related to strange mount/umount behavior were reported and this commit should address most of them. The original idea was to put in place a zfs mount helper (mount.zfs). This helper is used to enforce 'legacy' mount behavior, and perform any extra mount argument processing (selinux, zfsutil, etc). This helper wasn't ready for the 0.6.0-rc1 release but with this change it's functional but needs to extensively tested. This change addresses the following open issues. Closes #101 Closes #107 Closes #113 Closes #115 Closes #119
* Linux 2.6.38 compat, blkdev_get_by_path()Brian Behlendorf2011-02-2312-0/+12
| | | | | | | | | | | The open_bdev_exclusive() function has been replaced (again) by the more generic blkdev_get_by_path() function. Additionally, the counterpart function close_bdev_exclusive() has been replaced by blkdev_put(). Because these functions are more generic versions of the functions they replaced the compatibility macro must add the FMODE_EXCL mask to ensure they are exclusive. Closes #114
* Linux 2.6.36 compat, sops->evict_inode()Brian Behlendorf2011-02-1112-0/+12
| | | | | | The new prefered inteface for evicting an inode from the inode cache is the ->evict_inode() callback. It replaces both the ->delete_inode() and ->clear_inode() callbacks which were previously used for this.
* Linux 2.6.35 compat, fops->fsync()Brian Behlendorf2011-02-1112-0/+12
| | | | | | | | | The fsync() callback in the file_operations structure used to take 3 arguments. The callback now only takes 2 arguments because the dentry argument was determined to be unused by all consumers. To handle this a compatibility prototype was added to ensure the right prototype is used. Our implementation never used the dentry argument either so it's just a matter of using the right prototype.
* Linux 2.6.35 compat, const struct xattr_handlerBrian Behlendorf2011-02-1012-0/+12
| | | | | | | The const keyword was added to the 'struct xattr_handler' in the generic Linux super_block structure. To handle this we define an appropriate xattr_handler_t typedef which can be used. This was the preferred solution because it keeps the code clean and readable.
* MS_DIRSYNC and MS_REC compatBrian Behlendorf2011-02-101-0/+9
| | | | | | | | | | | | | It turns out that older versions of the glibc headers do not properly define MS_DIRSYNC despite it being explicitly mentioned in the man pages. They instead call it S_WRITE, so for system where this is not correct defined map MS_DIRSYNC to S_WRITE. At the time of this commit both Ubuntu Lucid, and Debian Squeeze both use the out of date glibc headers. As for MS_REC this field is also not available in the older headers. Since there is no obvious mapping in this case we simply disable the recursive mount option which used it.
* Minimal libshare infrastructureBrian Behlendorf2011-02-0413-17/+13
| | | | | | | | | | | | | | | | | | ZFS even under Solaris does not strictly require libshare to be available. The current implementation attempts to dlopen() the library to access the needed symbols. If this fails libshare support is simply disabled. This means that on Linux we only need the most minimal libshare implementation. In fact just enough to prevent the build from failing. Longer term we can decide if we want to implement a libshare library like Solaris. At best this would be an abstraction layer between ZFS and NFS/SMB. Alternately, we can drop libshare entirely and directly integrate ZFS with Linux's NFS/SMB. Finally the bare bones user-libshare.m4 test was dropped. If we do decide to implement libshare at some point it will surely be as part of this package so the check is not needed.
* Add 'zfs mount' supportBrian Behlendorf2011-02-043-112/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | By design the zfs utility is supposed to handle mounting and unmounting a zfs filesystem. We could allow zfs to do this directly. There are system calls available to mount/umount a filesystem. And there are library calls available to manipulate /etc/mtab. But there are a couple very good reasons not to take this appraoch... for now. Instead of directly calling the system and library calls to (u)mount the filesystem we fork and exec a (u)mount process. The principle reason for this is to delegate the responsibility for locking and updating /etc/mtab to (u)mount(8). This ensures maximum portability and ensures the right locking scheme for your version of (u)mount will be used. If we didn't do this we would have to resort to an autoconf test to determine what locking mechanism is used. The downside to using mount(8) instead of mount(2) is that we lose the exact errno which was returned by the kernel. The return code from mount(8) provides some insight in to what went wrong but it not quite as good. For the moment this is translated as a best guess in to a errno for the higher layers of zfs. In the long term a shared library called libmount is under development which provides a common API to address the locking and errno issues. Once the standard mount utility has been updated to use this library we can then leverage it. Until then this is the only safe solution. http://www.kernel.org/pub/linux/utils/util-linux/libmount-docs/index.html
* Autoconf selinux supportBrian Behlendorf2011-01-2812-0/+24
| | | | | | | | | | | | | | | | | If libselinux is detected on your system at configure time link against it. This allows us to use a library call to detect if selinux is enabled and if it is to pass the mount option: "context=\"system_u:object_r:file_t:s0" For now this is required because none of the existing selinux policies are aware of the zfs filesystem type. Because of this they do not properly enable xattr based labeling even though zfs supports all of the required hooks. Until distro's add zfs as a known xattr friendly fs type we must use mntpoint labeling. Alternately, end users could modify their existing selinux policy with a little guidance.
* Add missing mkdirp prototypeBrian Behlendorf2010-12-141-0/+34
| | | | | | For while now mkdirp has been built as part of libspl however the protoype was never added to libgen.h. This went unnoticed until enabling the mount support which uses mkdirp().
* Fix block device-related issues in zdb.Ricardo M. Correia2010-12-143-0/+52
| | | | | | | | | | | Specifically, this fixes the two following errors in zdb when a pool is composed of block devices: 1) 'Value too large for defined data type' when running 'zdb <dataset>'. 2) 'character device required' when running 'zdb -l <block-device>'. Signed-off-by: Ricardo M. Correia <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Fix for access beyond end of device errorNed Bass2010-11-103-10/+10
| | | | | | | | | | | | | | | | | | | | | | This commit fixes a sign extension bug affecting l2arc devices. Extremely large offsets may be passed down to the low level block device driver on reads, generating errors similar to attempt to access beyond end of device sdbi1: rw=14, want=36028797014862705, limit=125026959 The unwanted sign extension occurrs because the function arc_read_nolock() stores the offset as a daddr_t, a 32-bit signed int type in the Linux kernel. This offset is then passed to zio_read_phys() as a uint64_t argument, causing sign extension for values of 0x80000000 or greater. To avoid this, we store the offset in a uint64_t. This change also changes a few daddr_t struct members to uint64_t in the libspl headers to avoid similar bugs cropping up in the future. We also add an ASSERT to __vdev_disk_physio() to check for invalid offsets. Closes #66 Signed-off-by: Brian Behlendorf <[email protected]>
* Add FAILFAST supportBrian Behlendorf2010-10-1212-0/+12
| | | | | | | | | | | | | | | | | | | | ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice is turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set it you may get retries in the low level drivers decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernels patchs there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before a ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.
* Exclude atomic.S source from dist rulesBrian Behlendorf2010-09-101-0/+6
| | | | | | | | | | | | | | | | | | | | The zfs package supports the option --with-config=srpm which is used to bootstrap configure to allow the 'make srpm' target to work. This has the advantage of allowing creation of source rpms without having all your -devel packages installed. This source package can then be feed back in to an automated build farm which only installs the required packages listed by the srpm. This ensures that all proper dependencies are expressed by the source package, because if they are not you will get configure/build failures. The trouble here is that --with-config=srpm prevents the architecture check from running resulting in TARGET_ASM_DIR being set to the default asm-generic. The 'make dist' rule then fails because there is no asm-generic/atomic.S file because it is generated at build time. To handle this I have added an empty file asm-generic/atomic.S simply as a place holder for 'make dist'.
* Support custom build directories and move includesBrian Behlendorf2010-09-0825-200/+4504
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual incude/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.
* Add initial autoconf productsBrian Behlendorf2010-08-315-0/+2953
| | | | | | | Add the initial products from autogen.sh. These products will be updated incrementally after this point as development occurs. Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux libspl supportBrian Behlendorf2010-08-31101-0/+10111
| | | | | | | | All changes needed for the libspl layer. This includes modifications to files directly copied from OpenSolaris and the addition of new files needed to fill in the gaps. Signed-off-by: Brian Behlendorf <[email protected]>
* Moving lib/libspl to linux-libspl branchBrian Behlendorf2008-12-118-2870/+0
|
* Fix libspl move to the wrong placeBrian Behlendorf2008-12-118-0/+0
|
* Move the world out of /zfs/ and seperate out module build treeBrian Behlendorf2008-12-118-0/+2870