| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a long time now the kernel has been moving away from using the
pdflush daemon to write 'old' dirty pages to disk. The primary reason
for this is because the pdflush daemon is single threaded and can be
a limiting factor for performance. Since pdflush sequentially walks
the dirty inode list for each super block any delay in processing can
slow down dirty page writeback for all filesystems.
The replacement for pdflush is called bdi (backing device info). The
bdi system involves creating a per-filesystem control structure each
with its own private sets of queues to manage writeback. The advantage
is greater parallelism which improves performance and prevents a single
filesystem from slowing writeback to the others.
For a long time both systems co-existed in the kernel so it wasn't
strictly required to implement the bdi scheme. However, as of
Linux 2.6.36 kernels the pdflush functionality has been retired.
Since ZFS already bypasses the page cache for most I/O this is only
an issue for mmap(2) writes which must go through the page cache.
Even then adding this missing support for newer kernels was overlooked
because there are other mechanisms which can trigger writeback.
However, there is one critical case where not implementing the bdi
functionality can cause problems. If an application handles a page
fault it can enter the balance_dirty_pages() callpath. This will
result in the application hanging until the number of dirty pages in
the system drops below the dirty ratio.
Without a registered backing_device_info for the filesystem the
dirty pages will not get written out. Thus the application will hang.
As mentioned above this was less of an issue with older kernels because
pdflush would eventually write out the dirty pages.
This change adds a backing_device_info structure to the zfs_sb_t
which is already allocated per-super block. It is then registered
when the filesystem mounted and unregistered on unmount. It will
not be registered for mounted snapshots which are read-only. This
change will result in flush-<pool> thread being dynamically created
and destroyed per-mounted filesystem for writeback.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #174
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d. This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #322
|
|
|
|
|
|
|
|
| |
The zfs libraries were never properly versioned. Since the API has
remained static for quite some time this we never an issue. However,
going forward they should be versioned. This commit versions all
of the libraries to 1.0.0. From here on out this version must be
updated to reflect changes to the library.
|
|
|
|
|
|
|
|
| |
The sharenfs and sharesmb properties depend on the libshare library
to export datasets via NFS and SMB. This commit implements the base
libshare functionality as well as support for managing NFS shares.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The lib/libspl/include/libgen.h header file was being mistakenly
left out of the 'make dist' tarball. It just happens this doesn't
cause a build failure when creating packages because the system
libgen/h is included instead. This simply results in the following
warning due to the missing forward declaration of mkdirp().
../../lib/libzfs/libzfs_mount.c:417:3: warning: implicit declaration
of function 'mkdirp' [-Wimplicit-function-declaration]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure. When using the new
interface the caller must now use the mount_nodev() helper.
Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers. This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use. It provides our only entry point
in to the namespace layer for manipulating certain mount options.
This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly. It also allowed me
to keep more of the original Solaris code unmodified. Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do. However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem. Thus keeping a back
reference from the filesystem to the namespace is complicated.
Rather than introduce some ugly hack to get the vfsmount and
continue as before. I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.
This commit updates the code to remove this vfsmount back
reference entirely. All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.
Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code. This
change which fairly involved has turned out nicely.
Closes #246
Closes #217
Closes #187
Closes #248
Closes #231
|
|
|
|
|
|
|
|
|
|
|
| |
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.
Closes #246
Closes #217
Closes #187
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inode eviction should unmap the pages associated with the inode.
These pages should also be flushed to disk to avoid the data loss.
Therefore, use truncate_setsize() in evict_inode() to release the
pagecache.
The API truncate_setsize() was added in 2.6.35 kernel. To ensure
compatibility with the old kernel, the patch defines its own
truncate_setsize function.
Signed-off-by: Prasad Joshi <[email protected]>
Closes #255
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit 8a7e1ceefa430988c8f888ca708ab307333b4464 wasn't
quite right. This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set too.
For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable. This can lead to build failures
on older Linux platforms such as Debian Lenny. Since this is
an optional build argument this changes add a new autoconf check
for the option. If it is supported by the installed version of
gcc then it is used otherwise it is omited.
See commit's 12c1acde76683108441827ae9affba1872f3afe5 and
79713039a2b6e0ed223d141b4a8a8455f282d2f2 for the reason the
-Wno-unused-but-set-variable options was originally added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes a kernel panic which would occur when resizing
a dataset which was not open. The objset_t stored in the
zvol_state_t will be set to NULL when the block device is closed.
To avoid this issue we pass the correct objset_t as the third arg.
The code has also been updated to correctly notify the kernel
when the block device capacity changes. For 2.6.28 and newer
kernels the capacity change will be immediately detected. For
earlier kernels the capacity change will be detected when the
device is next opened. This is a known limitation of older
kernels.
Online ext3 resize test case passes on 2.6.28+ kernels:
$ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
$ zpool create tank /tmp/zvol
$ zfs create -V 500M tank/zd0
$ mkfs.ext3 /dev/zd0
$ mkdir /mnt/zd0
$ mount /dev/zd0 /mnt/zd0
$ df -h /mnt/zd0
$ zfs set volsize=800M tank/zd0
$ resize2fs /dev/zd0
$ df -h /mnt/zd0
Original-patch-by: Fajar A. Nugraha <[email protected]>
Closes #68
Closes #84
|
|
|
|
|
|
|
|
|
|
| |
The uid_t on most systems is in fact and unsigned 32-bit value.
This is almost always correct, however you could compile your
kernel to use an unsigned 16-bit value for uid_t. In practice
I've never encountered a distribution which does this so I'm
willing to overlook this corner case for now.
Closes #165
|
|
|
|
|
| |
Implemented the required NFS operations for exporting ZFS datasets
using the in-kernel NFS daemon.
|
|
|
|
|
|
|
|
|
|
| |
Disable the gethostid() override for Solaris behavior because Linux systems
implement the POSIX standard in a way that allows a negative result.
Mask the gethostid() result to the lower four bytes, like coreutils does in
/usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an
eight byte long type. This can cause a spurious hostid mismatch that prevents
zpool import on 64-bit systems.
|
|
|
|
|
|
|
|
|
|
|
| |
Having MAXOFFSET_T defined to 0x7fffffffl was artificially limiting
the maximum file size on 32-bit systems. In reality MAXOFFSET_T is
used when working with 'long long' types and as such we now define
it as LLONG_MAX. This resolves the 2GB file size limit for files
and additionally allows zvols greater than 2GB on 32-bit systems.
Closes #136
Closes #81
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of gcc-4.6 the option -Wunused-but-set-variable is enabled by
default. While this is a useful warning there are numerous places
in the ZFS code when a variable is set and then only checked in an
ASSERT(). To avoid having to update every instance of this in the
code we now set -Wno-unused-but-set-variable to suppress the warning.
Additionally, when building with --enable-debug and -Werror set these
warning also become fatal. We can reevaluate the suppression of these
error at a later time if it becomes an issue. For now we are basically
just reverting to the previous gcc behavior.
|
|
|
|
|
|
|
| |
Added insert_inode_locked() helper function, prior to this most callers
used insert_inode_hash(). The older method doesn't check for collisions
in the inode_hashtable but it still acceptible for use. Fallback to
using insert_inode_hash() when insert_inode_locked() is unavailable.
|
|
|
|
|
|
|
|
| |
Older glibc <sys/mount.h> headers did not define all the available
umount2(2) flags. Both MNT_FORCE and MNT_DETACH are supported in the
kernel back to 2.4.11 so we define them correctly if they are missing.
Closes #95
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To support automatically mounting your zfs on filesystem on boot
a basic init script is needed. Unfortunately, every distribution
has their own idea of the _right_ way to do things. Rather than
write one very complicated portable init script, which would be
invariably replaced by the distributions own anyway. I have
instead added support to provide multiple distribution specific
init scripts.
The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.
Currently, there is zfs.fedora and a more generic zfs.lsb init
script. Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feedback their
approved versions to be included in the project.
This change does not consider upstart jobs but I'm not at all
opposed to add that sort of thing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several issues related to strange mount/umount behavior were reported
and this commit should address most of them. The original idea was
to put in place a zfs mount helper (mount.zfs). This helper is used
to enforce 'legacy' mount behavior, and perform any extra mount argument
processing (selinux, zfsutil, etc). This helper wasn't ready for the
0.6.0-rc1 release but with this change it's functional but needs to
extensively tested.
This change addresses the following open issues.
Closes #101
Closes #107
Closes #113
Closes #115
Closes #119
|
|
|
|
|
|
|
|
|
|
|
| |
The open_bdev_exclusive() function has been replaced (again) by the
more generic blkdev_get_by_path() function. Additionally, the
counterpart function close_bdev_exclusive() has been replaced by
blkdev_put(). Because these functions are more generic versions
of the functions they replaced the compatibility macro must add
the FMODE_EXCL mask to ensure they are exclusive.
Closes #114
|
|
|
|
|
|
| |
The new prefered inteface for evicting an inode from the inode cache
is the ->evict_inode() callback. It replaces both the ->delete_inode()
and ->clear_inode() callbacks which were previously used for this.
|
|
|
|
|
|
|
|
|
| |
The fsync() callback in the file_operations structure used to take
3 arguments. The callback now only takes 2 arguments because the
dentry argument was determined to be unused by all consumers. To
handle this a compatibility prototype was added to ensure the right
prototype is used. Our implementation never used the dentry argument
either so it's just a matter of using the right prototype.
|
|
|
|
|
|
|
| |
The const keyword was added to the 'struct xattr_handler' in the
generic Linux super_block structure. To handle this we define an
appropriate xattr_handler_t typedef which can be used. This was
the preferred solution because it keeps the code clean and readable.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that older versions of the glibc headers do not
properly define MS_DIRSYNC despite it being explicitly mentioned
in the man pages. They instead call it S_WRITE, so for system
where this is not correct defined map MS_DIRSYNC to S_WRITE.
At the time of this commit both Ubuntu Lucid, and Debian Squeeze
both use the out of date glibc headers.
As for MS_REC this field is also not available in the older headers.
Since there is no obvious mapping in this case we simply disable
the recursive mount option which used it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS even under Solaris does not strictly require libshare to be
available. The current implementation attempts to dlopen() the
library to access the needed symbols. If this fails libshare
support is simply disabled.
This means that on Linux we only need the most minimal libshare
implementation. In fact just enough to prevent the build from
failing. Longer term we can decide if we want to implement a
libshare library like Solaris. At best this would be an abstraction
layer between ZFS and NFS/SMB. Alternately, we can drop libshare
entirely and directly integrate ZFS with Linux's NFS/SMB.
Finally the bare bones user-libshare.m4 test was dropped. If we
do decide to implement libshare at some point it will surely be
as part of this package so the check is not needed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By design the zfs utility is supposed to handle mounting and unmounting
a zfs filesystem. We could allow zfs to do this directly. There are
system calls available to mount/umount a filesystem. And there are
library calls available to manipulate /etc/mtab. But there are a
couple very good reasons not to take this appraoch... for now.
Instead of directly calling the system and library calls to (u)mount
the filesystem we fork and exec a (u)mount process. The principle
reason for this is to delegate the responsibility for locking and
updating /etc/mtab to (u)mount(8). This ensures maximum portability
and ensures the right locking scheme for your version of (u)mount
will be used. If we didn't do this we would have to resort to an
autoconf test to determine what locking mechanism is used.
The downside to using mount(8) instead of mount(2) is that we lose
the exact errno which was returned by the kernel. The return code
from mount(8) provides some insight in to what went wrong but it
not quite as good. For the moment this is translated as a best
guess in to a errno for the higher layers of zfs.
In the long term a shared library called libmount is under development
which provides a common API to address the locking and errno issues.
Once the standard mount utility has been updated to use this library
we can then leverage it. Until then this is the only safe solution.
http://www.kernel.org/pub/linux/utils/util-linux/libmount-docs/index.html
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If libselinux is detected on your system at configure time link
against it. This allows us to use a library call to detect if
selinux is enabled and if it is to pass the mount option:
"context=\"system_u:object_r:file_t:s0"
For now this is required because none of the existing selinux
policies are aware of the zfs filesystem type. Because of this
they do not properly enable xattr based labeling even though
zfs supports all of the required hooks.
Until distro's add zfs as a known xattr friendly fs type we
must use mntpoint labeling. Alternately, end users could modify
their existing selinux policy with a little guidance.
|
|
|
|
|
|
| |
For while now mkdirp has been built as part of libspl however
the protoype was never added to libgen.h. This went unnoticed
until enabling the mount support which uses mkdirp().
|
|
|
|
|
|
|
|
|
|
|
| |
Specifically, this fixes the two following errors in zdb when a pool
is composed of block devices:
1) 'Value too large for defined data type' when running 'zdb <dataset>'.
2) 'character device required' when running 'zdb -l <block-device>'.
Signed-off-by: Ricardo M. Correia <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit fixes a sign extension bug affecting l2arc devices. Extremely
large offsets may be passed down to the low level block device driver on
reads, generating errors similar to
attempt to access beyond end of device
sdbi1: rw=14, want=36028797014862705, limit=125026959
The unwanted sign extension occurrs because the function arc_read_nolock()
stores the offset as a daddr_t, a 32-bit signed int type in the Linux kernel.
This offset is then passed to zio_read_phys() as a uint64_t argument, causing
sign extension for values of 0x80000000 or greater. To avoid this, we store
the offset in a uint64_t.
This change also changes a few daddr_t struct members to uint64_t in the libspl
headers to avoid similar bugs cropping up in the future. We also add an ASSERT
to __vdev_disk_physio() to check for invalid offsets.
Closes #66
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS works best when it is notified as soon as possible when a device
failure occurs. This allows it to immediately start any recovery
actions which may be needed. In theory Linux supports a flag which
can be set on bio's called FAILFAST which provides this quick
notification by disabling the retry logic in the lower scsi layers.
That's the theory at least. In practice is turns out that while the
flag exists you oddly have to set it with the BIO_RW_AHEAD flag.
And even when it's set it you may get retries in the low level
drivers decides that's the right behavior, or if you don't get the
right error codes reported to the scsi midlayer.
Unfortunately, without additional kernels patchs there's not much
which can be done to improve this. Basically, this just means that
it may take 2-3 minutes before a ZFS is notified properly that a
device has failed. This can be improved and I suspect I'll be
submitting patches upstream to handle this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zfs package supports the option --with-config=srpm which
is used to bootstrap configure to allow the 'make srpm' target
to work. This has the advantage of allowing creation of source
rpms without having all your -devel packages installed. This
source package can then be feed back in to an automated build
farm which only installs the required packages listed by the
srpm. This ensures that all proper dependencies are expressed
by the source package, because if they are not you will get
configure/build failures.
The trouble here is that --with-config=srpm prevents the
architecture check from running resulting in TARGET_ASM_DIR
being set to the default asm-generic. The 'make dist' rule
then fails because there is no asm-generic/atomic.S file
because it is generated at build time. To handle this I
have added an empty file asm-generic/atomic.S simply as a
place holder for 'make dist'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the neat tricks an autoconf style project is capable of
is allow configurion/building in a directory other than the
source directory. The major advantage to this is that you can
build the project various different ways while making changes
in a single source tree.
For example, this project is designed to work on various different
Linux distributions each of which work slightly differently. This
means that changes need to verified on each of those supported
distributions perferably before the change is committed to the
public git repo.
Using nfs and custom build directories makes this much easier.
I now have a single source tree in nfs mounted on several different
systems each running a supported distribution. When I make a
change to the source base I suspect may break things I can
concurrently build from the same source on all the systems each
in their own subdirectory.
wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz
tar -xzf zfs-x.y.z.tar.gz
cd zfs-x-y-z
------------------------- run concurrently ----------------------
<ubuntu system> <fedora system> <debian system> <rhel6 system>
mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6
cd ubuntu cd fedora cd debian cd rhel6
../configure ../configure ../configure ../configure
make make make make
make check make check make check make check
This change also moves many of the include headers from individual
incude/sys directories under the modules directory in to a single
top level include directory. This has the advantage of making
the build rules cleaner and logically it makes a bit more sense.
|
|
|
|
|
|
|
| |
Add the initial products from autogen.sh. These products will
be updated incrementally after this point as development occurs.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
| |
All changes needed for the libspl layer. This includes modifications
to files directly copied from OpenSolaris and the addition of new
files needed to fill in the gaps.
Signed-off-by: Brian Behlendorf <[email protected]>
|
| |
|
| |
|
|
|