| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:
$ ./configure
$ make pkg # Alternatively, one can run 'make arch' as well
on the Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the zfs userland utilities and
another for the zfs kernel modules. The new packages can then be
installed by running:
# pacman -U $package.pkg.tar.xz
In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be build using the 'sarch' make
rule.
NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, it's MD5 hash signature will be continually influx.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.
Signed-off-by: Prakash Surya <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #491
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the code to use the bdi_setup_and_register() helper to
simplify the bdi integration code. The updated code now just
registers the bdi during mount and destroys it during unmount.
The only complication is that for 2.6.32 - 2.6.33 kernels the
helper wasn't available so in these cases the zfs code must
provide it. Luckily the bdi_setup_and_register() function
is trivial.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #367
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Export all symbols already marked extern in the zfs_vfsops.h
header. Several non-static symbols have also been added to
the header and exportewd. This allows external modules to
more easily create and manipulate properly created ZFS
filesystem type datasets.
Rename zfsvfs_teardown() to zfs_sb_teardown and export it.
This is done simply for consistency with the rest of the code
base. All other zfsvfs_* functions have already been renamed.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
| |
Run autogen.sh using the same autotools versions as upstream:
* autoconf-2.63
* automake-1.11.1
* libtool-2.2.6b
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a long time now the kernel has been moving away from using the
pdflush daemon to write 'old' dirty pages to disk. The primary reason
for this is because the pdflush daemon is single threaded and can be
a limiting factor for performance. Since pdflush sequentially walks
the dirty inode list for each super block any delay in processing can
slow down dirty page writeback for all filesystems.
The replacement for pdflush is called bdi (backing device info). The
bdi system involves creating a per-filesystem control structure each
with its own private sets of queues to manage writeback. The advantage
is greater parallelism which improves performance and prevents a single
filesystem from slowing writeback to the others.
For a long time both systems co-existed in the kernel so it wasn't
strictly required to implement the bdi scheme. However, as of
Linux 2.6.36 kernels the pdflush functionality has been retired.
Since ZFS already bypasses the page cache for most I/O this is only
an issue for mmap(2) writes which must go through the page cache.
Even then adding this missing support for newer kernels was overlooked
because there are other mechanisms which can trigger writeback.
However, there is one critical case where not implementing the bdi
functionality can cause problems. If an application handles a page
fault it can enter the balance_dirty_pages() callpath. This will
result in the application hanging until the number of dirty pages in
the system drops below the dirty ratio.
Without a registered backing_device_info for the filesystem the
dirty pages will not get written out. Thus the application will hang.
As mentioned above this was less of an issue with older kernels because
pdflush would eventually write out the dirty pages.
This change adds a backing_device_info structure to the zfs_sb_t
which is already allocated per-super block. It is then registered
when the filesystem mounted and unregistered on unmount. It will
not be registered for mounted snapshots which are read-only. This
change will result in flush-<pool> thread being dynamically created
and destroyed per-mounted filesystem for writeback.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #174
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Moving the zil_free() cleanup to zil_close() prevents this
problem from occurring in the first place. There is a very
good description of the issue and fix in Illumus #883.
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Albert Lee <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Reviewed by: Garrett D'Amore <[email protected]>
Reivewed by: Dan McDonald <[email protected]>
Approved by: Gordon Ross <[email protected]>
References to Illumos issue and patch:
- https://www.illumos.org/issues/883
- https://github.com/illumos/illumos-gate/commit/c9ba2a43cb
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #340
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today zfs tries to allocate blocks evenly across all devices.
This means when devices are imbalanced zfs will use lots of
CPU searching for space on devices which tend to be pretty
full. It should instead fail quickly on the full LUNs and
move onto devices which have more availability.
Reviewed by: Eric Schrock <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Albert Lee <[email protected]>
Reviewed by: Gordon Ross <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
References to Illumos issue and patch:
- https://www.illumos.org/issues/510
- https://github.com/illumos/illumos-gate/commit/5ead3ed965
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #340
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d. This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #322
|
|
|
|
|
|
|
| |
Unfortunately, ztest is hard coded to export the zdb utility to
be installed in a certain location. When the packaging was updated
to install zdb in /sbin/ ztest was broken. To fix this I'm updating
ztest to check both common install paths.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure. When using the new
interface the caller must now use the mount_nodev() helper.
Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers. This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use. It provides our only entry point
in to the namespace layer for manipulating certain mount options.
This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly. It also allowed me
to keep more of the original Solaris code unmodified. Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do. However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem. Thus keeping a back
reference from the filesystem to the namespace is complicated.
Rather than introduce some ugly hack to get the vfsmount and
continue as before. I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.
This commit updates the code to remove this vfsmount back
reference entirely. All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.
Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code. This
change which fairly involved has turned out nicely.
Closes #246
Closes #217
Closes #187
Closes #248
Closes #231
|
|
|
|
|
|
|
|
|
|
|
| |
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.
Closes #246
Closes #217
Closes #187
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inode eviction should unmap the pages associated with the inode.
These pages should also be flushed to disk to avoid the data loss.
Therefore, use truncate_setsize() in evict_inode() to release the
pagecache.
The API truncate_setsize() was added in 2.6.35 kernel. To ensure
compatibility with the old kernel, the patch defines its own
truncate_setsize function.
Signed-off-by: Prasad Joshi <[email protected]>
Closes #255
|
|
|
|
|
|
|
|
|
|
|
| |
The previous commit 8a7e1ceefa430988c8f888ca708ab307333b4464 wasn't
quite right. This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set too.
For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable. This can lead to build failures
on older Linux platforms such as Debian Lenny. Since this is
an optional build argument this changes add a new autoconf check
for the option. If it is supported by the installed version of
gcc then it is used otherwise it is omited.
See commit's 12c1acde76683108441827ae9affba1872f3afe5 and
79713039a2b6e0ed223d141b4a8a8455f282d2f2 for the reason the
-Wno-unused-but-set-variable options was originally added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes a kernel panic which would occur when resizing
a dataset which was not open. The objset_t stored in the
zvol_state_t will be set to NULL when the block device is closed.
To avoid this issue we pass the correct objset_t as the third arg.
The code has also been updated to correctly notify the kernel
when the block device capacity changes. For 2.6.28 and newer
kernels the capacity change will be immediately detected. For
earlier kernels the capacity change will be detected when the
device is next opened. This is a known limitation of older
kernels.
Online ext3 resize test case passes on 2.6.28+ kernels:
$ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
$ zpool create tank /tmp/zvol
$ zfs create -V 500M tank/zd0
$ mkfs.ext3 /dev/zd0
$ mkdir /mnt/zd0
$ mount /dev/zd0 /mnt/zd0
$ df -h /mnt/zd0
$ zfs set volsize=800M tank/zd0
$ resize2fs /dev/zd0
$ df -h /mnt/zd0
Original-patch-by: Fajar A. Nugraha <[email protected]>
Closes #68
Closes #84
|
|
|
|
|
| |
Implemented the required NFS operations for exporting ZFS datasets
using the in-kernel NFS daemon.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of gcc-4.6 the option -Wunused-but-set-variable is enabled by
default. While this is a useful warning there are numerous places
in the ZFS code when a variable is set and then only checked in an
ASSERT(). To avoid having to update every instance of this in the
code we now set -Wno-unused-but-set-variable to suppress the warning.
Additionally, when building with --enable-debug and -Werror set these
warning also become fatal. We can reevaluate the suppression of these
error at a later time if it becomes an issue. For now we are basically
just reverting to the previous gcc behavior.
|
|
|
|
|
|
|
| |
Added insert_inode_locked() helper function, prior to this most callers
used insert_inode_hash(). The older method doesn't check for collisions
in the inode_hashtable but it still acceptible for use. Fallback to
using insert_inode_hash() when insert_inode_locked() is unavailable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To support automatically mounting your zfs on filesystem on boot
a basic init script is needed. Unfortunately, every distribution
has their own idea of the _right_ way to do things. Rather than
write one very complicated portable init script, which would be
invariably replaced by the distributions own anyway. I have
instead added support to provide multiple distribution specific
init scripts.
The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.
Currently, there is zfs.fedora and a more generic zfs.lsb init
script. Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feedback their
approved versions to be included in the project.
This change does not consider upstart jobs but I'm not at all
opposed to add that sort of thing.
|
|
|
|
|
|
|
|
|
|
|
| |
The open_bdev_exclusive() function has been replaced (again) by the
more generic blkdev_get_by_path() function. Additionally, the
counterpart function close_bdev_exclusive() has been replaced by
blkdev_put(). Because these functions are more generic versions
of the functions they replaced the compatibility macro must add
the FMODE_EXCL mask to ensure they are exclusive.
Closes #114
|
|
|
|
|
|
| |
The new prefered inteface for evicting an inode from the inode cache
is the ->evict_inode() callback. It replaces both the ->delete_inode()
and ->clear_inode() callbacks which were previously used for this.
|
|
|
|
|
|
|
|
|
| |
The fsync() callback in the file_operations structure used to take
3 arguments. The callback now only takes 2 arguments because the
dentry argument was determined to be unused by all consumers. To
handle this a compatibility prototype was added to ensure the right
prototype is used. Our implementation never used the dentry argument
either so it's just a matter of using the right prototype.
|
|
|
|
|
|
|
| |
The const keyword was added to the 'struct xattr_handler' in the
generic Linux super_block structure. To handle this we define an
appropriate xattr_handler_t typedef which can be used. This was
the preferred solution because it keeps the code clean and readable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS even under Solaris does not strictly require libshare to be
available. The current implementation attempts to dlopen() the
library to access the needed symbols. If this fails libshare
support is simply disabled.
This means that on Linux we only need the most minimal libshare
implementation. In fact just enough to prevent the build from
failing. Longer term we can decide if we want to implement a
libshare library like Solaris. At best this would be an abstraction
layer between ZFS and NFS/SMB. Alternately, we can drop libshare
entirely and directly integrate ZFS with Linux's NFS/SMB.
Finally the bare bones user-libshare.m4 test was dropped. If we
do decide to implement libshare at some point it will surely be
as part of this package so the check is not needed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If libselinux is detected on your system at configure time link
against it. This allows us to use a library call to detect if
selinux is enabled and if it is to pass the mount option:
"context=\"system_u:object_r:file_t:s0"
For now this is required because none of the existing selinux
policies are aware of the zfs filesystem type. Because of this
they do not properly enable xattr based labeling even though
zfs supports all of the required hooks.
Until distro's add zfs as a known xattr friendly fs type we
must use mntpoint labeling. Alternately, end users could modify
their existing selinux policy with a little guidance.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS works best when it is notified as soon as possible when a device
failure occurs. This allows it to immediately start any recovery
actions which may be needed. In theory Linux supports a flag which
can be set on bio's called FAILFAST which provides this quick
notification by disabling the retry logic in the lower scsi layers.
That's the theory at least. In practice is turns out that while the
flag exists you oddly have to set it with the BIO_RW_AHEAD flag.
And even when it's set it you may get retries in the low level
drivers decides that's the right behavior, or if you don't get the
right error codes reported to the scsi midlayer.
Unfortunately, without additional kernels patchs there's not much
which can be done to improve this. Basically, this just means that
it may take 2-3 minutes before a ZFS is notified properly that a
device has failed. This can be improved and I suspect I'll be
submitting patches upstream to handle this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the neat tricks an autoconf style project is capable of
is allow configurion/building in a directory other than the
source directory. The major advantage to this is that you can
build the project various different ways while making changes
in a single source tree.
For example, this project is designed to work on various different
Linux distributions each of which work slightly differently. This
means that changes need to verified on each of those supported
distributions perferably before the change is committed to the
public git repo.
Using nfs and custom build directories makes this much easier.
I now have a single source tree in nfs mounted on several different
systems each running a supported distribution. When I make a
change to the source base I suspect may break things I can
concurrently build from the same source on all the systems each
in their own subdirectory.
wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz
tar -xzf zfs-x.y.z.tar.gz
cd zfs-x-y-z
------------------------- run concurrently ----------------------
<ubuntu system> <fedora system> <debian system> <rhel6 system>
mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6
cd ubuntu cd fedora cd debian cd rhel6
../configure ../configure ../configure ../configure
make make make make
make check make check make check make check
This change also moves many of the include headers from individual
incude/sys directories under the modules directory in to a single
top level include directory. This has the advantage of making
the build rules cleaner and logically it makes a bit more sense.
|
|
|
|
|
|
|
| |
Add the initial products from autogen.sh. These products will
be updated incrementally after this point as development occurs.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
| |
Minor changes to ztest for this environment. These including
updating ztest to run in the local development tree, as well
as relocating some local variables in this function to the heap.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
| |
Add autoconf style build infrastructure to the ZFS tree. This
includes autogen.sh, configure.ac, m4 macros, some scripts/*,
and makefiles for all the core ZFS components.
|
|
|
|
|
|
|
|
|
|
|
|
| |
While ztest does run in user space we run it with the same stack
restrictions it would have in kernel space. This ensures that any
stack related issues which would be hit in the kernel can be caught
and debugged in user space instead.
This patch is a first pass to limit the stack usage of every ztest
function to 1024 bytes. Subsequent updates can further reduce this.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a portability change which removes the dependence of the Solaris
thread library. All locations where Solaris thread API was used before
have been replaced with equivilant Solaris kernel style thread calls.
In user space the kernel style threading API is implemented in term of
the portable pthreads library. This includes all threads, mutexs,
condition variables, reader/writer locks, and taskqs.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The upstream commit cb code had a few bugs:
1) The arguments of the list_move_tail() call in txg_dispatch_callbacks()
were reversed by mistake. This caused the commit callbacks to not be
called at all.
2) ztest had a bug in ztest_dmu_commit_callbacks() where "error" was not
initialized correctly. This seems to have caused the test to always take
the simulated error code path, which made ztest unable to detect whether
commit cbs were being called for transactions that successfuly complete.
3) ztest had another bug in ztest_dmu_commit_callbacks() where the commit
cb threshold was not being compared correctly.
4) The commit cb taskq was using 'max_ncpus * 2' as the maxalloc argument
of taskq_create(), which could have caused unnecessary delays in the txg
sync thread.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
| |
Gcc -Wall warn: 'unused variable'
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
| |
Gcc ASSERT() missing cases are impossible
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
| |
Gcc -Wall warn: 'invalid prototype'
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Resolve issues uncovered by -D_FORTIFY_SOURCE=2, the default redhat
macro's file adds this option to the cflags. This causes warnings
of the following type designed to keep the developer honest:
warning: ignoring return value of 'foo', declared
with attribute warn_unused_result
The short term fix is to wrap these calls in VERIFY() to check the
return code. The code was already assusing these would never fail.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
| |
Gcc -Wall warn: 'lacks a cast'
Gcc -Wall warn: 'comparison between pointer and integer'
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
|
|
| |
Fix non-c90 compliant code, for the most part these changes
simply deal with where a particular variable is declared.
Under c90 it must alway be done at the very start of a block.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
|
| |
Add 'ull' suffix to 64-bit constants.
Signed-off-by: Brian Behlendorf <[email protected]>
|
|
|
|
|
| |
This is the last official OpenSolaris tag before the public
development tree was closed.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|