Commit message (author, date, files changed, lines -removed/+added)
* Fix missing 'zpool events' (Brian Behlendorf, 2010-10-12, 3 files, -7/+21)
  It turns out that 'zpool events' larger than 1024 bytes were being silently dropped. This was discovered while writing the zfault.sh tests to validate common failure modes. It could occur because the zfs interface for passing an arbitrarily sized nvlist_t over an ioctl() is to provide a buffer for the packed nvlist which is usually big enough; in this case 1024 bytes is the default. If the kernel determines the buffer is too small, it returns ENOMEM and the minimum required size of the nvlist_t. This was working properly, but in the case of 'zpool events' the event stream was advanced despite the error. Thus the retry with the bigger buffer would succeed, but it would skip over the event that had not fit. The fix is to pass this size to zfs_zevent_next() and determine, before removing the event from the list, whether it will fit. This was preferable to checking after the event was returned because it avoids the need to rewind the stream.
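A minimal user-space sketch of the check-before-advance idea, assuming a simple array-backed event stream. The names stream_next(), struct event and struct event_stream are hypothetical and are not the real zfs_zevent_next() interface. The sketch only shows that the size test happens before the cursor moves, so a retry with a larger buffer returns the same event.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event: a packed payload of known size. */
struct event {
	const char *packed;
	size_t size;
};

/* Hypothetical stream cursor over a fixed array of events. */
struct event_stream {
	const struct event *events;
	size_t count;
	size_t next;		/* index of the next unread event */
};

/*
 * Copy the next event into buf, whose capacity is *size.  If it does
 * not fit, return ENOMEM, store the required size in *size, and leave
 * the cursor where it is so the caller can retry with a larger buffer.
 */
static int
stream_next(struct event_stream *s, char *buf, size_t *size)
{
	const struct event *ev;

	assert(s != NULL && size != NULL);
	if (s->next >= s->count)
		return (ENOENT);

	ev = &s->events[s->next];
	if (ev->size > *size) {
		*size = ev->size;	/* report the minimum size needed */
		return (ENOMEM);	/* cursor NOT advanced */
	}

	memcpy(buf, ev->packed, ev->size);
	*size = ev->size;
	s->next++;			/* advance only after a successful copy */
	return (0);
}
```

When an event is too large, the call fails with ENOMEM and reports the required size. The cursor does not move, so the retry returns that same event rather than the one after it.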
* Initial zio delay timing (Brian Behlendorf, 2010-10-12, 5 files, -3/+34)
  While there is no right maximum timeout for a disk IO, we can start laying the groundwork to measure how long IOs actually take in practice. This change simply measures the IO time and, if it exceeds 30s, posts an event for 'zpool events'. This value was carefully selected because for sd devices it implies that at least one timeout (SD_TIMEOUT) has occurred. Unfortunately, even with FAILFAST set, we may retry a request and not get an error. This behavior is strongly dependent on the device driver and how it is hooked into the SCSI error handling stack. However, by setting the limit at 30s we can log the event even if no error was returned.
  Slightly longer term, we can start recording these delays, perhaps as a simple power-of-two histogram. This histogram can then be reported as part of the 'zpool status' command when given a command line option. None of this code changes the internal behavior of ZFS; currently it is simply for reporting excessively long delays.
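A sketch of the 30s check together with the power-of-two histogram that is proposed above, assuming latencies are measured in milliseconds. The histogram is only future work in this commit. io_delay_exceeded(), delay_bucket() and delay_record() are hypothetical names, not ZFS functions.

```c
#include <assert.h>
#include <stdint.h>

#define DELAY_THRESHOLD_MS	30000ULL	/* 30 s, per the commit text */
#define HIST_BUCKETS		64		/* enough for any uint64_t value */

/* Hypothetical: nonzero if this IO took long enough to post an event. */
static int
io_delay_exceeded(uint64_t start_ms, uint64_t done_ms)
{
	assert(done_ms >= start_ms);
	return ((done_ms - start_ms) > DELAY_THRESHOLD_MS);
}

/*
 * Power-of-two bucket: bucket b holds latencies in [2^b, 2^(b+1)) ms.
 * Bucket 0 also absorbs a latency of 0.
 */
static unsigned
delay_bucket(uint64_t ms)
{
	unsigned b = 0;

	while (ms > 1) {
		ms >>= 1;
		b++;
	}
	return (b);
}

/* Count one IO latency in the histogram. */
static void
delay_record(uint64_t hist[HIST_BUCKETS], uint64_t ms)
{
	hist[delay_bucket(ms)]++;
}
```

Power-of-two buckets keep the histogram small, at 64 counters, while still separating a 5 ms IO from a 30 s one.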
* Add FAILFAST support (Brian Behlendorf, 2010-10-12, 49 files, -10/+241)
  ZFS works best when it is notified as soon as possible that a device failure has occurred. This allows it to immediately start any recovery actions which may be needed. In theory, Linux supports a flag called FAILFAST which can be set on bios to provide this quick notification by disabling the retry logic in the lower SCSI layers. That's the theory at least. In practice it turns out that while the flag exists, you oddly have to set it together with the BIO_RW_AHEAD flag. And even when it's set, you may get retries if the low-level driver decides that's the right behavior, or if the right error codes are not reported to the SCSI midlayer.
  Unfortunately, without additional kernel patches there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before ZFS is notified properly that a device has failed. This can be improved, and I suspect I'll be submitting patches upstream to handle this.
* Fix 'zpool events' formatting for awk (Brian Behlendorf, 2010-10-12, 1 file, -4/+9)
  To make the 'zpool events' output simple to parse with awk, the extra newline after embedded nvlists has been dropped. This allows the entire event to be parsed as a single whitespace-separated record.
  The -H option has been added to operate in scripted mode. For the 'zpool events' command this means don't print the header. The usage of -H is consistent with scripted mode for other zpool commands.
* Generate zevents for speculative and soft errors (Brian Behlendorf, 2010-10-12, 2 files, -16/+3)
  By default the Solaris code neither logs speculative or soft IO errors in 'zpool status' nor posts an event for them. Under Linux we don't want to change the expected behavior of 'zpool status', so these IO errors are still suppressed there. However, since we do need to know about these events for Linux FMA, and the 'zpool events' interface is new, we do post the events. With the addition of the zio_flags field, the posted events now contain enough information that a user space consumer can identify and discard these events if it sees fit.
* Fix negative zio->io_error which must be positive. (Brian Behlendorf, 2010-10-12, 1 file, -2/+19)
  All the upper layers of zfs expect zio->io_error to be positive. I was careful, but I missed one instance in vdev_disk_physio_completion() which could return a negative error. To ensure all cases are always caught, I have additionally added an ASSERT() to check this before zio_interpret().
  Finally, as a debugging aid, when zfs is built with --enable-debug all errors from the backing block devices will be reported to the console with an error message like this:
      ZFS: zio error=5 type=1 offset=4217856 size=8192 flags=60440
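A sketch of normalizing the block layer's negative errno at the boundary, plus a formatter for the debug line quoted above. bio_error_to_zio_error() and format_zio_error() are hypothetical helpers. Printing flags in hex is an assumption, although it reproduces the sample line.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Linux block-layer completions report failures as negative errno
 * values (e.g. -EIO), while the ZFS upper layers expect a positive
 * errno in zio->io_error.  Normalize once, at the boundary.
 */
static int
bio_error_to_zio_error(int error)
{
	return (error < 0 ? -error : error);
}

/* Format a debug line in the style quoted in the commit message. */
static int
format_zio_error(char *buf, size_t len, int error, int type,
    unsigned long long offset, unsigned long long size, unsigned flags)
{
	/* Mirrors the intent of the ASSERT() added before zio_interpret(). */
	assert(error > 0);
	/* Assumption: flags are printed in hex. */
	return (snprintf(buf, len,
	    "ZFS: zio error=%d type=%d offset=%llu size=%llu flags=%x",
	    error, type, offset, size, flags));
}
```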
* Suppress large kmem_alloc() warning. (Brian Behlendorf, 2010-10-12, 1 file, -2/+2)
  Observed during failure mode testing: dsl_scan_setup_sync() allocates 73920 bytes. This is far over the limit of what is wise to do with a kmem_alloc(), and it should probably be moved to a slab. For now I'm just flagging it with KM_NODEBUG to quiet the warning until this can be revisited.
* Fix undersized buffer in is_shorthand_path() (Ned Bass, 2010-10-12, 1 file, -1/+1)
  The string array 'char dirs[5][8]' was too small to accommodate the terminating NUL character in "by-label". This change adds the needed additional byte.
  Signed-off-by: Brian Behlendorf <[email protected]>
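A sketch of sizing the inner dimension from the longest entry so that the NUL always fits. Only "by-label" comes from the commit. The other entries and is_known_dir() are illustrative assumptions. C accepts `char d[8] = "by-label";` without a required diagnostic and simply drops the NUL, which is why this bug was silent.

```c
#include <assert.h>
#include <string.h>

/* Size every slot from the longest entry: strlen("by-label") + 1 == 9. */
#define LONGEST_DIR	"by-label"

/* Illustrative entries; only "by-label" is named in the commit. */
static const char dirs[][sizeof (LONGEST_DIR)] = {
	"by-id", "by-label", "by-path", "by-uuid", "zpool"
};

/* Return 1 if name matches one of the known directory names. */
static int
is_known_dir(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof (dirs) / sizeof (dirs[0]); i++) {
		/* Every slot must hold its NUL terminator. */
		assert(memchr(dirs[i], '\0', sizeof (dirs[i])) != NULL);
		if (strcmp(dirs[i], name) == 0)
			return (1);
	}
	return (0);
}
```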
* Make commands load zfs module on demand (Ned Bass, 2010-10-11, 1 file, -0/+70)
  This commit modifies libzfs_init() to attempt to load the zfs kernel module if it is not already loaded. This simplifies initialization by letting users import their zpools without having to load the module first.
  Signed-off-by: Brian Behlendorf <[email protected]>
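The commit does not say how libzfs_init() detects whether the module is loaded. This sketch assumes /proc/modules, where each line starts with the module name followed by a space. If the module were missing, a caller would then run modprobe (not shown). module_listed() and module_loaded() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `name` is the first field of any line in `contents`,
 * assumed to hold the text of /proc/modules
 * ("name size refcount deps state address", one module per line).
 */
static int
module_listed(const char *contents, const char *name)
{
	size_t n = strlen(name);
	const char *line = contents;

	assert(contents != NULL && n > 0);
	while (*line != '\0') {
		if (strncmp(line, name, n) == 0 && line[n] == ' ')
			return (1);
		line = strchr(line, '\n');
		if (line == NULL)
			break;
		line++;		/* start of the next line */
	}
	return (0);
}

/* Read /proc/modules and check for `name`; -1 if it cannot be read. */
static int
module_loaded(const char *name)
{
	char buf[65536];	/* assumption: large enough for the list */
	size_t len;
	FILE *fp = fopen("/proc/modules", "r");

	if (fp == NULL)
		return (-1);
	len = fread(buf, 1, sizeof (buf) - 1, fp);
	(void) fclose(fp);
	buf[len] = '\0';
	return (module_listed(buf, name));
}
```

Matching the name plus a trailing space keeps a hypothetical module such as "zfsx" from counting as "zfs".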
* Strip partition from device name for whole disks (Ned Bass, 2010-10-04, 1 file, -21/+31)
  Under Solaris, the slice number is chopped off when displaying the device name if the vdev is a whole disk. Under Linux we should similarly discard the partition number.
  This commit adds the logic to perform the name truncation for devices ending in -partX, XpX, or X, where X is a string of digits. The second case handles devices like md0p0. The third case is limited to SCSI and IDE disks, i.e. those beginning with "sd" or "hd", in order to avoid stripping the number from names like "loop0".
  This commit removes the Solaris-specific code for removing slices, since we no longer reasonably expect our changes to be merged upstream. The partition-stripping code was moved into a helper function to improve readability.
  Signed-off-by: Brian Behlendorf <[email protected]>
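One possible C reading of the three rules above, working in place on a path. strip_partition() and trailing_digits() are hypothetical names, and the actual libzfs helper may differ in details.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return a pointer to the start of the trailing run of digits, or NULL. */
static char *
trailing_digits(char *s)
{
	char *end = s + strlen(s);
	char *p = end;

	while (p > s && isdigit((unsigned char)p[-1]))
		p--;
	return (p == end ? NULL : p);
}

/*
 * Strip a partition suffix in place:
 *   <name>-partN  -> <name>      (udev by-id / by-path links)
 *   <name>NpN     -> <name>N     (e.g. md0p0 -> md0)
 *   sdXN / hdXN   -> sdX / hdX   (sd/hd only, so loop0 is kept)
 */
static void
strip_partition(char *path)
{
	char *base, *d;

	assert(path != NULL);
	base = strrchr(path, '/');
	base = (base != NULL) ? base + 1 : path;
	d = trailing_digits(path);
	if (d == NULL)
		return;

	/* Case 1: "-partN" */
	if (d - path >= 5 && strncmp(d - 5, "-part", 5) == 0) {
		d[-5] = '\0';
		return;
	}

	/* Case 2: "<digit>p<digits>" */
	if (d - path >= 2 && d[-1] == 'p' && isdigit((unsigned char)d[-2])) {
		d[-1] = '\0';
		return;
	}

	/* Case 3: plain trailing digits, sd/hd disks only */
	if (strncmp(base, "sd", 2) == 0 || strncmp(base, "hd", 2) == 0)
		*d = '\0';
}
```

Case 2 requires a digit before the "p", so a disk named "sdp" keeps its letter and "sdp1" becomes "sdp" under case 3.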
* Use stored whole_disk property when opening a vdev (Ned Bass, 2010-10-04, 1 file, -3/+13)
  This commit fixes a bug in vdev_disk_open() in which the whole_disk property was getting set to 0 for disk devices, even when it was stored as 1 when the zpool was created. The whole_disk property lets us detect when the partition suffix should be stripped from the device name in CLI output. It is also used to determine how the writeback cache should be set for a device.
  When an existing zpool is imported, its configuration is read from the vdev label by user space in zpool_read_label(). The whole_disk property is saved in the nvlist which gets passed into the kernel, where it in turn gets saved in the vdev struct in vdev_alloc(). Therefore, this value is available in vdev_disk_open() and should not be overridden by checking the provided device path, since that path will likely point to a partition and the check will return the wrong result.
  We also add an ASSERT that the whole_disk property is set. We are not aware of any cases where vdev_disk_open() should be called with a config that doesn't have this property set. The ASSERT is there so that when debugging is enabled we can identify any legitimate cases that we are missing. If we never hit the ASSERT, we can at some point remove it along with the conditional whole_disk check.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Register the space accounting callback even when we don't have the ZPL. (Ricardo M. Correia, 2010-10-04, 1 file, -1/+3)
  This callback is needed for properly accounting the per-uid and per-gid space usage. Even if we don't have the ZPL, we still need this callback in order to have proper on-disk ZPL compatibility and to be able to use Lustre quotas.
  Fortunately, the callback doesn't have any ZPL/VFS dependencies, so we can just move it out of #ifdef HAVE_ZPL.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Fix missing vdev names in zpool status output (Ned Bass, 2010-09-23, 1 file, -3/+7)
  Top-level vdev names in zpool status output should follow a <type-id> naming convention. In the case of raidz devices, the type portion of the name was missing.
  This commit fixes a bug in zpool_vdev_name() where, in this snprintf call,
      (void) snprintf(buf, sizeof (buf), "%s-%llu", path, (u_longlong_t)id);
  buf and path may point to the same location. Copying between overlapping objects is undefined behavior for snprintf(), and in practice buf ends up containing only the "-id" part. This only occurred for raidz devices because the code for appending the parity level to the type string stored its result in buf and then set path to point there. To fix this, we allocate a new temporary buffer on the stack instead of reusing buf.
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #57
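A sketch of the fix described above. The parity-suffixed type is built in a separate stack buffer, so the final snprintf() never reads from the buffer it writes. vdev_display_name() and the 64-byte temporary are hypothetical; the real change is inside zpool_vdev_name().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<type>-<id>" (e.g. "raidz1-0") into buf.  A nonzero parity
 * (assumed to be set only for raidz) is appended to the type first,
 * in tmp, so that path never aliases buf.
 */
static void
vdev_display_name(char *buf, size_t len, const char *type,
    unsigned long long parity, unsigned long long id)
{
	char tmp[64];		/* hypothetical size for the type string */
	const char *path = type;

	assert(buf != NULL && len > 0 && strlen(type) < sizeof (tmp));
	if (parity > 0) {
		(void) snprintf(tmp, sizeof (tmp), "%s%llu", type, parity);
		path = tmp;	/* points at tmp, never at buf */
	}
	(void) snprintf(buf, len, "%s-%llu", path, id);
}
```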
* Export ZFS symbols needed by Lustre. (Ricardo M. Correia, 2010-09-17, 1 file, -0/+11)
  Required for the DB_DNODE_ENTER()/DB_DNODE_EXIT() helpers.
  Signed-off-by: Ricardo M. Correia <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Quiet down very frequent large allocation warning in ZFS. (Ricardo M. Correia, 2010-09-17, 1 file, -1/+1)
  On my machine, dnode_hold_impl() allocates 9992 bytes in DEBUG mode, and this causes a large stream of stack traces in the logs. Instead, use KM_NODEBUG to quiet down this known large allocation.
  Signed-off-by: Ricardo M. Correia <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add missing Makefile.in from zpool_layout commit (Brian Behlendorf, 2010-09-17, 1 file, -0/+520)
  The scripts/zpool-layout/Makefile.in file generated by autogen.sh was accidentally omitted from the previous commit. Add it.
* Add [-m map] option to zpool_layout (Brian Behlendorf, 2010-09-17, 8 files, -20/+211)
  By default the zpool_layout command would always use the slot number assigned by Linux when generating the zdev.conf file. This is a reasonable default, but there are cases when it makes sense to remap the slot id assigned by Linux using your own custom mapping.
  This commit adds support to zpool_layout for a custom slot mapping file. The file contains the Linux slot id in the first column and the custom slot mapping in the second column. By passing this map file with '-m map' to zpool_config, the mapping will be applied when generating zdev.conf.
  Additionally, two sample mappings have been added which reflect different ways to map the slots in the dragon drawers.
* Fix markdown rendering (Brian Behlendorf, 2010-09-15, 1 file, -2/+2)
  These two lines were being rendered incorrectly on the GitHub site. To fix the issue, there needs to be leading whitespace before each line to ensure each command is rendered on its own line.
      $ ./configure
      $ make pkg
* Reference new zfsonlinux.org website (Brian Behlendorf, 2010-09-14, 1 file, -4/+1)
  The wiki contents have been converted to HTML and made available at their new home, http://zfsonlinux.org. The wiki has also been disabled; the HTML pages are now the official documentation.
* Wait up to timeout seconds for udev device [tag: zfs-0.5.1] (Brian Behlendorf, 2010-09-11, 3 files, -6/+29)
  Occasional failures were observed in zconfig.sh because udev could be delayed for a few seconds. To handle this, the wait_udev function has been added to wait up to timeout seconds for an expected device before returning an error. By default, callers currently use a 30-second timeout, which should be much longer than udev ever needs but not so long that you start to worry the test suite has hung.
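wait_udev is added for the shell test scripts. The sketch below shows the same poll-until-timeout loop in C, assuming one-second polling granularity. wait_for_path() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Poll for a device node to appear, giving up after `timeout` seconds.
 * Returns 0 once `path` exists, or -1 if the timeout expires first.
 */
static int
wait_for_path(const char *path, unsigned timeout)
{
	struct stat st;
	time_t deadline = time(NULL) + (time_t)timeout;

	assert(path != NULL);
	for (;;) {
		if (stat(path, &st) == 0)
			return (0);
		if (time(NULL) >= deadline)
			return (-1);
		sleep(1);	/* give udev a moment to create the node */
	}
}
```

Checking once more before giving up means a timeout of 0 still performs a single existence test. A caller in the style of the commit would pass 30.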
* Reduce volume size in zconfig.sh (Brian Behlendorf, 2010-09-10, 1 file, -9/+9)
  Due to occasional ENOSPC failures on certain platforms, I've reduced the size of the ZVOL from 400M to 300M for the zvol+ext2 clone tests.
* Use top level object directory in zfs-module.spec (Brian Behlendorf, 2010-09-10, 1 file, -1/+1)
  Commit 6283f55ea1b91e680386388c17d14b89e344fa8d updated _almost_ everything to use the correct top level object directory. This was done to correctly support building in custom directories. Unfortunately, I missed this one instance in the zfs-module.spec.in rpm spec file. Fix it.
* Exclude atomic.S source from dist rules (Brian Behlendorf, 2010-09-10, 1 file, -0/+6)
  The zfs package supports the option --with-config=srpm, which is used to bootstrap configure to allow the 'make srpm' target to work. This has the advantage of allowing creation of source rpms without having all your -devel packages installed. This source package can then be fed back into an automated build farm which only installs the required packages listed by the srpm. This ensures that all proper dependencies are expressed by the source package, because if they are not, you will get configure/build failures.
  The trouble here is that --with-config=srpm prevents the architecture check from running, resulting in TARGET_ASM_DIR being set to the default asm-generic. The 'make dist' rule then fails because there is no asm-generic/atomic.S file, since that file is generated at build time. To handle this, I have added an empty asm-generic/atomic.S file simply as a placeholder for 'make dist'.
* Use linux __KERNEL__ define (Brian Behlendorf, 2010-09-10, 1 file, -1/+1)
  Previously the project contained two zfs_context.h files, one for user space builds and one for kernel space builds. It was the responsibility of the source including the file to ensure the right one was included, based on the order of the include paths. This was the way it was done in OpenSolaris, but for our purposes I felt it was overly obscure. The user and kernel zfs_context.h files have been combined into a single file, and a #define determines whether you get the user or kernel context.
  The issue here was that I used the _KERNEL macro, which is defined as part of the spl and therefore, for most builds, is only defined after you include the right zfs_context.h. It is safer to use the __KERNEL__ macro, which is automatically defined as part of the kernel build process and passed as a command line compiler option. It will always be defined if you're building in the kernel and never for user space.
* Fix "format not a string literal" warning (Brian Behlendorf, 2010-09-08, 1 file, -5/+5)
  Under Ubuntu 10.04 the default compiler flags include -Wformat and -Wformat-security, which cause the above warning. In particular, these are cases where "%s" was forgotten as part of the format specifier.
      https://wiki.ubuntu.com/CompilerFlags
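A minimal illustration of the warning and its fix. copy_message() is a hypothetical helper. When the message is passed through "%s", a literal "%d" in the text is copied verbatim instead of being read as a conversion with no matching argument.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy caller-supplied text into buf.  Passing msg itself as the
 * format is what -Wformat-security flags; any '%' in it would be
 * interpreted as a conversion.
 */
static int
copy_message(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && len > 0 && msg != NULL);
	/* snprintf(buf, len, msg) warns: format not a string literal */
	return (snprintf(buf, len, "%s", msg));
}
```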
* Support custom build directories and move includes (Brian Behlendorf, 2010-09-08, 188 files, -2454/+10643)
  One of the neat tricks an autoconf style project is capable of is allowing configuration/building in a directory other than the source directory. The major advantage of this is that you can build the project in various different ways while making changes in a single source tree.
  For example, this project is designed to work on various Linux distributions, each of which works slightly differently. This means that changes need to be verified on each of those supported distributions, preferably before the change is committed to the public git repo.
  Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems, each running a supported distribution. When I make a change to the source base that I suspect may break things, I can concurrently build from the same source on all the systems, each in its own subdirectory.
      wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz
      tar -xzf zfs-x.y.z.tar.gz
      cd zfs-x.y.z
      ------------------------- run concurrently ----------------------
      <ubuntu system>  <fedora system>  <debian system>  <rhel6 system>
      mkdir ubuntu     mkdir fedora     mkdir debian     mkdir rhel6
      cd ubuntu        cd fedora        cd debian        cd rhel6
      ../configure     ../configure     ../configure     ../configure
      make             make             make             make
      make check       make check       make check       make check
  This change also moves many of the include headers from individual include/sys directories under the modules directory into a single top level include directory. This has the advantage of making the build rules cleaner, and logically it makes a bit more sense.
* Fix spl version check (Brian Behlendorf, 2010-09-02, 2 files, -9/+9)
  The spl_config.h file is checked to determine the spl version. However, the zfs code was looking for it in the source directory and not the build directory.
* Remove zfs-x.y.z.zip creation in 'make dist' (Brian Behlendorf, 2010-09-02, 2 files, -5/+4)
  Do not create a zfs-x.y.z.zip file as part of 'make dist'. Simply create the standard zfs-x.y.z.tar.gz file.
* Prep for zfs-0.5.1 tag (Brian Behlendorf, 2010-09-01, 1 file, -1/+1)
* Fix zfsdev_compat_ioctl() case (Brian Behlendorf, 2010-09-01, 1 file, -1/+1)
  For the !CONFIG_COMPAT case, fix the zfsdev_compat_ioctl() compatibility function name. This was caught by the chaos4.3 builder.
* Minor packaging fixes (Brian Behlendorf, 2010-09-01, 6 files, -13/+10)
  The GIT file was removed from the tree because I have stopped using TopGit. Because of this, it must also be removed from the top level Makefile.am as well as from the zfs.spec.in file which referenced it.
  Fix a typo in lib/libzpool/Makefile.am which was preventing the needed zrlock.h header from being included by 'make dist'. I simply had the name wrong in the Makefile.am.
  Regenerated autogen.sh build products.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Remove scripts/common.sh (Brian Behlendorf, 2010-09-01, 1 file, -443/+0)
  This script is now dynamically generated at configure time from scripts/common.sh.in. This change was made by commit 26e61dd074df64f9e1d779273efd56fa9d92cdc5, but we accidentally kept the common.sh file around.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add quick build instructions (Brian Behlendorf, 2010-09-01, 1 file, -2/+8)
  Full up-to-date build information will stay on the wiki for now, but there is no harm in adding the bare-bones instructions to the README. They shouldn't change and are a reasonable quick start.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add initial autoconf products (Brian Behlendorf, 2010-08-31, 38 files, -1/+53029)
  Add the initial products from autogen.sh. These products will be updated incrementally after this point as development occurs.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux ztest support (Brian Behlendorf, 2010-08-31, 1 file, -24/+25)
  Minor changes to ztest for this environment. These include updating ztest to run in the local development tree, as well as relocating some local variables in this function to the heap.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux zpios support (Brian Behlendorf, 2010-08-31, 33 files, -1/+4702)
  Linux kernel implementation of the PIOS test app.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux user util support (Brian Behlendorf, 2010-08-31, 6 files, -36/+41)
  This topic branch contains required changes to the user space utilities to allow them to integrate cleanly with Linux.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux user disk support (Brian Behlendorf, 2010-08-31, 19 files, -706/+1387)
  This topic branch contains all the changes needed to integrate the user side zfs tools with Linux style devices. Primarily this includes fixing up the Solaris libefi library to be Linux friendly, and integrating with the libblkid library which is provided by e2fsprogs.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux unused code tracking (Brian Behlendorf, 2010-08-31, 3 files, -77/+0)
  Track various large hunks which have been dropped simply because they are not relevant to this port.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux topology support (Brian Behlendorf, 2010-08-31, 2 files, -0/+21)
  Solaris recently introduced the idea of drive topology, because where a drive is located does matter. I have already handled this with udev/blkid integration under Linux, so I'm hopeful this case can simply be removed, but for now I've just stubbed out what is needed in libspl and commented out the rest here.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux compatibility (Brian Behlendorf, 2010-08-31, 2 files, -1/+8)
  Resolve minor Linux compatibility issues.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux spa thread support (Brian Behlendorf, 2010-08-31, 1 file, -1/+3)
  Disable the spa thread under Linux until it can be implemented.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux sha2 support (Brian Behlendorf, 2010-08-31, 2 files, -26/+101)
  The upstream ZFS code has correctly moved to a faster native sha2 implementation. Unfortunately, under Linux that's going to be a little problematic, so we revert the code to the more portable version contained in earlier ZFS releases. Using the native sha2 implementation in Linux is possible, but the API is slightly different in kernel versus user space depending on which libraries are used. Ideally, we need a fast implementation of SHA256 which builds as part of ZFS; this shouldn't be that hard to do, but it will take some effort.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux libspl support (Brian Behlendorf, 2010-08-31, 103 files, -1/+10117)
  All changes needed for the libspl layer. This includes modifications to files directly copied from OpenSolaris and the addition of new files needed to fill in the gaps.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux kernel module support (Brian Behlendorf, 2010-08-31, 40 files, -10/+820)
  Set up Linux kernel module support; this includes:
  - zfs context for kernel/user
  - kernel module build system integration
  - kernel module macros
  - kernel module symbol export
  - kernel module options
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux kernel memory support (Brian Behlendorf, 2010-08-31, 14 files, -41/+77)
  Required kmem/vmem changes.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux kernel disk support (Brian Behlendorf, 2010-08-31, 27 files, -112/+2571)
  Native Linux vdev disk interfaces.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux kernel device support (Brian Behlendorf, 2010-08-31, 8 files, -250/+791)
  This branch contains the majority of the changes required to cleanly integrate with Linux style special devices (/dev/zfs). Mainly this means dropping all the Solaris style callbacks and replacing them with the Linux equivalents.
  This patch also adds the onexit infrastructure needed to track some minimal state between ioctls. Under Linux it would be easy to do this simply using file->private_data. But under Solaris they apparently need to pass the file descriptor as part of the ioctl data and then perform a lookup in the kernel. Once again, to keep code changes to a minimum, I've implemented the Solaris solution.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux spl debug support (Brian Behlendorf, 2010-08-31, 1 file, -0/+13)
  Use spl debug if HAVE_SPL is defined.
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add linux mntent support (Brian Behlendorf, 2010-08-31, 1 file, -0/+8)
  Use mount entry if HAVE_SETMNTENT is defined.
  Signed-off-by: Brian Behlendorf <[email protected]>