summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* Linux 2.6.36 compat, blk_* macros removedBrian Behlendorf2010-11-101-0/+10
| | | | | | | Most of the blk_* macros were removed in 2.6.36. Ostensibly this was done to improve readability and allow easier grepping. However, from a portability stand point the macros are helpful. Therefore the needed macros are redefined here if they are missing from the kernel.
* Linux 2.6.36 compat, synchronous bio flagBrian Behlendorf2010-11-101-7/+0
| | | | | | | | | | | | | | The name of the flag used to mark a bio as synchronous has changed again in the 2.6.36 kernel due to the unification of the BIO_RW_* and REQ_* flags. The new flag is called REQ_SYNC. To simplify checking this flag I have introduced the vdev_disk_dio_is_sync() helper function. Based on the results of several new autoconf tests it uses the correct mask to check for a synchronous bio. Preferred interface for flagging a synchronous bio: 2.6.12-2.6.29: BIO_RW_SYNC 2.6.30-2.6.35: BIO_RW_SYNCIO 2.6.36-2.6.xx: REQ_SYNC
* Linux 2.6.36 compat, use REQ_FAILFAST_MASKBrian Behlendorf2010-11-101-6/+17
| | | | | | | | | | | | | | | | As of linux-2.6.36 the BIO_RW_FAILFAST and REQ_FAILFAST flags have been unified under the REQ_* names. These flags always had to be kept in-sync so this is a nice step forward, unfortunately it means we need to be careful to only use the new unified flags when the BIO_RW_* flags are not defined. Additional autoconf checks were added for this and if it is ever unclear which method to use no flags are set. This is safe but may result in longer delays before a disk is failed. Perferred interface for setting FAILFAST on a bio: 2.6.12-2.6.27: BIO_RW_FAILFAST 2.6.28-2.6.35: BIO_RW_FAILFAST_{DEV|TRANSPORT|DRIVER} 2.6.36-2.6.xx: REQ_FAILFAST_{DEV|TRANSPORT|DRIVER}
* Add helper functions for manipulating device namesNed Bass2010-10-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds two helper functions for working with vdev names and paths. zfs_resolve_shortname() resolves a shorthand vdev name to an absolute path of a file in /dev, /dev/disk/by-id, /dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid, /dev/disk/zpool. This was previously done only in the function is_shorthand_path(), but we need a general helper function to implement shorthand names for additional zpool subcommands like remove. is_shorthand_path() is accordingly updated to call the helper function. There is a minor change in the way zfs_resolve_shortname() tests if a file exists. is_shorthand_path() effectively used open() and stat64() to test for file existence, since its scope includes testing if a device is a whole disk and collecting file status information. zfs_resolve_shortname(), on the other hand, only uses access() to test for existence and leaves it to the caller to perform any additional file operations. This seemed like the most general and lightweight approach, and still preserves the semantics of is_shorthand_path(). zfs_append_partition() appends a partition suffix to a device path. This should be used to generate the name of a whole disk as it is stored in the vdev label. The user-visible names of whole disks do not contain the partition information, while the name in the vdev label does. The code was lifted from the function make_disks(), which now just calls the helper function. Again, having a helper function to do this supports general handling of shorthand names in the user interface. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix missing 'zpool events'Brian Behlendorf2010-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | It turns out that 'zpool events' over 1024 bytes in size where being silently dropped. This was discovered while writing the zfault.sh tests to validate common failure modes. This could occur because the zfs interface for passing an arbitrary size nvlist_t over an ioctl() is to provide a buffer for the packed nvlist which is usually big enough. In this case 1024 byte is the default. If the kernel determines the buffer is to small it returns ENOMEM and the minimum required size of the nvlist_t. This was working properly but in the case of 'zpool events' the event stream was advanced dispite the error. Thus the retry with the bigger buffer would succeed but it would skip over the previous event. The fix is to pass this size to zfs_zevent_next() and determine before removing the event from the list if it will fit. This was preferable to checking after the event was returned because this avoids the need to rewind the stream.
* Initial zio delay timingBrian Behlendorf2010-10-122-0/+8
| | | | | | | | | | | | | | | | | | | | | | | While there is no right maximum timeout for a disk IO we can start laying the ground work to measure how long they do take in practice. This change simply measures the IO time and if it exceeds 30s an event is posted for 'zpool events'. This value was carefully selected because for sd devices it implies that at least one timeout (SD_TIMEOUT) has occured. Unfortunately, even with FAILFAST set we may retry and request and not get an error. This behavior is strongly dependant on the device driver and how it is hooked in to the scsi error handling stack. However by setting the limit at 30s we can log the event even if no error was returned. Slightly longer term we can start recording these delays perhaps as a simple power-of-two histrogram. This histogram can then be reported as part of the 'zpool status' command when given an command line option. None of this code changes the internal behavior of ZFS. Currently it is simply for reporting excessively long delays.
* Add FAILFAST supportBrian Behlendorf2010-10-126-0/+37
| | | | | | | | | | | | | | | | | | | | ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice is turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set it you may get retries in the low level drivers decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernels patchs there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before a ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.
* Generate zevents for speculative and soft errorsBrian Behlendorf2010-10-121-0/+1
| | | | | | | | | | | | | By default the Solaris code does not log speculative or soft io errors in either 'zpool status' or post an event. Under Linux we don't want to change the expected behavior of 'zpool status' so these io errors are still suppressed there. However, since we do need to know about these events for Linux FMA and the 'zpool events' interface is new we do post the events. With the addition of the zio_flags field the posted events now contain enough information that a user space consumer can identify and discard these events if it sees fit.
* Use linux __KERNEL__ defineBrian Behlendorf2010-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | Previously the project contained who zfs_context.h files, one for user space builds and one for kernel space builds. It was the responsibility of the source including the file to ensure the right one was included based on the order of the include paths. This was the way it was done in OpenSolaris but for our purposes I felt it was overly obscure. The user and kernel zfs_context.h files have been combined in to a single file and a #define determines if you get the user or kernel context. The issue here was that I used the _KERNEL macro which is defined as part of the spl which will only be defined for most builds after you include the right zfs_context. It is safer to use the __KERNEL__ macro which is automatically defined as part of the kernel build process and passed as a command line compiler option. It will always be defined if your building in the kernel and never for user space.
* Support custom build directories and move includesBrian Behlendorf2010-09-0899-0/+56293
One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual incude/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.