path: root/lib
Commit message | Author | Age | Files | Lines
* Illumos #3100: zvol rename fails with EBUSY when dirty. | Matthew Ahrens | 2012-10-03 | 1 | -1/+1
  illumos/illumos-gate@2e2c135528b3edfe9aaf67d1f004dc0202fa1a54
  Illumos changeset: 13780:6da32a929222

  3100 zvol rename fails with EBUSY when dirty

  Reviewed by: Christopher Siden <[email protected]>
  Reviewed by: Adam H. Leventhal <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Garrett D'Amore <[email protected]>
  Approved by: Eric Schrock <[email protected]>
  Ported-by: Etienne Dechamps <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #995
* Create threads in detached state in userspace. | Etienne Dechamps | 2012-10-03 | 1 | -1/+2
  Currently, thread_create(), when called in userspace, creates a
  joinable (i.e. not detached) thread. This is the pthread default.
  Unfortunately, this does not reproduce kthreads behavior (kthreads
  are always detached). In addition, this contradicts the original
  Solaris code which creates userspace threads in detached mode.

  These joinable threads are never joined, which leads to a leakage of
  pthread thread objects ("zombie threads"). This in turn results in
  excessive resource consumption, and possible resource exhaustion in
  extreme cases (e.g. long ztest runs).

  This patch fixes the issue by creating userspace threads in detached
  mode. The only exception is ztest worker threads which are meant to
  be joinable.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #989
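
The detached-versus-joinable distinction described in the commit above can be sketched with plain pthreads. This is only an illustration under assumptions: `spawn_thread()` and `noop_thread()` are hypothetical names, not the project's `thread_create()` implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the actual thread_create() code): start a
 * thread either detached, mirroring kthread behaviour, or joinable.
 * Returns 0 on success or a pthread error number.
 */
static int
spawn_thread(pthread_t *tid, void *(*func)(void *), void *arg, int joinable)
{
	pthread_attr_t attr;
	int err;

	assert(tid != NULL && func != NULL);

	err = pthread_attr_init(&attr);
	if (err != 0)
		return (err);

	/* A detached thread releases its resources as soon as it exits. */
	err = pthread_attr_setdetachstate(&attr,
	    joinable ? PTHREAD_CREATE_JOINABLE : PTHREAD_CREATE_DETACHED);
	if (err == 0)
		err = pthread_create(tid, &attr, func, arg);

	(void) pthread_attr_destroy(&attr);
	return (err);
}

/* Trivial thread body used only for demonstration. */
static void *
noop_thread(void *arg)
{
	return (arg);
}
```

A joinable thread started this way must eventually be passed to `pthread_join()`, otherwise its bookkeeping is never reclaimed, which is the leak the commit message describes.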
* Illumos #2703: add mechanism to report ZFS send progress | Bill Pijewski | 2012-09-19 | 1 | -1/+81
  Reviewed by: Matt Ahrens <[email protected]>
  Reviewed by: Robert Mustacchi <[email protected]>
  Reviewed by: Richard Lowe <[email protected]>
  Approved by: Eric Schrock <[email protected]>

  References:
    https://www.illumos.org/issues/2703

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #1948: zpool list should show more detailed pool info | Chris Siden | 2012-09-19 | 1 | -6/+27
  Reviewed by: Adam Leventhal <[email protected]>
  Reviewed by: Matt Ahrens <[email protected]>
  Reviewed by: Eric Schrock <[email protected]>
  Reviewed by: Richard Lowe <[email protected]>
  Reviewed by: Albert Lee <[email protected]>
  Reviewed by: Dan McDonald <[email protected]>
  Reviewed by: Garrett D'Amore <[email protected]>
  Approved by: Eric Schrock <[email protected]>

  References:
    https://www.illumos.org/issues/1948

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #685
* Seg fault 'zpool import -d /dev/disk/by-id -a' (tag: zfs-0.6.0-rc11) | Brian Behlendorf | 2012-09-18 | 1 | -1/+1
  Introduced by commit 44867b6d6effc1628dd00c36821ab044f89fb988. We
  should of course check to ensure best isn't NULL before attempting
  to dereference it.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #974
* Improve `zpool import` search behavior | Brian Behlendorf | 2012-09-17 | 1 | -29/+40
  The goal of this change is to make 'zpool import' prefer to use the
  persistent /dev/mapper or /dev/disk/by-* paths. These are far
  preferable to the devices in /dev/ whose names are not persistent
  and are determined by the order in which a device is detected.

  This patch improves things by changing the default search path from
  just the top level /dev/ directory to (in order):

    /dev/disk/by-vdev  - Custom rules, use first if they exist
    /dev/disk/zpool    - Custom rules, use first if they exist
    /dev/mapper        - Use multipath devices before components
    /dev/disk/by-uuid  - Single unique entry and persistent
    /dev/disk/by-id    - May be multiple entries and persistent
    /dev/disk/by-path  - Encodes physical location and persistent
    /dev/disk/by-label - Custom persistent labels
    /dev               - UNSAFE device names will change

  The default search path can be overridden by setting the
  ZPOOL_IMPORT_PATH environment variable. This must be a colon
  delimited list of paths which are searched for vdevs. If the
  'zpool import -d' option is specified only those listed paths
  will be searched.

  Finally, when multiple paths to the same device are found, if one
  of the paths is an exact match for the path used last time to
  import the pool it will be used. When there are no exact matches
  the preferred path will be determined by the provided search order.

  This means you can still import a pool and force specific names by
  providing the -d <path> option. And the preferred names will persist
  as long as those paths exist on your system.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #965
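
As a rough sketch of the search-order rules in the commit above, the following helper falls back to the default directory list unless a colon-delimited override is supplied. The directory list is quoted from the commit message; `build_search_path()` and its parameters are hypothetical and do not reflect the real libzfs code.

```c
#include <assert.h>
#include <string.h>

/* Default search order quoted from the commit message. */
static const char *default_import_path[] = {
	"/dev/disk/by-vdev",
	"/dev/disk/zpool",
	"/dev/mapper",
	"/dev/disk/by-uuid",
	"/dev/disk/by-id",
	"/dev/disk/by-path",
	"/dev/disk/by-label",
	"/dev",
};
#define	DEFAULT_IMPORT_PATH_SIZE \
	(sizeof (default_import_path) / sizeof (default_import_path[0]))

/*
 * Hypothetical helper: fill 'dirs' with up to 'max' directories to
 * search, in priority order. 'env_copy' is a writable copy of the
 * ZPOOL_IMPORT_PATH value (or NULL when unset); it is split in place
 * on ':' and the returned pointers reference it.
 */
static size_t
build_search_path(char *env_copy, const char **dirs, size_t max)
{
	size_t n = 0;
	char *p = env_copy;

	assert(dirs != NULL);

	if (env_copy == NULL || env_copy[0] == '\0') {
		for (; n < DEFAULT_IMPORT_PATH_SIZE && n < max; n++)
			dirs[n] = default_import_path[n];
		return (n);
	}

	while (p != NULL && n < max) {
		char *colon = strchr(p, ':');

		if (colon != NULL)
			*colon++ = '\0';
		if (*p != '\0')		/* skip empty components */
			dirs[n++] = p;
		p = colon;
	}
	return (n);
}
```

A caller would typically pass a duplicated copy of `getenv("ZPOOL_IMPORT_PATH")`, since the helper modifies the string it is given.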
* Avoid running exportfs on each zfs/zpool command invocation | Cyril Plisko | 2012-09-11 | 1 | -9/+30
  Delay executing exportfs command until its results are actually
  required.

  Signed-off-by: Cyril Plisko <[email protected]>
  Signed-off-by: Gunnar Beutner <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Increase the stack space in userspace. | Etienne Dechamps | 2012-09-06 | 1 | -4/+9
  In 1e33ac1e2677c898a0b5ef6207048c692cb51bf4, the maximum stack size
  for userspace tools was set to 8k to mimic the available kernel
  stack size. Unfortunately, due to differences in how the stack is
  used in userspace vs kernel space, spurious stack overflows could
  occur in userspace tools due to the limited stack size. This is
  especially true in ztest when debugging is enabled.

  This patch multiplies the userspace stack size by 4, which fixes
  the stack overflow issues. This comes at the price of not being
  able to catch stack size issues in userspace, but the previous
  solution proved unreliable anyway.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Fixes #934.
* Fix missing vdev names in zpool status output | Michael Martin | 2012-09-05 | 1 | -6/+6
  Commit 858219c makes more sense down below in the 'if (verbose)'
  section of the code. Initially, buf and path will never point to
  the same location. Once 'path = buf' is set on a raidz vdev, the
  code may drop into the verbose section depending on the verbose
  flag. In here, using a tmpbuf makes sense since now 'buf == path'.

  This issue does not occur in the upstream Solaris code because
  their implementations of snprintf() allow for buf and path to be
  the same address.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #57
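
The aliasing problem mentioned in the commit above can be illustrated in isolation. ISO C leaves `snprintf()` undefined when the destination overlaps one of its arguments, so a temporary buffer is the portable approach. The helper name and the `"-<id>"` suffix are hypothetical and only stand in for whatever the verbose code path appends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append "-<id>" to the name held in 'buf'.
 * Passing 'buf' as both the destination and a "%s" argument of
 * snprintf() is undefined behaviour in ISO C, so the result is
 * formatted into a temporary buffer first and then copied back.
 */
static void
append_vdev_id(char *buf, size_t buflen, unsigned long long id)
{
	char tmpbuf[256];

	assert(buf != NULL && buflen > 0);

	(void) snprintf(tmpbuf, sizeof (tmpbuf), "%s-%llu", buf, id);
	(void) snprintf(buf, buflen, "%s", tmpbuf);
}
```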
* Remove autotools products | Brian Behlendorf | 2012-08-27 | 21 | -14779/+0
  Remove all of the generated autotools products from the repository
  and update the .gitignore files accordingly.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #718
* Illumos #2803: zfs get guid pretty-prints the output | Garrett D'Amore | 2012-08-23 | 1 | -0/+12
  Reviewed by: Eric Schrock <[email protected]>
  Reviewed by: Richard Elling <[email protected]>
  Reviewed by: Alexander Eremin <[email protected]>
  Approved by: Dan McDonald <[email protected]>

  References:
    https://www.illumos.org/issues/2803

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #1796, #2871, #2903, #2957 | Christopher Siden | 2012-08-23 | 3 | -10/+65
  1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a
       read-only pool
  2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite
  2903 zfs destroy -d does not work
  2957 zfs destroy -R/r sometimes fails when removing defer-destroyed
       snapshot

  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Approved by: Eric Schrock <[email protected]>

  References:
    https://www.illumos.org/issues/1796
    https://www.illumos.org/issues/2871
    https://www.illumos.org/issues/2903
    https://www.illumos.org/issues/2957

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #2635: 'zfs rename -f' to perform force unmount | Eric Schrock | 2012-08-23 | 1 | -2/+4
  Reviewed by: Matt Ahrens <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Bill Pijewski <[email protected]>
  Reviewed by: Richard Elling <[email protected]>
  Approved by: Richard Lowe <[email protected]>

  References:
    https://www.illumos.org/issues/2635

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #717
* Properly initialize and free destroydata | Martin Matuska | 2012-08-23 | 1 | -0/+2
  This regression was accidentally introduced by commit
  330d06f90d143b41b276796526a66a1c1fff046d due to ZoL specific code.
  The fix is to simply ensure the passed nvlist is initialized and
  freed.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #876
* Illumos #1693: persistent 'comment' field for a zpool | Dan McDonald | 2012-08-08 | 2 | -2/+39
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Eric Schrock <[email protected]>
  Approved by: Richard Lowe <[email protected]>

  References:
    https://www.illumos.org/issues/1693

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #678
* Set zvol discard_granularity to the volblocksize. | Etienne Dechamps | 2012-08-07 | 21 | -0/+21
  Currently, zvols have a discard granularity set to 0, which suggests
  to the upper layer that discard requests of arbitrarily small size
  and alignment can be made efficiently.

  In practice however, ZFS does not handle unaligned discard requests
  efficiently: indeed, it is unable to free a part of a block. It will
  write zeros to the specified range instead, which is both useless
  and inefficient (see dnode_free_range).

  With this patch, zvol block devices expose volblocksize as their
  discard granularity, so the upper layer is aware that it's not
  supposed to send discard requests smaller than volblocksize.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #862
* Illumos #1644, #1645, #1646, #1647, #1708 | Matthew Ahrens | 2012-07-31 | 6 | -416/+1082
  1644 add ZFS "clones" property
  1645 add ZFS "written" and "written@..." properties
  1646 "zfs send" should estimate size of stream
  1647 "zfs destroy" should determine space reclaimed by destroying
       multiple snapshots
  1708 adjust size of zpool history data

  References:
    https://www.illumos.org/issues/1644
    https://www.illumos.org/issues/1645
    https://www.illumos.org/issues/1646
    https://www.illumos.org/issues/1647
    https://www.illumos.org/issues/1708

  This commit modifies the user to kernel space ioctl ABI. Extra
  care should be taken when updating to ensure both the kernel
  modules and utilities are updated. This change has reordered all
  of the new ioctl()s to the end of the list. This should help
  minimize this issue in the future.

  Reviewed by: Richard Lowe <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Albert Lee <[email protected]>
  Approved by: Garrett D'Amore <[email protected]>
  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #826
  Closes #664
* Use /sys/module instead of /proc/modules. | Etienne Dechamps | 2012-07-26 | 1 | -19/+5
  When libzfs checks if the module is loaded or not, it currently
  reads /proc/modules and searches for a line matching the module
  name.

  Unfortunately, if the module is included in the kernel itself
  (built-in module), then /proc/modules won't list it, so libzfs will
  wrongly conclude that the module is not loaded, thus making all ZFS
  userspace tools unusable.

  Fortunately, all loaded modules appear as directories in
  /sys/module, even built-in ones. Thus we can use /sys/module in lieu
  of /proc/modules to fix the issue.

  As a bonus, the code for checking becomes much simpler.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #851
* Linux 3.5 compat, end_writeback() changed to clear_inode() | Richard Yao | 2012-07-23 | 21 | -0/+21
  The end_writeback() function was changed by moving the call to
  inode_sync_wait() earlier into evict(). This effectively changes
  the ordering of the sync but it does not impact the details of
  the zfs implementation.

  However, as part of this change end_writeback() was renamed to
  clear_inode() to reflect the new semantics. This change does
  impact us and clear_inode() now maps to end_writeback() for
  kernels prior to 3.5.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #784
* Linux 3.5 compat, iops->truncate_range() removed | Richard Yao | 2012-07-23 | 21 | -0/+21
  The vmtruncate_range() support has been removed from the kernel in
  favor of using the fallocate method in the file_operations table.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #784
* Linux 3.5 compat, eops->encode_fh() takes inodes | Richard Yao | 2012-07-23 | 21 | -0/+21
  The export_operations member ->encode_fh() has been updated to
  take both the child and parent inodes. This interface used to
  take the child dentry and a bool describing if the parent is needed.

  NOTE: While updating this code I noticed that we do not currently
  cleanly handle the case where we're passed a connectable parent.
  This code should be audited to make sure we're doing the right thing.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #784
* Move partition scanning from userspace to module. | Etienne Dechamps | 2012-07-17 | 23 | -11/+49
  Currently, zpool online -e (dynamic vdev expansion) doesn't work on
  whole disks because we're invoking ioctl(BLKRRPART) from userspace
  while ZFS still has a partition open on the disk, which results in
  EBUSY.

  This patch moves the BLKRRPART invocation from the zpool utility to
  the module. Specifically, this is done just before opening the
  device in vdev_disk_open() which is called inside vdev_reopen().
  This requires jumping through some hoops to get to the disk device
  from the partition device, and to make sure we can still open the
  partition after the BLKRRPART call.

  Note that this new code path is triggered on dynamic vdev expansion
  only; other actions, like creating a new pool, are unchanged and
  still call BLKRRPART from userspace.

  This change also depends on API changes which are available in
  2.6.37 and later kernels. The build system has been updated to
  detect this, but there is no compatibility mode for older kernels.
  This means that online expansion will NOT be available in older
  kernels. However, it will still be possible to expand the vdev
  offline.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #808
* Add PowerPC to supported VTOCs | Brian Behlendorf | 2012-07-12 | 1 | -1/+1
  This code was inherited from Solaris which was careful to define
  the expected VTOC for various supported architectures. While this
  check may have made sense there it's something we should be able
  to safely drop under Linux.

  However, I'm not quite ready to do that yet. So for the moment I'm
  just doing the very safe thing of adding PowerPC as a supported
  type.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Fix efi_use_whole_disk() when efi_nparts == 128. | Etienne Dechamps | 2012-07-12 | 1 | -24/+26
  Commit e5dc681a changed EFI_NUMPAR from 9 to 128. This means that
  the on-disk EFI label has efi_nparts = 128 instead of 9. The index
  of the reserved partition, however, is still 8. This breaks
  efi_use_whole_disk(), which uses efi_nparts-1 as the index of the
  reserved partition.

  This commit fixes efi_use_whole_disk() when the index of the
  reserved partition is not efi_nparts-1. It rewrites the algorithm
  and makes it more robust by using the order of the partitions
  instead of their numbering. It assumes that the last non-empty
  partition is the reserved partition, and that the non-empty
  partition before that is the data partition.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #808
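
The selection rule in the commit above (last non-empty partition is reserved, the non-empty one before it is data) can be sketched as a backwards scan. `struct part` and `find_data_and_reserved()` are hypothetical simplifications and not the libefi data structures; the sketch also assumes "order" means table index order.

```c
#include <assert.h>
#include <stddef.h>

struct part {				/* hypothetical, simplified entry */
	unsigned long long start;	/* first sector */
	unsigned long long size;	/* length in sectors, 0 if empty */
};

/*
 * Scan the table from the end. The last non-empty entry is taken to be
 * the reserved partition and the non-empty entry before it the data
 * partition. Returns 0 on success, or -1 if fewer than two non-empty
 * entries exist (outputs are then not meaningful).
 */
static int
find_data_and_reserved(const struct part *parts, size_t nparts,
    size_t *data_idx, size_t *resv_idx)
{
	int found_resv = 0;

	assert(parts != NULL && data_idx != NULL && resv_idx != NULL);

	for (size_t i = nparts; i-- > 0; ) {
		if (parts[i].size == 0)
			continue;
		if (!found_resv) {
			*resv_idx = i;
			found_resv = 1;
		} else {
			*data_idx = i;
			return (0);
		}
	}
	return (-1);
}
```

With a 128-entry table whose only non-empty slots are 0 and 8, this yields data index 0 and reserved index 8, which matches the situation the commit message describes.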
* Use the right device path when relabeling. | Etienne Dechamps | 2012-07-12 | 1 | -5/+14
  Currently, zpool_vdev_online() calls zpool_relabel_disk() with a
  short partition device name, which is obviously wrong because (1)
  zpool_relabel_disk() expects a full, absolute path to use with
  open() and (2) efi_write() must be called on an opened disk device,
  not a partition device.

  With this patch, zpool_relabel_disk() gets called with a full disk
  device path. The path is determined using the same algorithm as
  zpool_find_vdev().

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #808
* Fix error handling for "zpool online -e". | Etienne Dechamps | 2012-07-12 | 1 | -5/+7
  The error handling code around zpool_relabel_disk() is either
  nonexistent or wrong. The function call itself is not checked, and
  zpool_relabel_disk() is generating error messages from an
  uninitialized buffer.

  Before:

    # zpool online -e homez sdb; echo $?
    `: cannot relabel 'sdb1': unable to open device: 2
    0

  After:

    # zpool online -e homez sdb; echo $?
    cannot expand sdb: cannot relabel 'sdb1': unable to open device: 2
    1

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #808
* Illumos #1949, #1953 | George Wilson | 2012-07-11 | 1 | -2/+2
  1949 crash during reguid causes stale config
  1953 allow and unallow missing from zpool history since removal of
       pyzfs

  Reviewed by: Adam Leventhal <[email protected]>
  Reviewed by: Matt Ahrens <[email protected]>
  Reviewed by: Eric Schrock <[email protected]>
  Reviewed by: Bill Pijewski <[email protected]>
  Reviewed by: Richard Lowe <[email protected]>
  Reviewed by: Garrett D'Amore <[email protected]>
  Reviewed by: Dan McDonald <[email protected]>
  Reviewed by: Steve Gonczi <[email protected]>
  Approved by: Eric Schrock <[email protected]>

  References:
    https://www.illumos.org/issues/1949
    https://www.illumos.org/issues/1953

  Ported by: Brian Behlendorf <[email protected]>
  Closes #665
* Illumos #1748: desire support for reguid in zfs | Garrett D'Amore | 2012-07-11 | 2 | -0/+23
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Igor Kozhukhov <[email protected]>
  Reviewed by: Alexander Eremin <[email protected]>
  Reviewed by: Alexander Stetsenko <[email protected]>
  Approved by: Richard Lowe <[email protected]>

  References:
    https://www.illumos.org/issues/1748

  This commit modifies the user to kernel space ioctl ABI. Extra
  care should be taken when updating to ensure both the kernel
  modules and utilities are updated. If only the user space
  component is updated both the 'zpool events' command and the
  'zpool reguid' command will not work until the kernel modules
  are updated.

  Ported by: Martin Matuska <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #665
* Speed up 'zfs list -t snapshot -o name -s name' | Pawel Jakub Dawidek | 2012-06-14 | 2 | -10/+37
  FreeBSD #xxx: Dramatically optimize listing snapshots when the user
  requests only snapshot names and wants to sort them by name, i.e.
  when running:

    # zfs list -t snapshot -o name -s name

  Because only the name is needed we don't have to read all snapshot
  properties.

  Below you can find how long it takes to list 34509 snapshots from a
  single disk pool before and after this change with cold and warm
  cache:

  before:

    # time zfs list -t snapshot -o name -s name > /dev/null
    cold cache: 525s
    warm cache: 218s

  after:

    # time zfs list -t snapshot -o name -s name > /dev/null
    cold cache: 1.7s
    warm cache: 1.1s

  NOTE: This patch only appears in FreeBSD. If/when Illumos picks up
  the change we may want to drop this patch and adopt their version.
  However, for now this addresses a real issue.

  Ported-by: Brian Behlendorf <[email protected]>
  Issue #450
* Make zvol_remove_link() print a more useful error message | Richard Yao | 2012-06-13 | 1 | -1/+1
  Signed-off-by: Brian Behlendorf <[email protected]>
* Retry removal of busy minors | Daniel Verite | 2012-06-11 | 1 | -1/+20
  When failing to remove a zvol device link because it's busy, wait a
  bit and retry in a loop instead of giving up immediately. This
  technique is similar to the loop in zpool_label_disk_wait(), with
  the same goal: waiting for the asynchronous udev processes to finish
  their work.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #692
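
The wait-and-retry technique from the commit above can be sketched generically. Everything here is hypothetical: the callback type, `remove_with_retry()`, and the retry count and delay parameters are illustrative choices, not values taken from the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success or an errno value. */
typedef int (*remove_fn_t)(void *arg);

/*
 * Call remove_fn up to max_tries times, sleeping delay_ms between
 * attempts, for as long as it keeps reporting EBUSY. Returns the
 * result of the last attempt.
 */
static int
remove_with_retry(remove_fn_t remove_fn, void *arg, int max_tries,
    long delay_ms)
{
	struct timespec delay;
	int err = EBUSY;

	assert(remove_fn != NULL && max_tries > 0 && delay_ms >= 0);

	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (int i = 0; i < max_tries; i++) {
		err = remove_fn(arg);
		if (err != EBUSY)
			break;		/* success or a non-retryable error */
		if (i + 1 < max_tries)
			(void) nanosleep(&delay, NULL);
	}
	return (err);
}

/* Demo callback: reports EBUSY until the counter reaches zero. */
static int
busy_then_ok(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (EBUSY);
	}
	return (0);
}
```

Bounding the number of attempts keeps the caller from hanging forever if the device never becomes free, at the cost of occasionally giving up too early.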
* Linux 3.4 compat, d_make_root() replaces d_alloc_root() | Richard Yao | 2012-06-11 | 21 | -0/+21
  torvalds/linux@adc0e91ab142abe93f5b0d7980ada8a7676231fe introduced
  d_make_root() as a replacement for d_alloc_root(). Further commits
  appear to have removed d_alloc_root() from the Linux source tree.
  This causes the following failure:

    error: implicit declaration of function 'd_alloc_root'
    [-Werror=implicit-function-declaration]

  To correct this we update the code to use the current d_make_root()
  interface for readability. Then we introduce an autotools check to
  determine if d_make_root() is available. If it isn't then we define
  some compatibility logic which uses the older d_alloc_root()
  interface.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #776
* Improve 'zpool import' EBUSY error message | Brian Behlendorf | 2012-06-01 | 1 | -0/+6
  When a device is already open O_EXCL by another process the
  `zpool import` will correctly fail. However, the default failure
  message isn't very helpful. It may in fact be harmful if you take
  its advice and destroy your pool.

    cannot import 'tank': pool is busy
            Destroy and re-create the pool from
            a backup source.

  Improve the error message in the EBUSY case to simply print a
  message indicating that the devices are currently in use. The user
  will need to manually identify which process has the device open
  exclusively and why.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Add /dev/mapper/ to search path | Brian Behlendorf | 2012-06-01 | 1 | -1/+13
  When creating pools, short device names may be used when those
  devices appear in certain well known locations under /dev/. This
  change adds /dev/mapper/ to that list.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Add vdev_id for JBOD-friendly udev aliases | Ned A. Bass | 2012-06-01 | 1 | -2/+3
  vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path
  in a storage topology to a channel name. The channel name is
  combined with a disk enclosure slot number to create an alias that
  reflects the physical location of the drive. This is particularly
  helpful when it comes to tasks like replacing failed drives. Slot
  numbers may also be re-mapped in case the default numbering is
  unsatisfactory. The drive aliases will be created as symbolic links
  in /dev/disk/by-vdev.

  The only currently supported topologies are sas_direct and
  sas_switch:

  o  sas_direct - a channel is uniquely identified by a PCI slot and a
     HBA port

  o  sas_switch - a channel is uniquely identified by a SAS switch port

  A multipath mode is supported in which dm-mpath devices are handled
  by examining the first running component disk, as reported by
  'multipath -l'. In multipath mode the configuration file should
  contain a channel definition with the same name for each path to a
  given enclosure.

  vdev_id can replace the existing zpool_id script on systems where
  the storage topology conforms to sas_direct or sas_switch. The
  script could be extended to support other topologies as well. The
  advantage of vdev_id is that it is driven by a single static input
  file that can be shared across multiple nodes having a common
  storage topology. zpool_id, on the other hand, requires a unique
  /etc/zfs/zdev.conf per node and a separate slot-mapping file.
  However, zpool_id provides the flexibility of using any device names
  that show up in /dev/disk/by-path, so it may still be needed on some
  systems.

  vdev_id's functionality subsumes that of the sas_switch_id script,
  and it is unlikely that anyone is using it, so sas_switch_id is
  removed.

  Finally, /dev/disk/by-vdev is added to the list of directories that
  'zpool import' will scan.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #713
* Define the needed ISA types for ARM | Jorgen Lundman | 2012-05-03 | 2 | -2/+21
  Add the minimum required ISA types to support the ARM architecture.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.3 compat, iops->create()/mkdir()/mknod() | Brian Behlendorf | 2012-04-30 | 21 | -0/+21
  The mode argument of iops->create()/mkdir()/mknod() was changed from
  an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf
  check was added to detect the API change and then correctly set a
  zpl_umode_t typedef. There is no functional change.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #701
* Improve error message consistency | Richard Laager | 2012-04-11 | 1 | -5/+5
  Signed-off-by: Richard Laager <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Add --enable-debug-dmu-tx configure option | Brian Behlendorf | 2012-03-23 | 21 | -0/+21
  Allow rigorous (and expensive) tx validation to be enabled/disabled
  independently from the standard zfs debugging. When enabled these
  checks ensure that all txs are constructed properly and that a dbuf
  is never dirtied without taking the correct tx hold.

  This checking is particularly helpful when adding new dmu consumers
  like Lustre. However, for established consumers such as the zpl
  with no known outstanding tx construction problems this is just
  overhead.

    --enable-debug-dmu-tx   - Enable/disable validation of each tx as
    --disable-debug-dmu-tx    it is constructed. By default validation
                              is disabled due to performance concerns.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Add .zfs control directory | Brian Behlendorf | 2012-03-22 | 22 | -0/+25
  Add support for the .zfs control directory. This was accomplished
  by leveraging as much of the existing ZFS infrastructure as possible
  and updating it for Linux as required. The bulk of the core
  functionality is now all there with the following limitations.

  *) The .zfs/snapshot directory automount support requires a 2.6.37
     or newer kernel. The exception is RHEL6.2 which has backported
     the d_automount patches.

  *) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in
     the .zfs/snapshot directory works as expected. However, this
     functionality is only available to root until zfs delegations
     are finished.

       * mkdir - create a snapshot
       * rmdir - destroy a snapshot
       * mv    - rename a snapshot

  The following issues are known deficiencies, but we expect them to
  be addressed by future commits.

  *) Add automount support for kernels older than 2.6.37. This
     should be possible using follow_link() which is what Linux did
     before.

  *) Accessing the .zfs/snapshot directory via NFS is not yet
     possible. The majority of the ground work for this is complete.
     However, finishing this work will require resolving some
     lingering integration issues with the Linux NFS kernel server.

  *) The .zfs/shares directory exists but no further smb
     functionality has yet been implemented.

  Contributions-by: Rohan Puri <[email protected]>
  Contributions-by: Andrew Barnes <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #173
* Align partition end on 1 MiB boundary | Ned Bass | 2012-03-05 | 1 | -3/+6
  Some devices have exhibited sensitivity to the ending alignment of
  partitions. In particular, even if the first partition begins at
  1 MiB, we have seen many sd driver task abort errors with certain
  SSDs if the first partition doesn't end on a 1 MiB boundary. This
  occurs when the vdev label is read during pool creation or
  importation and causes a delay of about 30 seconds per device. It
  can also be simulated with dd when the pool isn't imported:

    dd if=/dev/sda1 of=/dev/null bs=262144 count=1

  For the record, this problem was observed with SMARTMOD
  SG9XCA2E200GE01 200GB SSDs. Unfortunately I don't have a good
  explanation for this behavior. It seems to have something to do
  with highly fragmented single-sector requests being issued to the
  device, which it may not support. With end-aligned partitions at
  least page-sized requests were queued and issued to the driver
  according to blktrace. In any case, aligning the partition end is a
  fairly innocuous work-around, wasting at most 1 MiB of space.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #574
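
The end-alignment work-around from the commit above amounts to rounding a sector number down to a 1 MiB multiple. The sketch below assumes the end is expressed as an exclusive sector index (one past the last sector); the function name is hypothetical and the real labeling code may use different conventions.

```c
#include <assert.h>
#include <stdint.h>

#define	ALIGN_BYTES	(1024ULL * 1024ULL)	/* 1 MiB boundary */

/*
 * Hypothetical helper: round an exclusive end sector down so that the
 * partition ends on a 1 MiB boundary. At most one alignment unit
 * (1 MiB) of space is given up.
 */
static uint64_t
align_end_sector(uint64_t end_sector, uint64_t sector_size)
{
	uint64_t sectors_per_align;

	assert(sector_size > 0 && ALIGN_BYTES % sector_size == 0);

	sectors_per_align = ALIGN_BYTES / sector_size;
	return (end_sector - (end_sector % sectors_per_align));
}
```

With 512-byte sectors one alignment unit is 2048 sectors, so an end of 1000000 is rounded down to 999424.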
* Cleanly support debug packages | Brian Behlendorf | 2012-02-27 | 21 | -0/+21
  Allow a source rpm to be rebuilt with debugging enabled. This
  avoids the need to have to manually modify the spec file. By
  default debugging is still largely disabled. To enable specific
  debugging features use the following options with rpmbuild.

    '--with debug'  - Enables ASSERTs

    # For example:
    $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm

  Additionally, ZFS_CONFIG has been added to zfs_config.h for
  packages which build against these headers. This is critical to
  ensure both zfs and the dependent package are using the same
  prototype and structure definitions.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Support ashift=13 for 8KB SSD block sizes | Richard Yao | 2012-02-13 | 1 | -1/+1
  New SSDs are now available which use an internal 8k block size. To
  make sure ZFS can get the maximum performance out of these devices
  we're increasing the maximum ashift to 13 (8KB).

  This value is still small enough that we can fit 16 uberblocks in
  the vdev ring label. However, I don't want to increase this any
  further or it will limit the ability to safely roll back a pool to
  recover it.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #565
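
The "16 uberblocks" figure in the commit above follows from simple shift arithmetic if one assumes a 128 KiB uberblock ring and a 1 KiB minimum slot size. Both constants are assumptions about the on-disk layout that the commit message does not state, and the names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	UBERBLOCK_RING_BYTES	(128ULL * 1024ULL)	/* assumed ring size */
#define	MIN_UBERBLOCK_SHIFT	10			/* assumed 1 KiB slot */

/*
 * Hypothetical helper: number of uberblock slots that fit in the ring
 * for a given ashift, where each slot is max(1 << ashift, 1 KiB).
 */
static uint64_t
uberblock_slots(unsigned ashift)
{
	unsigned shift =
	    ashift > MIN_UBERBLOCK_SHIFT ? ashift : MIN_UBERBLOCK_SHIFT;

	assert(shift <= 17);	/* a slot may not exceed the ring itself */
	return (UBERBLOCK_RING_BYTES >> shift);
}
```

Under these assumptions ashift=13 (8192-byte blocks) leaves 16 slots, and each further increment of ashift halves the number of older uberblocks available for rollback.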
* Add 'fsid' mount option to allowed options. | Turbo Fredriksson | 2012-02-13 | 1 | -1/+1
  Resolves nfs-utils-1.0.x compatibility issue which requires that
  the fsid be set in the export options.

    exportfs: Warning: /tank/dir requires fsid= for NFS export

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #570
* Add support for DISCARD to ZVOLs. | Etienne Dechamps | 2012-02-09 | 21 | -0/+21
  DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
  It allows ZVOL clients to discard (unmap, trim) block ranges from a
  ZVOL, thus optimizing disk space usage by allowing a ZVOL to shrink
  instead of just grow.

  We can't use zfs_space() or zfs_freesp() here, since these functions
  only work on regular files, not volumes. Fortunately we can use the
  low-level function dmu_free_long_range() which does exactly what we
  want.

  Currently the discard operation is not added to the log. That's not
  a big deal since losing discard requests cannot result in data
  corruption. It would however result in disk space usage higher than
  it should be. Thus adding log support to zvol_discard() is probably
  a good idea for a future improvement.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Support the fallocate() file operation. | Etienne Dechamps | 2012-02-09 | 21 | -0/+21
  Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is
  supported, since it's the only one that matches the behavior of
  zfs_space(). This makes it pretty much useless in its current form,
  but it's a start.

  To support other flag combinations we would need to modify
  zfs_space() to make it more flexible, or emulate the desired
  functionality in zpl_fallocate().

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #334
* Improve ZVOL queue behavior. | Etienne Dechamps | 2012-02-07 | 21 | -0/+105
  The Linux block device queue subsystem exposes a number of
  configurable settings described in Linux block/blk-settings.c. The
  defaults for these settings are tuned for hard drives, and are not
  optimized for ZVOLs. Proper configuration of these options would
  allow upper layers (I/O scheduler) to make better decisions about
  write merging and ordering.

  Detailed rationale:

  - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is
    able to handle writes of any size, so there's no reason to impose
    a limit. Let the upper layer decide.

  - max_segments and max_segment_size are set to unlimited.
    zvol_write() will copy the requests' contents into a dbuf anyway,
    so the number and size of the segments are irrelevant. Let the
    upper layer decide.

  - physical_block_size and io_opt are set to the ZVOL's block size.
    This has the potential to somewhat alleviate issue #361 for
    ZVOLs, by warning the upper layers that writes smaller than the
    volume's block size will be slow.

  - The NONROT flag is set to indicate this isn't a rotational
    device. Although the backing zpool might be composed of
    rotational devices, the resulting ZVOL often doesn't exhibit the
    same behavior due to the COW mechanisms used by ZFS. Setting this
    flag will prevent upper layers from making useless decisions
    (such as reordering writes) based on incorrect assumptions about
    the behavior of the ZVOL.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Fix synchronicity for ZVOLs. | Etienne Dechamps | 2012-02-07 | 21 | -0/+21
  zvol_write() assumes that the write request must be written to
  stable storage if rq_is_sync() is true. Unfortunately, this
  assumption is incorrect. Indeed, "sync" does *not* mean what we
  think it means in the context of the Linux block layer. This is
  well explained in linux/fs.h:

    WRITE:       A normal async write. Device will be plugged.
    WRITE_SYNC:  Synchronous write. Identical to WRITE, but passes
                 down the hint that someone will be waiting on this
                 IO shortly.
    WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush.
    WRITE_FUA:   Like WRITE_SYNC but data is guaranteed to be on
                 non-volatile media on completion.

  In other words, SYNC does not *mean* that the write must be on
  stable storage on completion. It just means that someone is waiting
  on us to complete the write request. Thus triggering a ZIL commit
  for each SYNC write request on a ZVOL is unnecessary and harmful
  for performance. To make matters worse, ZVOL users have no way to
  express that they actually want data to be written to stable
  storage, which means the ZIL is broken for ZVOLs.

  The request for stable storage is expressed by the FUA flag, so we
  must commit the ZIL after the write if the FUA flag is set. In
  addition, we must commit the ZIL before the write if the FLUSH flag
  is set.

  Also, we must inform the block layer that we actually support FLUSH
  and FUA.

  Signed-off-by: Brian Behlendorf <[email protected]>
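
The ordering rule stated in the commit above (commit before the write for FLUSH, after the write for FUA, and never merely because of SYNC) can be modelled in userspace. The flag values and `handle_write()` are hypothetical stand-ins for the kernel request flags and for `zvol_write()`; the function only records the order of steps rather than performing I/O.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request flags; these are not the kernel's definitions. */
enum {
	RQ_SYNC  = 1 << 0,	/* someone is waiting; no durability implied */
	RQ_FLUSH = 1 << 1,	/* flush earlier writes first */
	RQ_FUA   = 1 << 2	/* this write must be durable on completion */
};

/*
 * Record the steps taken for one write request in 'trace':
 * 'C' stands for a log commit and 'W' for the write itself.
 */
static void
handle_write(unsigned flags, char *trace, size_t tracelen)
{
	size_t n = 0;

	assert(trace != NULL && tracelen >= 4);

	if (flags & RQ_FLUSH)
		trace[n++] = 'C';	/* commit before the write */
	trace[n++] = 'W';
	if (flags & RQ_FUA)
		trace[n++] = 'C';	/* commit after the write */
	trace[n] = '\0';
}
```

Note that `RQ_SYNC` on its own triggers no commit in this model, which is the behavioural change the commit message argues for.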
* Let libnvpair be linked independently of libzfs. | Darik Horn | 2012-02-07 | 4 | -3/+9
  Autoconf will fail to detect the ZoL libnvpair on systems that do
  not implicitly link library runtime dependencies, which is anything
  that has the GCC 4.5 DCO update.

  Build libuutil before libnvpair, and put it on the LDADD line of
  the libnvpair automake template.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes: #560
* Linux 3.3 compat, sops->show_options() | Brian Behlendorf | 2012-02-03 | 21 | -0/+21
  The second argument of sops->show_options() was changed from a
  'struct vfsmount *' to a 'struct dentry *'. Add an autoconf check
  to detect the API change and then conditionally define the expected
  interface. In either case we are only interested in the zfs_sb_t.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #549