aboutsummaryrefslogtreecommitdiffstats
path: root/lib/libshare
Commit message (Collapse)AuthorAgeFilesLines
* Improve ZVOL queue behavior.Etienne Dechamps2012-02-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives, and are not optimized for ZVOLs. Proper configuration of these options would allow upper layers (I/O scheduler) to take better decisions about write merging and ordering. Detailed rationale: - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide. - max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide. - physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow. - The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix synchronicity for ZVOLs.Etienne Dechamps2012-02-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zvol_write() assumes that the write request must be written to stable storage if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed, "sync" does *not* mean what we think it means in the context of the Linux block layer. This is well explained in linux/fs.h: WRITE: A normal async write. Device will be plugged. WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down the hint that someone will be waiting on this IO shortly. WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush. WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on non-volatile media on completion. In other words, SYNC does not *mean* that the write must be on stable storage on completion. It just means that someone is waiting on us to complete the write request. Thus triggering a ZIL commit for each SYNC write request on a ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL users have no way to express that they actually want data to be written to stable storage, which means the ZIL is broken for ZVOLs. The request for stable storage is expressed by the FUA flag, so we must commit the ZIL after the write if the FUA flag is set. In addition, we must commit the ZIL before the write if the FLUSH flag is set. Also, we must inform the block layer that we actually support FLUSH and FUA. Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.3 compat, sops->show_options()Brian Behlendorf2012-02-031-0/+1
| | | | | | | | | | The second argument of sops->show_options() was changed from a 'struct vfsmount *' to a 'struct dentry *'. Add an autoconf check to detect the API change and then conditionally define the expected interface. In either case we are only interested in the zfs_sb_t. Signed-off-by: Brian Behlendorf <[email protected]> Closes #549
* Combine libraries: spl, avl, efi, share, unicode.Darik Horn2012-01-172-71/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These libraries, which are an artifact of the ZoL development process, conflict with packages that are already in distribution: * libspl: SPL Programming Language * libavl: AVL for Linux * libefi: GRUB And these libraries are potential conflicts: * libshare: the Linux Mount Manager * libunicode: Perl and Python Recompose these five ZoL components into the four libraries that are conventionally provided by Solaris and FreeBSD systems: + libnvpair + libuutil + libzpool + libzfs This change resolves the name conflict, makes ZoL more compatible with existing software that uses autotools to detect ZFS, and allows pkg-zfs to better reflect the official Debian kFreeBSD packaging. Signed-off-by: Brian Behlendorf <[email protected]> Closes: #430
* Linux 3.1 compat, super_block->s_shrinkBrian Behlendorf2012-01-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <[email protected]> Closes #466 Closes #292
* Linux 3.2 compat: set_nlink()Darik Horn2011-12-161-0/+1
| | | | | | | | | | | Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <[email protected]> Closes: #462
* Add make rule for building Arch Linux packagesPrakash Surya2011-12-141-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be build using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, it's MD5 hash signature will be continually influx. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #491
* Added comments for libshare's NFS functions.Gunnar Beutner2011-12-051-1/+57
| | | | | | | Some of the functions' purpose wasn't immediately obvious without additional explanations. This commit adds these missing comments. Signed-off-by: Brian Behlendorf <[email protected]>
* Simplify BDI integrationBrian Behlendorf2011-11-081-0/+1
| | | | | | | | | | | | | | Update the code to use the bdi_setup_and_register() helper to simplify the bdi integration code. The updated code now just registers the bdi during mount and destroys it during unmount. The only complication is that for 2.6.32 - 2.6.33 kernels the helper wasn't available so in these cases the zfs code must provide it. Luckily the bdi_setup_and_register() function is trivial. Signed-off-by: Brian Behlendorf <[email protected]> Closes #367
* Autogen refresh for udev changesBrian Behlendorf2011-08-081-0/+3
| | | | | | | | Run autogen.sh using the same autotools versions as upstream: * autoconf-2.63 * automake-1.11.1 * libtool-2.2.6b
* Add backing_device_info per-filesystemBrian Behlendorf2011-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <[email protected]> Closes #174
* Use libzfs_run_process() in libshare.Gunnar Beutner2011-08-011-59/+29
| | | | | | | | This should simplify the code a bit by re-using existing code to fork and exec a process. Signed-off-by: Brian Behlendorf <[email protected]> Issue #190
* Call exportfs -v once for NFS shares.James H2011-08-011-90/+74
| | | | | | | | | | | | | | | | | | At the moment we call exportfs -v every time we check whether an NFS share is active. This happens every time you run a zfs or zpool command, making them extremely slow when you have a lot of exports. The time taken is approx O(n2) of the number of shares. This commit stores the output from exportfs -v in a temporary file and use this to speed up subsequent accesses. This mechanism is still too slow - if you have tens of thousands of NFS shares it will still be painful running ANY zfs/zpool command. Signed-off-by: Gunnar Beutner <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #341
* Provide a rc.d script for archlinuxzfs-0.6.0-rc5Kyle Fuller2011-07-111-0/+1
| | | | | | | | | | | Unlike most other Linux distributions archlinux installs its init scripts in /etc/rc.d insead of /etc/init.d. This commit provides an archlinux rc.d script for zfs and extends the build infrastructure to ensure it get's installed in the correct place. Signed-off-by: Brian Behlendorf <[email protected]> Closes #322
* Implemented sharing datasets via NFS using libshare.Gunnar Beutner2011-07-066-0/+2289
The sharenfs and sharesmb properties depend on the libshare library to export datasets via NFS and SMB. This commit implements the base libshare functionality as well as support for managing NFS shares. Signed-off-by: Brian Behlendorf <[email protected]>