summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Treat spill block dbufs as meta dataBrian Behlendorf2014-05-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the system attributes (SAs) for an object exceed what can can be stored in the bonus area of a dnode a spill block is allocated. These spill blocks are currently considered data blocks. However, they should be accounted for as meta data because they are effectively an extension of the dnode. While this may seem like a minor accounting issue it has broader implications. The key thing to be aware of is that each spill block will hold a reference on its parent dnode. The dnode in turn holds a reference on its dbuf in the dnode object. This means that a single 512 byte data buffer for a spill block can pin over 16k of meta data. This is analogous to the small file situation described in 2b13331 where a relatively small number of data buffer can cause the ARC to exceed the meta limit. However, unlike the small file case a spill block can legitimately be considered meta data. By changing the spill block to meta data they will now be dropped from the cache when the meta limit is reached. This then allows the dnodes and dbufs which the spill block was pinning to be released. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes #2294
* Remove SELinux enforcing check from init scriptsBrian Behlendorf2014-05-023-20/+0
| | | | | | | | | | | | | | The default SELinux policy for RHEL and Fedora has been updated to include ZFS in the list of filesystems which support xattrs. Therefore, there's no longer a need to detect this in the init scripts. References: https://bugzilla.redhat.com/show_bug.cgi?id=811532 https://bugzilla.redhat.com/show_bug.cgi?id=816543 Signed-off-by: Brian Behlendorf <[email protected]> Closes #2166
* ztest: Switch to LWP rwlock interfaceRichard Yao2014-05-011-51/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ztest is intended to subject the ZFS code in userland to stress that it should be able to withstand. Any failures that occur when running it are failures that likely would occur inside the kernel. However, being in userland, it is much easier to debug them. In practice, this prevents a large number of problems from reaching production code. A design decision was made by the original authors of ztest to make a distinction between userland locking primitives and kernel locking primitives. The ztest code itself calls userland locking primitives while the kernel code being run in userland will call emulated kernel locking primitives that wrap the userland locking primitives. When ztest was first ported to Linux, a decision was made to use the emulated kernel interfaces everywhere. In effect, the userland rw_rdlock()/rw_wrlock() became the kernel rw_enter() and and the userland rw_unlock() became the kernel rw_exit(). This caused a regression because of an assertion in rw_enter() to catch recursive locking. That is permitted in userland, but not in the kernel. Consequently, the ztest code itself does recursive read locking. The use of the emulated kernel interfaces consequently caused the following failure: ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner != zk_thread_current() (0x1c87150 != 0x1c87150)' failed. That occurs because ztest_dmu_objset_create_destroy() will take a read lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(), which will take a readlock only when asked to do ZTEST_IO_REWRITE. This triggered the assertion. The pthreads rwlock interface was based on the LWP rwlock interface implemented in Illumos libc. Luckily enough, the subset used by ztest is almost identical, so we can solve this problem by switching to the LWP thread rwlock interface in ztest. This eliminates a point of divergence with Illumos and should make code sharing slightly easier. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1970
* libspl: Implement LWP rwlock interfaceRichard Yao2014-05-012-0/+52
| | | | | | | | | | | | This implements a subset of the LWP rwlock interface by wrapping the equivalent POSIX thread interface. It is a superset of the features needed by ztest. The missing bits are {,_}rw_read_held() and {,_}rw_write_held(). Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #1970
* Fix libblkid ZFS detection when making new poolsRichard Yao2014-05-011-1/+1
| | | | | | | | | zfsonlinux/zfs@1db7b9be75a225cedb3b7a60028ca5695e5b8346 should have fixed this, but this particular string was overlooked. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2288
* dmu_tx_assign() should not return ENOMEMBrian Behlendorf2014-05-011-1/+6
| | | | | | | | | | | | | | | | | | | | | | As described in the comment above dmu_tx_assign() this function must only fail if the pool is out of space. If for some other reason the TX cannot be assigned (such as memory pressure) ERESTART must be returned. Alternately, EAGAIN could be returned to inject a delay but that isn't required because the caller will block on the condition variable waiting for the next TXG. /* * Assign tx to a transaction group. txg_how can be one of: * * (1) TXG_WAIT. If the current open txg is full, waits until there's * a new one. This should be used when you're not holding locks. * It will only fail if we're truly out of space (or over quota). * ... */ Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #2287
* Implement File Attribute SupportRichard Yao2014-05-013-6/+114
| | | | | | | | | | | | | | | | | | | | | | | We add support for lsattr and chattr to resolve a regression caused by 88c283952f0bfeab54612f9ce666601d83c4244f that broke Python's xattr.list(). That changet broke Gentoo Portage's FEATURES=xattr, which depended on Python's xattr.list(). Only attributes common to both Solaris and Linux are supported. These are 'a', 'd' and 'i' in Linux's lsattr and chattr commands. File attributes exclusive to Solaris are present in the ZFS code, but cannot be accessed or modified through this method. That was the case prior to this patch. The resolution of issue zfsonlinux/zfs#229 should implement some method to permit access and modification of Solaris-specific attributes. References: https://bugs.gentoo.org/show_bug.cgi?id=483516 Original-patch-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1691
* Refactor inode_owner_or_capable() autotools checkRichard Yao2014-05-012-19/+36
| | | | | | | | | | | We need inode_owner_or_capable() for ZFS file attributes in addition to xattrs, so it should go into its own file. This moves it into its own file and changes it to be more comprehensive. It will now fail if no known good API is detected. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #1691
* Fill in mountpoint buffer before using it in errorsilovezfs2014-04-301-3/+3
| | | | | | | | | | | | | | | | | | zfs_is_mountable() fills in the mountpoint buffer, so, as in upstream, it needs to have been called before the mountpoint buffer can be used in error messages. In particular, return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED, dgettext(TEXT_DOMAIN, "cannot mount '%s'"), mountpoint)); should not come before the call to zfs_is_mountable(). Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: ilovezfs <[email protected]> Closes #2284
* Add assertion to catch 0-count pageChunwei Chen2014-04-251-0/+7
| | | | | | | | | | | | Some network related block device uses tcp_sendpage, which doesn't behave well when using 0-count page. Add assertion to catch them. This has a runtime dependency on: zfsonlinux/spl@ae16ed9 Fix crash when using ZFS on Ceph rbd Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2277
* Add support for aarch64 (ARMv8)Jorgen Lundman2014-04-251-2/+2
| | | | | | | | | | | | | | Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2260
* Fix LZ4 endianness autodetectionNed Bass2014-04-201-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Endianness detection in LZ4 is broken in user-space builds. This bug corrupts compressed data and manifests itself in several ztest failures. When LZ4 was originally ported to Illumos ZFS, the proper checks for Linux were stripped out. The Linux port then inherited the remaining detection code that works on Illumos but not on Linux. The current LZ4 endianness check misuses the condition defined(__BIG_ENDIAN) to indicate a big-endian system. On Linux __BIG_ENDIAN is defined uncondtionally in the user-space header /usr/include/endian.h, regardless of the endianness of the system. The kernel does not use this header, so only user-space builds are affected. While we could fix this by restoring the upstream LZ4 endianness detection code, reliable checks already exist in libspl/include/sys/isa_defs.h. This change uses the libspl results to replace the word-size and endianness checks in LZ4, simplifying the code and reducing duplication. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: DHE <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Fixes #1963 Fixes #1964 Fixes #1965
* Fix zfsdev_ioctl() kmem leak warningBrian Behlendorf2014-04-181-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | Due to an asymmetry in the kmem accounting a memory leak was being reported when it was only an accounting issue. All memory allocated with kmem_alloc() must be released with kmem_free() or it will not be properly accounted for. In this case the code used strfree() to release the memory allocated by kmem_alloc(). Presumably this was done because the size of the memory region wasn't available when the memory needed to be freed. To resolve this issue the code has been updated to use strdup() instead of kmem_alloc() to allocate the memory. Like strfree(), strdup() is not integrated with the memory accounting. This means we can use strfree() to release it like Illumos. SPL: kmem leaked 10/4368729 bytes address size data func:line ffff880067e9aa40 10 ZZZZZZZZZZ zfsdev_ioctl:5655 Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #2262
* Various zimport.sh fixesBrian Behlendorf2014-04-171-26/+31
| | | | | | | | | | | | | | 1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR. These are vestiges of an earlier version of the script and were missed when it was updated. Additionally ensure the directory is created. 2) The 'fail' function should take an integer argument for the error code to return. Otherwise 0 (success) will be mistakenly returned and errors will we incorrectly suppressed. The error code should be meaningful enough to determine where the script failed. Signed-off-by: Brian Behlendorf <[email protected]>
* Report atime and relatime as the property's actual value.Tim Chase2014-04-161-2/+2
| | | | | | | | | | | Neither atime nor relatime should be considered to be "temporary mount point properties". Their semantics are enforced completely within ZFS and also they're (correctly) not documented as being temporary mount point properties. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2257
* Uninitialized variable spa_autoreplace usedDHE2014-04-161-1/+1
| | | | | | | | Caught by ztest and valgrind. Signed-off-by: DHE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2259
* Use ddi_time_after and friends to compare timeChunwei Chen2014-04-146-10/+22
| | | | | | | | | Also, make sure we use clock_t for ddi_get_lbolt to prevent type conversion from screwing things. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2142
* Make zimport.sh bash dependency explicitBrian Behlendorf2014-04-101-1/+1
| | | | | | | | Unfortunately, the zimport.sh test script really does depend on bash. Moving to /bin/sh should be possible once the shared infrastructure scripts it depends on is made portable. Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.14 compat: rq_for_each_segment in dmu_req_copyChunwei Chen2014-04-103-8/+56
| | | | | | | | | | rq_for_each_segment changed from taking bio_vec * to taking bio_vec. We provide rq_for_each_segment4 which takes both. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Revert "Fix zvol+btrfs hang"Chunwei Chen2014-04-101-77/+0
| | | | | | | | | | | | After the dmu_req_copy change, bi_io_vecs are not touched, so this is no longer needed. This reverts commit e26ade5101ba1d8e8350ff1270bfca4258e1ffe3. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Refactor dmu_req_copy for immutable biovec changesChunwei Chen2014-04-101-28/+39
| | | | | | | | | | | | Originally, dmu_req_copy modifies bv_len and bv_offset in bio_vec so that it can continue in subsequent passes. However, after the immutable biovec changes in Linux 3.14, this is not allowed. So instead, we just tell dmu_req_copy how many bytes are already copied and it will skip to the right spot accordingly. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Linux 3.14 compat: Immutable biovec changes in vdev_disk.cChunwei Chen2014-04-104-5/+36
| | | | | | | | | | bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter. This patch creates BIO_BI_*(bio) macros to hide the differences. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Linux 3.14 compat: posix_acl_{create,chmod}Chunwei Chen2014-04-103-5/+26
| | | | | | | | | posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod} Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2124
* Improve zfs.sh error messagesBrian Behlendorf2014-04-102-4/+8
| | | | | | | | | Ensure an error message is logged when the 'zfs.sh' script fails to either load a module or if udev fails to create the /dev/zfs device. Error messages for missing KERNEL_MODULES are suppressed because that functionality may just be built-in to the kernel. Signed-off-by: Brian Behlendorf <[email protected]>
* Replace zed_file_create_dirs() with mkdirp()Chris Dunlap2014-04-093-97/+13
| | | | | | | | | | | | | | | | | | When processing directory components starting from the root dir, zed_file_create_dirs() contained a bug in checking the return value of mkdir(). A typo was made, and the test for (mkdir_errno != EEXIST) was erroneously written as (mkdir_errno == EEXIST). If some of the leading directory components already existed, this bug would cause the routine to exit before creating the remaining directory components. Instead of fixing the above mkdir_errno test, this commit replaces zed_file_create_dirs() with mkdirp(). This cleanup was already planned, and zed_file_create_dirs() only existed because I didn't realize mkdirp() was already in tree at the time. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2248
* Set errno for mkdirp() called with NULL path ptrChris Dunlap2014-04-091-1/+3
| | | | | | | | | | | | | | | | If mkdirp() is called with a NULL ptr for the path arg, it will return -1 with errno unchanged. This is unexpected since on error it should return -1 and set errno to one of the error values listed for mkdir(2). This commit sets errno = ENOENT for this NULL ptr case. This is in accordance with the errors specified by mkdir(2): ENOENT A component of the path prefix does not exist or is a null pathname. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2248
* Dynamically create loop devicesBrian Behlendorf2014-04-091-12/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | Several of the in-tree regression tests depend on the availability of loop devices. If for some reason no loop devices are available the tests will fail. Normally this isn't an issue because most Linux distributions create 8 loop devices by default. This is enough for our purposes. However, recent Fedora releases have only been creating a single loop device and this leads to failures. Alternately, if something else of the system is using the loop devices we may see failures. The fix for this is to update the support scripts to dynamically create loop devices as needed. The scripts need only create a node under /dev/ and the loop driver with create the minor. This behavior has been supported by the loop driver for ages. Additionally this patch updates cleanup_loop_devices() to cleanup loop devices which have already had their file store deleted. This helps prevent stale loop devices from accumulating on the system due to test failures. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes #2249
* Improve partition detection on lesser used devicesRichard Yao2014-04-081-8/+12
| | | | | | | | | | | | | | The format strings in efi_get_info() are intended to extract both the main device and partition number. However, this is only done correctly for hd, sd and vd devices. The format strings for ram, dm-, md and loop devices misparse the input. This causes the partition device to be incorrectly labelled as the main device with the partition being labelled 0. Reported-by: ilovezfs <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2175
* Allow specifying '-o <opts>' in defaults/init script.Turbo Fredriksson2014-04-041-1/+2
| | | | | | Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2103
* Support using overlay mounts in defaults/init script.Turbo Fredriksson2014-04-041-1/+6
| | | | | | Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2103
* Fix for re-reading /etc/mtab.John M. Layman2014-04-043-5/+22
| | | | | | | | | | | | | | | | | | | This is a continuation of fb5c53ea65b75c67c23f90ebbbb1134a5bb6c140: When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file, freopen(3) will close and open the file. In this commit, a few more rewind(3) calls were replaced with freopen(3) to allow updated mtab entries to be picked up immediately. Signed-off-by: John M. Layman <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2215 Issue #1611
* Fix locking order in zfs_zget()Richard Yao2014-04-041-1/+1
| | | | | Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Fix deadlock in zfs_zget()Richard Yao2014-04-041-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zfsonlinux/zfs#180 occurred because of a race between inode eviction and zfs_zget(). zfsonlinux/zfs@36df284 tried to address it by making a call to the VFS to learn whether an inode is being evicted. If it was being evicted the operation was retried after dropping and reacquiring the relevant resources. Unfortunately, this introduced another deadlock. INFO: task kworker/u24:6:891 blocked for more than 120 seconds. Tainted: P O 3.13.6 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u24:6 D ffff88107fcd2e80 0 891 2 0x00000000 Workqueue: writeback bdi_writeback_workfn (flush-zfs-5) ffff8810370ff950 0000000000000002 ffff88103853d940 0000000000012e80 ffff8810370fffd8 0000000000012e80 ffff88103853d940 ffff880f5c8be098 ffff88107ffb6950 ffff8810370ff980 ffff88103a9a5b78 0000000000000000 Call Trace: [<ffffffff813dd1d4>] schedule+0x24/0x70 [<ffffffff8115fc09>] __wait_on_freeing_inode+0x99/0xc0 [<ffffffff8115fdd8>] find_inode_fast+0x78/0xb0 [<ffffffff811608c5>] ilookup+0x65/0xd0 [<ffffffffa035c5ab>] zfs_zget+0xdb/0x260 [zfs] [<ffffffffa03589d6>] zfs_get_data+0x46/0x340 [zfs] [<ffffffffa035fee1>] zil_add_block+0xa31/0xc00 [zfs] [<ffffffffa0360642>] zil_commit+0x12/0x20 [zfs] [<ffffffffa036a6e4>] zpl_putpage+0x174/0x840 [zfs] [<ffffffff811071ec>] do_writepages+0x1c/0x40 [<ffffffff8116df2b>] __writeback_single_inode+0x3b/0x2b0 [<ffffffff8116ecf7>] writeback_sb_inodes+0x247/0x420 [<ffffffff8116f5f3>] wb_writeback+0xe3/0x320 [<ffffffff81170b8e>] bdi_writeback_workfn+0xfe/0x490 [<ffffffff8106072c>] process_one_work+0x16c/0x490 [<ffffffff810613f3>] worker_thread+0x113/0x390 [<ffffffff81066edf>] kthread+0xdf/0x100 This patch implements the original fix in a slightly different manner in order to avoid both deadlocks. Instead of relying on a call to ilookup() which can block in __wait_on_freeing_inode() the return value from igrab() is used. This gives us the information that ilookup() provided without the risk of a deadlock. Alternately, this race could be closed by registering an sops->drop_inode() callback. The callback would need to detect the active SA hold thereby informing the VFS that this inode should not be evicted. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #180
* Revert "Fixed a use-after-free bug in zfs_zget()."Brian Behlendorf2014-04-031-23/+1
| | | | | | This reverts commit 36df284366caa77cb40083d2e6bcce02274e2f05. Signed-off-by: Brian Behlendorf <[email protected]>
* Merge branch 'zed-initial'Chris Dunlap2014-04-0255-35/+4121
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zed monitors ZFS events. When a zevent is posted, zed will run any scripts that have been enabled for the corresponding zevent class. Multiple scripts may be invoked for a given zevent. The zevent nvpairs are passed to the scripts as environment variables. Refer to the zed(8) manpage for details. Events are processed synchronously by the single thread, and there is no maximum timeout for script execution. Consequently, a misbehaving script can delay (or forever block) the processing of subsequent zevents. Plans are to address this in future commits. An EID (Event IDentifier) has been added to each event to uniquely identify it throughout the lifetime of the loaded ZFS kernel module; it is a monotonically increasing integer that resets to 1 each time the module is loaded. Initial scripts have been developed to log zevents to syslog, automatically rebuild to a hot spare device, and send email in response to checksum / data / io / resilver.finish / scrub.finish zevents. To enable email notifications, configure ZED_EMAIL in zed.rc (which is serving as a config file of sorts until a proper configuration file is implemented). To enable hot sparing, uncomment ZED_SPARE_ON_IO_ERRORS and ZED_SPARE_ON_CHECKSUM_ERRORS in zed.rc; note that the autoexpand property is not yet supported. zed is a work-in-progress. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2
| * Replace check for _POSIX_MEMLOCK w/ HAVE_MLOCKALLChris Dunlap2014-04-022-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zed supports a '-M' cmdline opt to lock all pages in memory via mlockall(). The _POSIX_MEMLOCK define is checked to determine whether this function is supported. The current test assumes mlockall() is supported if _POSIX_MEMLOCK is non-zero. However, this test is insufficient according to mlock(2) and sysconf(3). If _POSIX_MEMLOCK is -1, mlockall() is not supported; but if _POSIX_MEMLOCK is 0, availability must be checked at runtime. This commit adds an autoconf check for mlockall() to user.m4. The zed code block for mlockall() is now guarded with a test for HAVE_MLOCKALL. If defined, mlockall() will be called and its runtime availability checked via its return value. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2
| * Add automatic hot spare functionalityBrian Behlendorf2014-04-027-9/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a vdev starts getting I/O or checksum errors it is now possible to automatically rebuild to a hot spare device. To cleanly support this functionality in a shell script some additional information was added to all zevent ereports which include a vdev. This covers both io and checksum zevents but may be used but other scripts. In the Illumos FMA solution the same information is required but it is retrieved through the libzfs library interface. Specifically the following members were added: vdev_spare_paths - List of vdev paths for all hot spares. vdev_spare_guids - List of vdev guids for all hot spares. vdev_read_errors - Read errors for the problematic vdev vdev_write_errors - Write errors for the problematic vdev vdev_cksum_errors - Checksum errors for the problematic vdev. By default the required hot spare scripts are installed but this functionality is disabled. To enable hot sparing uncomment the ZED_SPARE_ON_IO_ERRORS and ZED_SPARE_ON_CHECKSUM_ERRORS in the /etc/zfs/zed.d/zed.rc configuration file. These scripts do no add support for the autoexpand property. At a minimum this requires adding a new udev rule to detect when a new device is added to the system. It also requires that the autoexpand policy be ported from Illumos, see: https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/syseventd/modules/zfs_mod/zfs_mod.c Support for detecting the correct name of a vdev when it's not a whole disk was added by Turbo Fredriksson. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Turbo Fredriksson <[email protected]> Issue #2
| * Add missing DATA_TYPE_STRING_ARRAY outputBrian Behlendorf2014-04-021-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | This functionality has always been missing. But until now there were no zevents which included an array of strings so it wasn't missed. However, that's now changed so to ensure this information is output correctly by 'zpool events -v' the DATA_TYPE_STRING_ARRAY has been implemented. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Issue #2
| * Make command line guid parsing more tolerantBrian Behlendorf2014-04-022-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several of the zfs utilities allow you to pass a vdev's guid rather than the device name. However, the utilities are not consistent in how they parse that guid. For example, 'zinject' expects the guid to be passed as a hex value while 'zpool replace' wants it as a decimal. The user is forced to just know what format to use. This patch improve things by making the parsing more tolerant. When strtol(3) is called using 0 for the base, rather than say 10 or 16, it will then accept hex, decimal, or octal input based on the prefix. From the man page. If base is zero or 16, the string may then include a "0x" prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is '0', in which case it is taken as 8 (octal). NOTE: There may be additional conversions not caught be this patch. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Issue #2
| * Add systemd unit file for zedChris Dunlap2014-04-023-2/+20
| | | | | | | | | | | | | | | | | | | | This commit adds a systemd unit file for zed.service and integrates it into the zfs.target from commit 881f45c. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2108 Issue #2
| * Initial implementation of zed (ZFS Event Daemon)Chris Dunlap2014-04-0234-1/+3715
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zed monitors ZFS events. When a zevent is posted, zed will run any scripts that have been enabled for the corresponding zevent class. Multiple scripts may be invoked for a given zevent. The zevent nvpairs are passed to the scripts as environment variables. Events are processed synchronously by the single thread, and there is no maximum timeout for script execution. Consequently, a misbehaving script can delay (or forever block) the processing of subsequent zevents. Plans are to address this in future commits. Initial scripts have been developed to log events to syslog and send email in response to checksum/data/io errors and resilver.finish/scrub.finish events. By default, email will only be sent if the ZED_EMAIL variable is configured in zed.rc (which is serving as a config file of sorts until a proper configuration file is implemented). Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2
| * Replace zpool_events_next() "block" parm w/ "flags"Chris Dunlap2014-03-314-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zpool_events_next() can be called in blocking mode by specifying a non-zero value for the "block" parameter. However, the design of the ZFS Event Daemon (zed) requires additional functionality from zpool_events_next(). Instead of adding additional arguments to the function, it makes more sense to use flags that can be bitwise-or'd together. This commit replaces the zpool_events_next() int "block" parameter with an unsigned bitwise "flags" parameter. It also defines ZEVENT_NONE to specify the default behavior. Since non-blocking mode can be specified with the existing ZEVENT_NONBLOCK flag, the default behavior becomes blocking mode. This, in effect, inverts the previous use of the "block" parameter. Existing callers of zpool_events_next() have been modified to check for the ZEVENT_NONBLOCK flag. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2
| * Add defs for makefile installation dir varsChris Dunlap2014-03-313-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add macro definitions to AM_CPPFLAGS to propagate makefile installation directory variables for libexecdir, runstatedir, sbindir, and sysconfdir. https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Installation-Directory-Variables.html A corollary is that you should not use these variables except in makefiles. For instance, instead of trying to evaluate datadir in configure and hard-coding it in makefiles using e.g., 'AC_DEFINE_UNQUOTED([DATADIR], ["$datadir"], [Data directory.])', you should add -DDATADIR='$(datadir)' to your makefile's definition of CPPFLAGS (AM_CPPFLAGS if you are also using Automake). The runstatedir directory is for "installing data files which the programs modify while they run, that pertain to one specific machine, and which need not persist longer than the execution of the program". https://www.gnu.org/prep/standards/html_node/Directory-Variables.html It will be defined by autoconf 2.70 or later, and default to "$(localstatedir)/run". http://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=commit;h=a197431414088a417b407b9b20583b2e8f7363bd Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2
| * Clarify zpool_events_next() commentBrian Behlendorf2014-03-313-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the very poorly chosen argument name 'cleanup_fd' it was completely unclear that this file descriptor is used to track the current cursor location. When the file descriptor is created by opening ZFS_DEV a private cursor is created in the kernel for the returned file descriptor. Subsequent calls to zpool_events_next() and zpool_events_seek() then require the file descriptor as an argument to reposition the cursor. When the file descriptor is closed the kernel state tracking the cursor is destroyed. This patch contains no functional change, it just changes a few variable names and clarifies the documentation. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Issue #2
| * Add zpool_events_seek() functionalityBrian Behlendorf2014-03-317-1/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | The ZFS_IOC_EVENTS_SEEK ioctl was added to allow user space callers to seek around the zevent file descriptor by EID. When a specific EID is passed and it exists the cursor will be positioned there. If the EID is no longer cached by the kernel ENOENT is returned. The caller may also pass ZEVENT_SEEK_START or ZEVENT_SEEK_END to seek to those respective locations. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Issue #2
| * Add a unique "eid" value to all zeventsBrian Behlendorf2014-03-313-0/+18
|/ | | | | | | | | | | | Tagging each zevent with a unique monotonically increasing EID (Event IDentifier) provides the required infrastructure for a user space daemon to reliably process zevents. By writing the EID to persistent storage the daemon can safely resume where it left off in the event stream when it's restarted. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Issue #2
* Remount datasets for "zfs inherit".Gunnar Beutner2014-03-241-0/+9
| | | | | | | | | | | | Changing properties with "zfs inherit" should cause the datasets to be remounted. This ensures that the modified property values will be propagated in to the filesystem namespace where they can be enforced. This change is modeled after an identical fix made to zfs_prop_set(). Signed-off-by: Gunnar Beutner <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2201
* Linux 3.13 compat: Handle __must_check bdi_setup_and_registerRichard Yao2014-03-241-2/+3
| | | | | | | | | | | | | | | torvalds/linux@8077c0d983ab276ec5f2700df56a64d671781905 added a __must_check to the bdi_setup_and_register(), which caused our autotools check to break. zfsonlinux/zfs@729210564a5325e190fc4fba22bf17bacf957ace was intended to correct that, but it depended on -Wno-unused-result, which is unrecognized in older GCC versions. That commit has been reverted in favor of a solution that does not require -Wno-unused-result. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2102 Closes #2135
* Revert "Properly ignore bdi_setup_and_register return value"Richard Yao2014-03-241-4/+1
| | | | | | | | | | Older GCC versions do not obey -Wno-unused-result. This reverts commit 729210564a5325e190fc4fba22bf17bacf957ace in favor of a solution that does not require -Wno-unused-result. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #1906
* Illumos #4089 NULL pointer dereference in arc_read()Boris Protopopov2014-03-241-9/+11
| | | | | | | | | | | | | | | | | | 4089 NULL pointer dereference in arc_read() Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4089 illumos/illumos-gate@57815f6b95a743697e148327725b7f568e75e6ea Signed-off-by: Brian Behlendorf <[email protected]> Issue #2171 Issue #2165 Closes #2198