Commit message | Author | Age | Files | Lines
* zfs.8: Drop references to Oracle documentation | Richard Laager | 2016-05-16 | 1 | -2/+2
    Signed-off-by: Richard Laager <[email protected]>
* zfs.8: zfs get and zfs list accept mountpoints | Richard Laager | 2016-05-16 | 1 | -5/+6
    Signed-off-by: Richard Laager <[email protected]>
* OpenZFS 6739 - assumption in cv_timedwait_hires | Denys Rtveliashvili | 2016-05-15 | 2 | -27/+26
    Userland version of cv_timedwait_hires() always assumes absolute time.

    Reviewed by: Paul Dagnelie <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Dan McDonald <[email protected]>
    Reviewed by: Robert Mustacchi <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>
    Ported by: Denys Rtveliashvili <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/6739
    OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/41c6413

    Porting Notes:
    The ported change revealed a number of problems in the Linux-specific
    code, which was expecting incorrect return codes from pthread_* functions.
    Reviewed and improved the usage of pthread_* functions in
    lib/libzpool/kernel.c.
* Fix the test to use the variable | jyxent | 2016-05-13 | 1 | -1/+1
    Signed-off-by: Manuel Amador (Rudd-O) <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4645
* Use cv_timedwait_sig_hires in arc_reclaim_thread | Chunwei Chen | 2016-05-12 | 2 | -1/+3
    The thread was originally using the interruptible cv_timedwait_sig, but
    was changed to the uninterruptible cv_timedwait_hires in ae6d0c6. Use
    _sig_hires instead to allow interruptible sleep.

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4633
    Closes #4634
* A collection of dracut fixes | Manuel Amador (Rudd-O) | 2016-05-12 | 10 | -18/+66
    - In older systems without sysroot.mount, import before dracut-mount,
      and re-enable old dracut mount hook
    - rootflags MUST be present even if the administrator neglected to
      specify it explicitly
    - Check that mount.zfs exists in sbindir
    - Remove awk and head as (now unused) requirements, add grep, and
      install the right mount.zfs
    - Eliminate one use of grep in Dracut
    - Use a more accurate grepping statement to identify zfsutil in rootflags
    - Ensure that pooldev is nonempty
    - Properly handle /dev/sd* devices and more
    - Use new -P to get list of zpool devices
    - Bail out of the generator when zfs:AUTO is on the root command line
    - Ignore errors from systemctl trying to load sysroot.mount, we only
      care about the output
    - Determine which one is the correct initqueuedir at run time.
    - Add a compatibility getargbool for our detection / setup script.
    - Update dracut .gitignore files

    Signed-off-by: Matthew Thode <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4558
    Closes #4562
* OpenZFS 6093 - zfsctl_shares_lookup | Dan McDonald | 2016-05-12 | 1 | -8/+4
    6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success

    Reviewed by: Gordon Ross <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/6093
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0f92170
    Closes #4630

    This function was always implemented slightly differently under Linux
    and therefore never suffered from this issue. The patch has been updated
    and applied as cleanup in order to minimize differences with the upstream
    OpenZFS code.
* Revert "Kill znode->z_gen field" | Brian Behlendorf | 2016-05-12 | 6 | -16/+19
    This reverts commit 4cd77889b684fd0dd1a0a995b692dda3db76a9ac.

    The i_generation field in the inode is 32-bit and the SA code expects
    64-bit fixed values. Revert this optimization for now until this is
    cleanly addressed.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #4538
* Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues | Tony Hutter | 2016-05-12 | 29 | -178/+2074
    Update the zfs module to collect statistics on average latencies, queue
    sizes, and keep an internal histogram of all IO latencies. Along with
    this, update "zpool iostat" with some new options to print out the stats:

    -l: Include average IO latencies stats:

         total_wait   disk_wait   syncq_wait  asyncq_wait scrub
          read write  read write  read write  read write  wait
         ----- ----- ----- ----- ----- ----- ----- ----- -----
             -  41ms     -   2ms     -  46ms     -   4ms     -
             -   5ms     -   1ms     -   1us     -   4ms     -
             -   5ms     -   1ms     -   1us     -   4ms     -
             -     -     -     -     -     -     -     -     -
             -  49ms     -   2ms     -  47ms     -     -     -
             -     -     -     -     -     -     -     -     -
             -   2ms     -   1ms     -     -     -   1ms     -
         ----- ----- ----- ----- ----- ----- ----- ----- -----
           1ms   1ms   1ms 413us  16us  25us     -   5ms     -
           1ms   1ms   1ms 413us  16us  25us     -   5ms     -
           2ms   1ms   2ms 412us  26us  25us     -   5ms     -
             -   1ms     - 413us     -  25us     -   5ms     -
             -   1ms     - 413us     -  25us     -   5ms     -
             -   1ms     - 460us     -  29us     -   5ms     -
         196us   1ms 196us 370us   7us  23us     -   5ms     -
         ----- ----- ----- ----- ----- ----- ----- ----- -----

    -w: Print out latency histograms:

        sdb          total           disk       sync_queue    async_queue
        latency    read  write    read  write    read  write    read  write  scrub
        ------- ------ ------ ------ ------ ------ ------ ------ ------ ------
        1ns           0      0      0      0      0      0      0      0      0
        ...
        33us          0      0      0      0      0      0      0      0      0
        66us          0      0    107   2486      2    788     12     12      0
        131us         2    797    359   4499     10    558    184    184      6
        262us        22    801    264   1563     10    286    287    287     24
        524us        87    575     71  52086     15   1063    136    136     92
        1ms         152   1190      5  41292      4   1693    252    252    141
        2ms         245   2018      0  50007      0   2322    371    371    220
        4ms         189   7455     22 162957      0   3912   6726   6726    199
        8ms         108   9461      0 102320      0   5775   2526   2526     86
        17ms         23  11287      0  37142      0   8043   1813   1813     19
        34ms          0  14725      0  24015      0  11732   3071   3071      0
        67ms          0  23597      0   7914      0  18113   5025   5025      0
        134ms         0  33798      0    254      0  25755   7326   7326      0
        268ms         0  51780      0     12      0  41593  10002  10002      0
        537ms         0  77808      0      0      0  64255  13120  13120      0
        1s            0 105281      0      0      0  83805  20841  20841      0
        2s            0  88248      0      0      0  73772  14006  14006      0
        4s            0  47266      0      0      0  29783  17176  17176      0
        9s            0  10460      0      0      0   4130   6295   6295      0
        17s           0      0      0      0      0      0      0      0      0
        34s           0      0      0      0      0      0      0      0      0
        69s           0      0      0      0      0      0      0      0      0
        137s          0      0      0      0      0      0      0      0      0
        -------------------------------------------------------------------------------

    -h: Help

    -H: Scripted mode. Do not display headers, and separate fields by a
        single tab instead of arbitrary space.

    -q: Include current number of entries in sync & async read/write queues,
        and scrub queue:

         syncq_read  syncq_write asyncq_read asyncq_write scrubq_read
          pend activ  pend activ  pend activ  pend activ  pend activ
         ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
             0     0     0     0    78    29     0     0     0     0
             0     0     0     0    78    29     0     0     0     0
             0     0     0     0     0     0     0     0     0     0
             -     -     -     -     -     -     -     -     -     -
             0     0     0     0     0     0     0     0     0     0
             -     -     -     -     -     -     -     -     -     -
             0     0     0     0     0     0     0     0     0     0
         ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
             0     0   227   394     0    19     0     0     0     0
             0     0   227   394     0    19     0     0     0     0
             0     0   108    98     0    19     0     0     0     0
             0     0    19    98     0     0     0     0     0     0
             0     0    78    98     0     0     0     0     0     0
             0     0    19    88     0     0     0     0     0     0
         ----- ----- ----- ----- ----- ----- ----- ----- ----- -----

    -p: Display numbers in parseable (exact) values.

    Also, update iostat syntax to allow the user to specify specific vdevs
    to show statistics for. The three options for choosing pools/vdevs are:

    Display a list of pools:
        zpool iostat ... [pool ...]

    Display a list of vdevs from a specific pool:
        zpool iostat ... [pool vdev ...]

    Display a list of vdevs from any pools:
        zpool iostat ... [vdev ...]

    Lastly, allow zpool command "interval" value to be floating point:
        zpool iostat -v 0.5

    Signed-off-by: Tony Hutter <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4433
* Fixes bug in fix_paths() | Marcel Huber | 2016-05-12 | 1 | -1/+1
    Fixes bug introduced in commit 7d90f569a. Hinted by gcc:

        libzfs_import.c: In function ‘fix_paths’:
        libzfs_import.c:602:28: warning: self-comparison always evaluates
        to true [-Wtautological-compare]
            if (best->ne_num_labels == best->ne_num_labels &&

    Signed-off-by: Marcel Huber <[email protected]>
    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #4632
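The warning quoted above flags a comparison of a value with itself. The sketch below illustrates why that matters in a "pick the best candidate" loop. It is a simplified stand-in, not the libzfs_import.c code: the struct layout and the `ne_order` tie-breaker are assumptions made for illustration, and only `ne_num_labels` appears in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real name entry; the field set is an assumption. */
struct name_entry {
	int ne_num_labels;	/* number of valid labels found on the device */
	int ne_order;		/* hypothetical tie-breaker: lower is preferred */
};

/* Return the preferred entry among n candidates, or NULL if n is 0. */
const struct name_entry *
pick_best(const struct name_entry *entries, size_t n)
{
	const struct name_entry *best = NULL;

	assert(entries != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct name_entry *ne = &entries[i];

		/*
		 * A self-comparison such as
		 * best->ne_num_labels == best->ne_num_labels is always true,
		 * so the tie-breaker would apply even when the candidate has
		 * fewer labels than the current best. Compare the candidate
		 * against the current best instead.
		 */
		if (best == NULL ||
		    ne->ne_num_labels > best->ne_num_labels ||
		    (ne->ne_num_labels == best->ne_num_labels &&
		    ne->ne_order < best->ne_order))
			best = ne;
	}
	return (best);
}
```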
* Reduce stack usage of dmu_recv_stream function | Nikolay Borisov | 2016-05-11 | 1 | -57/+62
    The receive_writer_arg and receive_arg structures become large when ZFS
    is compiled with debugging enabled. This results in gcc throwing an error
    about excessive stack usage:

        module/zfs/dmu_send.c: In function ‘dmu_recv_stream’:
        module/zfs/dmu_send.c:2502:1: error: the frame size of 1256 bytes is
        larger than 1024 bytes [-Werror=frame-larger-than=]

    Fix this by allocating those structures on the heap, rather than on the
    stack.

    With patch:    dmu_send.c:2350:1:dmu_recv_stream 240 static
    Without patch: dmu_send.c:2350:1:dmu_recv_stream 1336 static

    Signed-off-by: Nikolay Borisov <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4620
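The technique described above, moving a large structure from the stack to the heap, can be sketched in userland C as follows. This is an illustration only: `struct big_arg`, `do_work` and `run_with_heap_arg` are hypothetical names, and the sketch uses calloc()/free() where kernel code would use the kernel's own allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical large argument structure, standing in for receive_arg. */
struct big_arg {
	char scratch[1200];
	int result;
};

/* Worker that only needs a pointer to its argument block. */
static void
do_work(struct big_arg *ba)
{
	assert(ba != NULL);
	memset(ba->scratch, 0xab, sizeof (ba->scratch));
	ba->result = (int)sizeof (ba->scratch);
}

/*
 * Allocate the argument block on the heap so that this function's stack
 * frame holds one pointer instead of the whole structure.
 * Returns the worker's result, or -1 if the allocation fails.
 */
int
run_with_heap_arg(void)
{
	struct big_arg *ba = calloc(1, sizeof (*ba));
	int result;

	if (ba == NULL)
		return (-1);
	do_work(ba);
	result = ba->result;
	free(ba);
	return (result);
}
```

The trade-off is an allocation that can fail and must be freed on every exit path, in exchange for a frame size that no longer depends on how large the structure grows in debug builds.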
* OpenZFS 3993, 4700 | Adam Stevko | 2016-05-11 | 5 | -50/+139
    3993 zpool(1M) and zfs(1M) should support -p for "list" and "get"
    4700 "zpool get" doesn't support -H or -o options

    Reviewed by: Dan McDonald <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>
    Ported by: Tony Hutter <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/3993
    OpenZFS-issue: https://www.illumos.org/issues/4700
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c58b352

    Porting notes:
    I removed ZoL's zpool_get_prop_literal() in favor of
    zpool_get_prop(..., boolean_t literal) since that's what OpenZFS uses.
    The functionality is the same.
* Add zfs-helpers.sh script | Brian Behlendorf | 2016-05-10 | 4 | -1/+154
    Add a script designed to facilitate in-tree development and testing by
    installing symlinks on your system which refer to in-tree helper
    utilities. These helper utilities must be installed in order to exercise
    all ZFS functionality. By using symbolic links and keeping the scripts
    in-tree during development they can be easily modified and those changes
    tracked.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Olaf Faaland <[email protected]>
    Closes #4607
* OpenZFS 6842 - Fix empty xattr dir causing lockup | Chunwei Chen | 2016-05-10 | 2 | -11/+35
    Reviewed by: Brian Behlendorf <[email protected]>
    Reviewed by: Dan McDonald <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>
    Ported-by: Denys Rtveliashvili <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    An initial version of this patch was applied in commit 29572cc and
    subsequently refined upstream. Since the implementations do not conflict
    with each other both are left applied for now.

    OpenZFS-issue: https://www.illumos.org/issues/6842
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/02525cd
    Closes #4615
* OpenZFS 6873 - zfs_destroy_snaps_nvl leaks errlist | Chris Williamson | 2016-05-09 | 1 | -2/+5
    Authored by: Chris Williamson <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Paul Dagnelie <[email protected]>
    Ported-by: Denys Rtveliashvili <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    lzc_destroy_snaps() returns an nvlist in errlist.
    zfs_destroy_snaps_nvl() should nvlist_free() it before returning.

    OpenZFS-issue: https://www.illumos.org/issues/6873
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ee06391
    Closes #4614
* OpenZFS 6879 - Incorrect endianness swap | Denys Rtveliashvili | 2016-05-09 | 1 | -1/+1
    Authored by: Dan Kimmel <[email protected]>
    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Paul Dagnelie <[email protected]>
    Ported-by: Denys Rtveliashvili <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    Incorrect endianness swap for drr_spill.drr_length in libzfs_sendrecv.c.

    Instead of drr_write.drr_length, we should be assigning the result of the
    byteswap to drr_spill.drr_length.

    OpenZFS-issue: https://www.illumos.org/issues/6879
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/74c8720
    Closes #4613
* Wrap vdev_count_verify_zaps() with ZFS_DEBUG | Brian Behlendorf | 2016-05-06 | 1 | -0/+4
    Commit e0ab3ab introduced two blocks of code which are only needed when
    debugging is enabled. These blocks should be wrapped with ZFS_DEBUG for
    clarity and to prevent unused variable warnings in a production build.

    Signed-off-by: Don Brady <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4515
* Per-vdev ZAP tests must use $ZPOOL and $ZDB | Brian Behlendorf | 2016-05-06 | 8 | -29/+29
    Commit e0ab3ab introduced new per-vdev ZAP tests which should have used
    the $ZPOOL and $ZDB variables. The tests passed the automated testing
    since both utilities were installed on the test systems, but when
    running in-tree all of the new tests fail.

    Signed-off-by: Don Brady <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4515
* OpenZFS 6672 - arc_reclaim_thread() should use gethrtime() | David Quigley | 2016-05-06 | 2 | -5/+14
    6672 arc_reclaim_thread() should use gethrtime() instead of
    ddi_get_lbolt()

    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: Prakash Surya <[email protected]>
    Reviewed by: Josef 'Jeff' Sipek <[email protected]>
    Reviewed by: Robert Mustacchi <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Ported-by: David Quigley <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/6672
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/571be5c
    Closes #4600
* OpenZFS 6286 - ZFS internal error when set large block on bootfs | Brian Behlendorf | 2016-05-05 | 1 | -3/+3
    6286 ZFS internal error when set large block on bootfs

    Reviewed by: Paul Dagnelie <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Andriy Gapon <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/6286
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/6de9bb5
    Closes #4585
* OpenZFS 6544 - incorrect comment in libzfs.h about offline status | Tony Hutter | 2016-05-05 | 1 | -1/+1
    6544 incorrect comment in libzfs.h about offline status

    Reviewed by: Matthew Ahrens <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Ported-by: Tony Hutter <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/6544
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb605c4
    Closes #4595
* OpenZFS 5669 - altroot not set in zpool create | Brian Behlendorf | 2016-05-05 | 1 | -0/+2
    5669 altroot not set in zpool create when specified with -o

    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Approved by: Dan McDonald <[email protected]>
    Ported-by: Brian Behlendorf <[email protected]>

    OpenZFS-issue: https://www.illumos.org/issues/5669
    OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c423721
    Closes #4594
* taskq_create() calls thread_create() with wrong arguments | Denys Rtveliashvili | 2016-05-05 | 1 | -1/+1
    Correct the arguments passed to `thread_create()`.

    Signed-off-by: Isaac Huang <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4593
* OpenZFS 6736 - ZFS per-vdev ZAPs | Joe Stein | 2016-05-02 | 24 | -75/+980
    6736 ZFS per-vdev ZAPs

    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: John Kennedy <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Don Brady <[email protected]>
    Reviewed by: Dan McDonald <[email protected]>

    References:
    https://www.illumos.org/issues/6736
    https://github.com/openzfs/openzfs/commit/215198a

    Ported-by: Don Brady <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4515
* Kill znode->z_gen field | Nikolay Borisov | 2016-05-02 | 6 | -19/+16
    This field is a duplicate of the inode->i_generation, so just kill it.

    Signed-off-by: Nikolay Borisov <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4538
* Enable PF_FSTRANS for ioctl secpolicy callbacks (#4571) | Tim Chase | 2016-05-02 | 1 | -1/+4
    At the very least, the zfs_secpolicy_write_perms ioctl security policy
    callback, which calls dsl_dataset_hold(), can require freeing memory and,
    therefore, re-enter ZFS. This patch enables PF_FSTRANS for all of the
    security policy callbacks similarly to the manner in which it's enabled
    for the actual ioctl callback.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4554
* module/.gitignore: Add *.dwo (#4580) | Vitaut Bajaryn | 2016-05-02 | 1 | -0/+1
    These files get generated when CONFIG_DEBUG_INFO_DWARF4 is enabled in
    Linux .config.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4580
* Fix user namespaces uid/gid mapping | Brian Behlendorf | 2016-04-30 | 1 | -4/+4
    As described in torvalds/linux@5f3a4a2 the &init_user_ns, and not the
    current user_ns, should be passed to posix_acl_from_xattr() and
    posix_acl_to_xattr(). Conveniently the init_user_ns is available through
    the init credential (kcred).

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Massimo Maggi <[email protected]>
    Closes #4177
* Add support for libtirpc | Brian Behlendorf | 2016-04-28 | 9 | -132/+61
    While OpenSolaris libc and glibc both include XDR support, the musl libc
    does not, in favor of depending on the BSD-licensed libtirpc library.
    Adding support is a simple matter of detecting the library, including
    the headers and linking against it.

    By default libtirpc will be checked for and, if available, used.
    Otherwise, configure will fall back to using the xdr implementation
    provided by libc if available. The options --with-tirpc/--without-tirpc
    can be used to disable this checking.

    In addition, the xdr_control() function has been simplified to only
    handle ZFS's specific use case.

    Original-patch-by: stf <[email protected]>
    Original-patch-by: Richard Yao <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Carlo Landmeter <[email protected]>
    Closes #2254
    Closes #4559
* Illumos 6844 - dnode_next_offset can detect fictional holes | Alex Reece | 2016-04-27 | 2 | -5/+23
    6844 dnode_next_offset can detect fictional holes

    Reviewed by: Matthew Ahrens <[email protected]>
    Reviewed by: George Wilson <[email protected]>
    Reviewed by: Brian Behlendorf <[email protected]>

    dnode_next_offset is used in a variety of places to iterate over the
    holes or allocated blocks in a dnode. It operates under the premise that
    it can iterate over the blockpointers of a dnode in open context while
    holding only the dn_struct_rwlock as reader. Unfortunately, this premise
    does not hold.

    When we create the zio for a dbuf, we pass in the actual block pointer
    in the indirect block above that dbuf. When we later zero the bp in
    zio_write_compress, we are directly modifying the bp. The state of the
    bp is now inconsistent from the perspective of dnode_next_offset: the bp
    will appear to be a hole until zio_dva_allocate finally finishes filling
    it in. In the meantime, dnode_next_offset can detect a hole in the dnode
    when none exists.

    I was able to experimentally demonstrate this behavior with the
    following setup:
    1. Create a file with 1 million dbufs.
    2. Create a thread that randomly dirties L2 blocks by writing to the
       first L0 block under them.
    3. Observe dnode_next_offset, waiting for it to skip over a hole in the
       middle of a file.
    4. Do dnode_next_offset in a loop until we skip over such a
       non-existent hole.

    The fix is to ensure that it is valid to iterate over the indirect
    blocks in a dnode while holding the dn_struct_rwlock by passing the zio
    a copy of the BP and updating the actual BP in dbuf_write_ready while
    holding the lock.

    References:
    https://www.illumos.org/issues/6844
    https://github.com/openzfs/openzfs/pull/82
    DLPX-35372

    Ported-by: Brian Behlendorf <[email protected]>
    Closes #4548
* Illumos 6659 - nvlist_free(NULL) is a no-op | Josef 'Jeff' Sipek | 2016-04-27 | 11 | -39/+19
    6659 nvlist_free(NULL) is a no-op

    Reviewed by: Toomas Soome <[email protected]>
    Reviewed by: Marcel Telka <[email protected]>
    Approved by: Robert Mustacchi <[email protected]>

    References:
    https://www.illumos.org/issues/6659
    https://github.com/illumos/illumos-gate/commit/aab83bb

    Ported-by: David Quigley <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4566
* Fix zfs_copies_001_pos/zfs_copies_004_neg | Brian Behlendorf | 2016-04-27 | 2 | -0/+5
    Call block_device_wait when creating/destroying volumes in order to make
    the operations synchronous as expected by the test cases.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4560
* Fix 'zpool import' blkid device names | Brian Behlendorf | 2016-04-25 | 1 | -19/+123
    When importing a pool using the blkid cache only the device node path
    was added to the list of known paths for a device. This results in
    'zpool import' always using the sdX names in preference to the 'path'
    name stored in the label.

    To fix the issue the blkid import path has been updated to add the
    'path', 'devid', and 'devname' names from the label to the known paths.
    A sanity check is done to ensure these paths do refer to the same device
    identified by blkid.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tony Hutter <[email protected]>
    Closes #4523
    Closes #3043
* Disable efi_debug in --enable-debug builds | Brian Behlendorf | 2016-04-25 | 1 | -4/+0
    Disable the additional EFI debugging in all builds. Some users run debug
    builds in production and the extra log messages can cause confusion.
    Beyond that the log messages are rarely useful.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tony Hutter <[email protected]>
    Closes #4523
* Use udev for partition detection | Brian Behlendorf | 2016-04-25 | 4 | -46/+166
    When ZFS partitions a block device it must wait for udev to create both
    a device node and all the device symlinks. This process takes a variable
    length of time and depends on factors such as how many links must be
    created, the complexity of the rules, etc. Complicating the situation
    further, it is not uncommon for udev to create and then remove a link
    multiple times while processing the udev rules.

    Given the above, the existing scheme of waiting for an expected
    partition to appear by name isn't 100% reliable. At this point udev may
    still remove and recreate the link, resulting in the kernel modules
    being unable to open the device.

    In order to address this the zpool_label_disk_wait() function has been
    updated to use libudev. Until the registered system device acknowledges
    that it is fully initialized the function will wait. Once fully
    initialized all device links are checked and allowed to settle for 50ms.
    This makes it far more likely that all the device nodes will exist when
    the kernel modules need to open them.

    For systems without libudev an alternate zpool_label_disk_wait() was
    updated to include a settle time. In addition, the kernel modules were
    updated to include retry logic for this ENOENT case. Due to the improved
    checks in the utilities it is unlikely this logic will be invoked.
    However, in the rare event it is needed it will prevent a failure.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tony Hutter <[email protected]>
    Signed-off-by: Richard Laager <[email protected]>
    Closes #4523
    Closes #3708
    Closes #4077
    Closes #4144
    Closes #4214
    Closes #4517
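The non-libudev fallback described above amounts to "wait for the path, then let it settle and check again". The following is a rough sketch of that idea under stated assumptions, not the actual zpool_label_disk_wait() implementation: the function name `wait_for_path`, its parameters, and the 1 ms polling step are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Sleep for ms milliseconds. */
static void
sleep_ms(int ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	(void) nanosleep(&ts, NULL);
}

/*
 * Poll for path in 1 ms steps until it exists or roughly timeout_ms has
 * passed, then wait settle_ms and confirm it is still present.
 * Returns 0 on success, or ENOENT if the path never appears or vanishes
 * again during the settle period.
 */
int
wait_for_path(const char *path, int timeout_ms, int settle_ms)
{
	struct stat st;
	int waited = 0;

	assert(path != NULL && timeout_ms >= 0 && settle_ms >= 0);

	while (stat(path, &st) != 0) {
		if (waited >= timeout_ms)
			return (ENOENT);
		sleep_ms(1);
		waited++;
	}

	/* A link may be removed and recreated, so re-check after settling. */
	sleep_ms(settle_ms);
	return (stat(path, &st) == 0 ? 0 : ENOENT);
}
```

A caller could treat ENOENT as a cue to retry, which mirrors the retry logic the commit message says was added on the kernel side.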
* Create unique partition labels | Brian Behlendorf | 2016-04-25 | 2 | -1/+28
    When partitioning a device a name may be specified for each partition.
    Internally zfs doesn't use this partition name for anything so it has
    always just been set to "zfs".

    However this isn't optimal because udev will create symlinks using this
    name in /dev/disk/by-partlabel/. If the name isn't unique then all the
    links cannot be created.

    Therefore a random 64-bit value has been added to the partition label,
    i.e. "zfs-1234567890abcdef". Additional information could be encoded
    here but since partitions may be reused that might result in confusion
    and it was decided against.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Tony Hutter <[email protected]>
    Signed-off-by: Richard Laager <[email protected]>
    Closes #4517
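A label of the form shown in the message above can be produced with a single snprintf() call. This is a minimal sketch assuming the label is "zfs-" followed by the 64-bit value as 16 lowercase hex digits, as the example "zfs-1234567890abcdef" suggests. The function name `make_part_label` is hypothetical, and the caller is assumed to supply the random value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a label as "zfs-" followed by 16 lowercase hex digits.
 * buf needs at least 21 bytes (20 characters plus the terminator).
 * Returns 0 on success, -1 if the label did not fit.
 */
int
make_part_label(char *buf, size_t buflen, uint64_t rnd)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buflen, "zfs-%016" PRIx64, rnd);
	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}
```

Zero-padding to a fixed width keeps every label the same length, so small random values do not produce shorter names.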
* fix booting via dracut generated initramfs | Matthew Thode | 2016-04-25 | 7 | -2/+216
    Dracut and Systemd updated how they integrate with each other. Because
    of this our current integrations stopped working (around the time 4.1.13
    came out). This patch addresses that issue and gets us booting again.

    Thanks to @Rudd-O for doing the work to get dracut working again and
    letting me submit this on his behalf.

    Signed-off-by: Manuel Amador (Rudd-O) <[email protected]>
    Signed-off-by: Matthew Thode <[email protected]>
    Closes #3605
    Closes #4478
* Linux 4.5 compat: Use xattr_handler->name for acl | Chunwei Chen | 2016-04-25 | 3 | -20/+70
    Linux 4.5 added the member "name" to xattr_handler. An xattr_handler
    which matches the whole name rather than a prefix should use "name"
    instead of "prefix". Otherwise, the kernel will return EINVAL when it
    tries to resolve handlers.

    Also, we remove the strcmp checks when xattr_handler has name, because
    xattr_resolve_name will do the check for us.

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4549
    Closes #4537
* Add pn_alloc()/pn_free() functions | Brian Behlendorf | 2016-04-21 | 10 | -32/+178
    In order to remove the HAVE_PN_UTILS wrappers the pn_alloc() and
    pn_free() functions must be implemented. The existing illumos
    implementations were used for this purpose.

    The `flags` argument which was used in places wrapped by the
    HAVE_PN_UTILS condition has been added back to the zfs_remove() and
    zfs_link() functions. This removes a small point of divergence between
    the ZoL code and upstream.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4522
* Rework zpool import excluded devices check | Nikolay Borisov | 2016-04-18 | 1 | -26/+16
    Current zpool import code skips directory entries which have prefixes
    similar to some system files on Linux such as "fd", "core" etc. However,
    this means one cannot have one's zpools hosted inside files which are
    named e.g. core-1 or lp. Furthermore, apart from the string checks there
    is already a check which makes zpool_open_func work only with regular
    files and block devices.

    To fix this problem remove most of the checks since they are redundant
    but leave the checks for the 'hpet' and 'watchdog' names. Furthermore,
    change the checks to strcmp which, albeit less safe than strncmp, allows
    devices whose names are prefixed by 'hpet' or 'watchdog'.

    Signed-off-by: Nikolay Borisov <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4438
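The difference between the two comparisons can be sketched as follows. The function names are hypothetical and the example is not the zpool import code; it only contrasts prefix matching with strncmp() against exact matching with strcmp() for the two names mentioned in the commit message.

```c
#include <assert.h>
#include <string.h>

/* Prefix match: also rejects names such as "hpet0" or "watchdog-backup". */
int
excluded_by_prefix(const char *name)
{
	assert(name != NULL);
	return (strncmp(name, "hpet", 4) == 0 ||
	    strncmp(name, "watchdog", 8) == 0);
}

/* Exact match: rejects only the names "hpet" and "watchdog" themselves. */
int
excluded_exact(const char *name)
{
	assert(name != NULL);
	return (strcmp(name, "hpet") == 0 || strcmp(name, "watchdog") == 0);
}
```

With the exact form, a file-backed vdev named "hpet0" or "core-1" is no longer skipped, while the two special device names still are.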
* Fix ZPL miswrite of default POSIX ACL | Ned Bass | 2016-04-18 | 4 | -3/+65
    Commit 4967a3e introduced a typo that caused the ZPL to store the
    intended default ACL as an access ACL. Due to caching this problem may
    not become visible until the filesystem is remounted or the inode is
    evicted from the cache. Fix the typo and add a regression test.

    Signed-off-by: Ned Bass <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Chunwei Chen <[email protected]>
    Closes #4520
* Fix inverted logic on none elevator comparison | Colin Ian King | 2016-04-15 | 1 | -1/+1
    Commit d1d7e2689db9e03f1 ("cstyle: Resolve C style issues") inverted the
    logic on the none elevator comparison. Fix this and make it cstyle
    warning clean.

    Signed-off-by: Colin Ian King <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4507
* remove sanity check in replacement test | Jinshan Xiong | 2016-04-13 | 3 | -9/+0
    The replacement test spawns a background process to truncate a file and
    makes sure that the process still exists 1 second later. However, the
    process may already have finished its work and exited, so the check can
    report a false alarm.

    This patch removes those sanity checks.

    Signed-off-by: Jinshan Xiong <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4516
* Make zfs test easier to run in local install | Jinshan Xiong | 2016-04-12 | 3 | -22/+23
    When ZFS is installed by 'make install', programs will be installed into
    '/usr/local'. The ZFS test scripts then can't locate programs such as
    'zpool', which causes test failures.

    Fix typo in help message.

    Add a sanity check for ksh and generate a useful error message.

    Signed-off-by: Jinshan Xiong <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4495
* Add zfs-tests for relatime | Chunwei Chen | 2016-04-05 | 4 | -2/+75
    Add atime_003_pos to test relatime=on. We do check_atime_updated twice:
    the first time should succeed and the second time should fail. We also
    modify atime_001_pos to do check_atime_updated twice, and both times
    should succeed.

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Closes #4482
* Make zfs mount according to relatime config in dataset | Chunwei Chen | 2016-04-05 | 3 | -3/+15
    Also enable lazytime in mount.zfs

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #4482
* Enable lazytime semantic for atime | Chunwei Chen | 2016-04-05 | 3 | -72/+110
    Linux 4.0 introduces lazytime. The idea is that when we update the
    atime, we delay writing it to disk for as long as it is reasonably
    possible.

    When lazytime is enabled, dirty_inode will be called with only the
    I_DIRTY_TIME flag whenever i_atime is updated. So under such a
    condition, we will set z_atime_dirty. We will only write it to disk if
    the file is closed, the inode is evicted or setattr is called. Ideally,
    we should also write it whenever SA is going to be updated, but that is
    left for future improvement.

    There's one thing that we should take care of now that we allow i_atime
    to be dirty. In the original implementation, whenever SA is modified,
    zfs_inode_update will be called to overwrite everything in the inode.
    This will cause a dirty i_atime to be discarded. We fix this by not
    overwriting i_atime in zfs_inode_update. We only overwrite i_atime when
    allocating a new inode or doing zfs_rezget with zfs_inode_update_new.

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #4482
* Fix atime handling and relatime | Chunwei Chen | 2016-04-05 | 9 | -108/+37
    The problem for atime:

    We have 3 places for atime: inode->i_atime, znode->z_atime and SA, and
    its handling is a mess.

    A huge part of the mess regarding atime comes from
    zfs_tstamp_update_setup, zfs_inode_update, and zfs_getattr, which behave
    inconsistently with those three values.

    zfs_tstamp_update_setup clears z_atime_dirty unconditionally as long as
    you don't pass ATTR_ATIME. This means every write(2) operation which
    only updates ctime and mtime will cause atime changes to not be written
    to disk.

    Also, zfs_inode_update from write(2) will replace inode->i_atime with
    what's inside SA (stale), but doesn't touch z_atime. So after read(2)
    and write(2), you'll have i_atime(stale), z_atime(new), SA(stale) and
    z_atime_dirty=0.

    Now, if you do stat(2), zfs_getattr will actually replace i_atime with
    what's inside z_atime. So now you'll have i_atime(new), z_atime(new),
    SA(stale) and z_atime_dirty=0. These will all be gone after umount, and
    you'll be left with a stale atime.

    The problem for relatime:

    We do have a relatime config inside the ZFS dataset, but how it should
    interact with the mount flag MS_RELATIME is not well defined. It seems
    the relatime mount option was meant to override the dataset config by
    showing it as temporary in `zfs get`. But at the same time,
    `zfs set relatime=on|off` would also seem to want to override the mount
    option. Not to mention that the MS_RELATIME flag is actually never
    passed into ZFS, so it never really worked.

    How Linux handles atime:

    The Linux kernel actually handles atime completely in VFS, except for
    writing it to disk. So if we remove the atime handling in ZFS, things
    would just work, no matter whether it's strictatime, relatime, noatime,
    or even O_NOATIME. And whenever VFS updates the i_atime, it will notify
    the underlying filesystem via sb->dirty_inode().

    There's also one thing to note about atime flags like MS_RELATIME and
    other flags like MS_NODEV, etc. They are mount point flags rather than
    filesystem (sb) flags. Since a native Linux filesystem can be mounted
    at multiple places at the same time, they can all have different atime
    settings. So these flags are never passed down to filesystem drivers.

    What this patch tries to do:

    We remove znode->z_atime, since we won't gain anything from it. We
    remove most of the atime handling and leave it to VFS. The only thing we
    do with atime is to write it when dirty_inode() or setattr() is called.
    We also add file_accessed() in zpl_read() since it's not provided in
    vfs_read().

    After this patch, only the MS_RELATIME flag will have effect. The
    setting in the dataset won't do anything. We will make zfsutil mount ZFS
    with MS_RELATIME set according to the setting in the dataset in a future
    patch.

    Signed-off-by: Chunwei Chen <[email protected]>
    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #4482
* Linux 4.6 compat: PAGE_CACHE_SIZE removal | Brian Behlendorf | 2016-04-05 | 2 | -24/+23
    As described in torvalds/linux@4a2d057e the
    PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were originally introduced to
    make it possible to add bigger chunks to the page cache. This never
    panned out and they have therefore been removed from the kernel.

    ZFS has been updated to use the PAGE_{SIZE,SHIFT,MASK,ALIGN} macros and
    calls to page_cache_release() have been replaced with put_page().

    There was no need to introduce a configure check for this because these
    interfaces have existed for a very long time.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Signed-off-by: Chunwei Chen <[email protected]>
    Closes #4489
* Fix WANT_DEVNAME2DEVID configure error | Brian Behlendorf | 2016-04-01 | 2 | -6/+7
    Accidentally introduced by commit e4023e4. The AM_CONDITIONAL cannot be
    located where it can be invoked conditionally, as in the
    `--with-config=user` case. Relocate it to the top level ZFS_AC_CONFIG
    macro along with the other AM_CONDITIONALs.

    Signed-off-by: Brian Behlendorf <[email protected]>
    Issue #4416