Commit message (Author, Age, Files, Lines)
* Revert "Revert "Fix unlink/xattr deadlock"" (Brian Behlendorf, 2013-01-17, 2 files, -55/+90)
| This reverts commit 53c7411919a64d6f0889aa0d6974610f6cd35744, effectively reinstating the asynchronous xattr cleanup code. These Linux changes were reverted because, after testing and careful contemplation, I was convinced that due to the 89260a1c8851ce05ea04b23606ba438b271d890 commit they were no longer required.
|
| Unfortunately, the deadlock described in #1176 was a case which wasn't considered. At mount, zfs_unlinked_drain() can occur, which will unlink a list of znodes in effectively random order, which isn't safe. The only reason it was originally safe to revert this change was that we could guarantee the VFS would always prune the xattr leaves before the parents.
|
| Therefore, until we can cleanly resolve this deadlock for all cases we need to keep this change in spite of the xattr unlink performance penalty associated with it.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #1176
| Issue #457
* Fix 'zfs rollback' on mounted file systems (Brian Behlendorf, 2013-01-17, 9 files, -42/+144)
| Rolling back a mounted filesystem with open file handles and cached dentries+inodes never worked properly in ZoL. The major issue was that Linux provides no easy mechanism for modules to invalidate the inode cache for a file system.
|
| Because of this it was possible that an inode from the previous filesystem would not get properly dropped from the cache during rollback. Then a new inode with the same inode number would be created and collide with the existing cached inode. Ideally this would trigger a VERIFY(), but in practice the error wasn't handled and it would just NULL dereference.
|
| Luckily, this issue can be resolved by sprucing up the existing Solaris zfs_rezget() functionality for the Linux VFS.
|
| The way it works now is that when a file system is rolled back all the cached inodes will be traversed and refetched from disk. If a version of the cached inode exists on disk the in-core copy will be updated accordingly. If there is no match for that object on disk it will be unhashed from the inode cache and marked as stale.
|
| This will effectively make the inode unfindable for lookups, allowing the inode number to be immediately recycled. The inode will then only be accessible from the cached dentries. Subsequent dentry lookups which reference a stale inode will result in the dentry being invalidated. Once invalidated the dentry will drop its reference on the inode, allowing it to be safely pruned from the cache.
|
| Special care is taken for negative dentries since they do not reference any inode. These dentries will be invalidated based on when they were added to the dentry cache. Entries added before the last rollback will be invalidated to prevent them from masking real files in the dataset.
|
| Two nice side effects of this fix are:
|
| * Removes the dependency on spl_invalidate_inodes(); it can now be safely removed from the SPL when we choose to do so.
|
| * zfs_znode_alloc() no longer requires a dentry to be passed. This effectively reverts this portion of the code to its upstream counterpart. The dentry is now instantiated more correctly in the Linux ZPL layer.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Signed-off-by: Ned Bass <[email protected]>
| Closes #795
* Fix false ENOENT on snapshot control dentries (Ned Bass, 2013-01-16, 4 files, -58/+158)
| Lookups in the snapshot control directory for an existing snapshot fail with ENOENT if an earlier lookup failed before the snapshot was created. This is because the earlier lookup causes a negative dentry to be cached which is never invalidated.
|
| The bug can be reproduced as follows (the second ls should succeed):
|
|     $ ls /tank/.zfs/snapshot/s
|     ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
|     $ zfs snap tank@s
|     $ ls /tank/.zfs/snapshot/s
|     ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
|
| To remedy this, always invalidate cached dentries in the snapshot control directory. Since these entries never exist on disk there is no significant performance penalty for the extra lookups.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #1192
* Fix quoting error in unmount command (Ned Bass, 2013-01-16, 1 file, -1/+1)
| | | | | | | | | A misplaced single quote caused the umount command to fail with a syntax error when unmounting snapshots under the .zfs/snapshot control directory. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1210
* Ensure that zfs diff prints unicode safely. (Darik Horn, 2013-01-16, 1 file, -1/+1)
| | | | | | | | | In the stream_bytes() library function used by `zfs diff`, explicitly cast each byte in the input string to an unsigned character so that the Linux fprintf() correctly escapes to octal and does not mangle the output. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1172
* Illumos #3189 kernel panic in test hotspare_onoffline_004_neg (Christopher Siden, 2013-01-14, 1 file, -1/+1)
| | | | | | | | | | | | | | | 3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Arne Jansen <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@8f0b538d1dc99df23a6a89cfd9ffddc1b9804a00 changeset: 13818:e9ad0a945d45 https://www.illumos.org/issues/3189 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #1862 incremental zfs receive fails for sparse file > 8PB (Arne Jansen, 2013-01-14, 1 file, -15/+27)
| | | | | | | | | | | | | | | 1862 incremental zfs receive fails for sparse file > 8PB Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Simon Klinkert <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@31495a1e56860f4575614774a592fe33fc9c71f2 illumos changeset: 13789:f0c17d471b7a https://www.illumos.org/issues/1862 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #3208 cross-endian incorrect user/group accounting (Matthew Ahrens, 2013-01-14, 2 files, -10/+27)
| | | | | | | | | | | | | | | | | | 3208 moving zpool cross-endian results in incorrect user/group accounting Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Richard Lowe <[email protected]> References: illumos/illumos-gate@e828a46d29ad418487f50d56b5c19e2a1f9033a7 illumos changeset: 13835:eea81edc4f14 https://www.illumos.org/issues/3208 Ported-by: Brian Behlendorf <[email protected]> Closes #627 Closes #1136
* Illumos #3397, #3398 (Christopher Siden, 2013-01-11, 1 file, -9/+17)
| | | | | | | | | | | | | | | | | | | 3397 zdb <pool> <objnum> output is too verbose 3398 zdb can't dump feature flags zap objects Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@e690fb27a7d1483f052505e1ff373d205f9dee99 https://www.illumos.org/issues/3397 https://www.illumos.org/issues/3398 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #1884, #3028, #3048, #3049, #3060, #3061, #3093 (Yuri Pankov, 2013-01-11, 2 files, -425/+430)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 1884 Empty "used" field for zfs *space commands 3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID 3048 zfs {user,group}space [-s|-S] is broken 3049 zfs {user,group}space -t doesn't really filter the results 3060 zfs {user,group}space -H output isn't tab-delimited 3061 zfs {user,group}space -o doesn't use specified fields order 3093 zfs {user,group}space's -i is noop Reviewed by: Garry Mills <[email protected]> Reviewed by: Eric Schrock <[email protected]> Approved by: Richard Lowe <[email protected]> References: illumos/illumos-gate@89f5d17b06fc4132c983112b24836a779a0ed736 illumos changeset: 13803:b5e49d71ff0e https://www.illumos.org/issues/1884 https://www.illumos.org/issues/3028 https://www.illumos.org/issues/3048 https://www.illumos.org/issues/3049 https://www.illumos.org/issues/3060 https://www.illumos.org/issues/3061 https://www.illumos.org/issues/3093 Ported-by: Brian Behlendorf <[email protected]> Closes #1194
* Illumos #1337 `zpool status -D' should tell if there are no DDT entries (Yuri Pankov, 2013-01-11, 1 file, -2/+8)
| | | | | | | | | | | | | | | | 1337 `zpool status -D' should tell if there are no DDT entries Reviewed by: Eric Schrock <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Albert Lee <[email protected]> References: illumos/illumos-gate@ce72e614c133351311e87bbbe4eba8fea9e77768 illumos changeset: 13432:d1ad8d106d64 https://www.illumos.org/issues/1337 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #1557 assertion failed in userland taskq_destroy() (Garrett D'Amore, 2013-01-11, 1 file, -3/+2)
| | | | | | | | | | | | | | | 1557 assertion failed in userland taskq_destroy() Reviewed by: Richard Lowe <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@aa846ad9bc4785806bb6263657698d5890afbc08 illumos changeset: 13597:3eac1e8e0f4c https://www.illumos.org/issues/1557 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #2618 arc.c mistypes in the comments (Bart Coddens, 2013-01-11, 1 file, -4/+4)
| | | | | | | | | | | | | | | 2618 arc.c mistypes in the comments Reviewed by: Jason King <[email protected]> Reviewed by: Josef Sipek <[email protected]> Approved by: Richard Lowe <[email protected]> References: illumos/illumos-gate@fc98fea58e89224f6f13d7fae246d6cb5dfa35ea illumos changeset: 13721:5b51a16a186f https://www.illumos.org/issues/2618 Ported-by: Brian Behlendorf <[email protected]>
* Only use gcc -Wunused-but-set-variable when available (Brian Behlendorf, 2013-01-10, 20 files, -20/+20)
| Certain versions of gcc generate an 'unrecognized command line option' error message when -Wunused-but-set-variable is used unconditionally. This in turn can cause several of the autoconf tests to misdetect an interface.
|
| Now, the use of -Wunused-but-set-variable in the autoconf tests was introduced by commit b9c59ec8 to address a gcc 4.6 compatibility problem. So we really only need to pass this option for versions of gcc which are known to support it.
|
| Therefore, the tests have been updated to use the result of the existing ZFS_AC_CONFIG_ALWAYS_NO_UNUSED_BUT_SET_VARIABLE, which determines if gcc supports this option.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #1004
* 'zfs send' man page sync'ed with Illumos (Steven Burgess, 2013-01-10, 1 file, -14/+34)
| * Move the -R option up one position in the list to match the Illumos documentation.
|
| * Move the -D option up one position and refresh it to match the Illumos documentation.
|
| * Move the -p option up one position and refresh it to match the Illumos documentation.
|
| * Add the -n, -P documentation found in zfs receive to zfs send, where it belongs.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #1187
* 'zfs receive' man page sync'ed with Illumos (Steven Burgess, 2013-01-10, 1 file, -61/+0)
| | | | | | | | | | The only valid options are -vnFu, these other ones seem to be misplaced zfs send options. Remove: -D -r -p -n -P Signed-off-by: Brian Behlendorf <[email protected]> Closes #1186
* Add /sbin/fsck.zfs helper (Brian Behlendorf, 2013-01-09, 4 files, -2/+13)
| A fsck helper to accommodate distributions that expect to be able to execute a fsck on all filesystem types. Currently this script does nothing, but it could be extended to act as a compatibility wrapper for 'zpool scrub'.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #964
* Report realpath() canonicalization error (Brian Behlendorf, 2013-01-09, 1 file, -1/+2)
| | | | | | | | Rather than just reporting the failure include the passed mount point and error number. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1153
* call_usermodehelper() should wait for process (Ned Bass, 2013-01-09, 3 files, -4/+4)
| | | | | | | | | | | | | | | | As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the process). A number of call sites used the number 1 instead of the constant name, so the behavior was not as expected on kernels with this change. One visible consequence of this change was that processes accessing automounted snapshots received an ELOOP error because they failed to wait for zfs.mount to complete. Signed-off-by: Brian Behlendorf <[email protected]> Closes #816
* Revert "Avoid ELOOP on auto-mounted snapshots" (Brian Behlendorf, 2013-01-09, 1 file, -7/+0)
| This reverts commit 7afcf5b1da83549bfba70a61fae7a00eaa63c2b0, which accidentally introduced a regression with the .zfs snapshot directory. While the updated code still correctly mounts the requested snapshot, it updates the vfsmount such that it references the original dataset vfsmount. The result is that the snapshot itself isn't visible.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Issue #816
* Only reduce __zio_execute() stack usage in kernel space (Brian Behlendorf, 2013-01-09, 1 file, -0/+2)
| | | | | | | | | Related to 91579709fccd3e55a21970742b66c388fb1403db we need to be very careful about not overrunning the stack in kernel space. However, in user space we're already allowing slightly larger stacks so this stack usage optimization is not required there. Signed-off-by: Brian Behlendorf <[email protected]>
* Merge branch 'feature-flags' (Brian Behlendorf, 2013-01-08, 78 files, -768/+5413)
|\
| | Feature flags support for ZFS ported from Illumos. Only minimal compatibility changes were made where required to accommodate Linux. For a detailed description of feature flags see the original proposal on zfs-discuss. They are conceptually very similar to Linux's ext[234] style of feature flags.
| |
| | http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011568.html
| |
| | NOTE: This branch updates the default pool version for new pools from 28 to 5000. Version 28 pools may still be created for compatibility with Solaris by using the '-o version=28' option.
| |
| |     $ zpool create -o version=28 ...
| |
| | Existing pools must be manually upgraded using 'zpool upgrade'.
| |
| |     $ zpool upgrade ...
| |
| | Signed-off-by: Brian Behlendorf <[email protected]>
| | Closes #778
| * Illumos #3145, #3212 (George Wilson, 2013-01-08, 4 files, -2/+113)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Justin T. Gibbs <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e illumos changeset: 13840:97fd5cdf328a https://www.illumos.org/issues/3145 https://www.illumos.org/issues/3212 Ported-by: Brian Behlendorf <[email protected]> Closes #989 Closes #1137
| * Illumos #3104: eliminate empty bpobjs (Matthew Ahrens, 2013-01-08, 12 files, -11/+166)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 3104 eliminate empty bpobjs Reviewed by: George Wilson <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@f17457368189aa911f774c38c1f21875a568bdca illumos changeset: 13782:8f78aae28a63 https://www.illumos.org/issues/3104 Ported-by: Brian Behlendorf <[email protected]>
| * Fix __zio_execute() asynchronous dispatch (Brian Behlendorf, 2013-01-08, 1 file, -9/+17)
| | To save valuable stack all zio's were made asynchronous when in the txg_sync_thread context or during pool initialization. See commit 2fac4c2 for the original patch and motivation.
| |
| | Unfortunately, the changes to dsl_pool_sync_context() made by the feature flags broke this logic, causing __zio_execute() to dispatch itself infinitely when called during pool initialization. This commit refines the existing logic to specifically target only the two cases we care about.
| |
| | Signed-off-by: Brian Behlendorf <[email protected]>
| * Illumos #3349: zpool upgrade -V bumps the on disk version number (George Wilson, 2013-01-08, 2 files, -21/+127)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@25345e466695fbe736faa53b8f3413d8e8f81981 https://www.illumos.org/issues/3349 Ported-by: Brian Behlendorf <[email protected]>
| * Illumos #3086: unnecessarily setting DS_FLAG_INCONSISTENT on async (Matthew Ahrens, 2013-01-08, 12 files, -115/+209)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets Reviewed by: Christopher Siden <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@ce636f8b38e8c9ff484e880d9abb27251a882860 illumos changeset: 13776:cd512c80fd75 https://www.illumos.org/issues/3086 Ported-by: Brian Behlendorf <[email protected]>
| * Illumos #2762: zpool command should have better support for feature flags (Christopher Siden, 2013-01-08, 9 files, -99/+388)
| | | | | | | | | | | | | | | | | | | | | | | | | | 2762 zpool command should have better support for feature flags Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@57221772c3fc05faba04bf48ddff45abf2bbf2bd https://www.illumos.org/issues/2762 Ported-by: Brian Behlendorf <[email protected]>
| * Illumos #3090 and #3102 (George Wilson, 2013-01-08, 9 files, -100/+188)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@dfbb943217bf8ab22a1a9d2e9dca01d4da95ee0b illumos changeset: 13777:b1e53580146d https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 Ported-by: Brian Behlendorf <[email protected]> Closes #939
| * Revert "Temporarily disable the reguid test." (Brian Behlendorf, 2013-01-08, 1 file, -6/+0)
| | | | | | | | | | | | | | | | | | This reverts commit d13524579162b35189804c357a63993be758b84c. Since feature flags have now been merged we can apply the real upstream fix from Illumos. Signed-off-by: Brian Behlendorf <[email protected]> Issue #997
| * Illumos #2619 and #2747 (Christopher Siden, 2013-01-08, 67 files, -462/+4262)
|/ | | | | | | | | | | | | | | | | | | | | | 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Lowe <[email protected]> Reviewed by: Dan Kruchinin <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@53089ab7c84db6fb76c16ca50076c147cda11757 illumos/illumos-gate@ad135b5d644628e791c3188a6ecbd9c257961ef8 illumos changeset: 13700:2889e2596bd6 https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 NOTE: The grub specific changes were not ported. This change must be made to the Linux grub packages. Ported-by: Brian Behlendorf <[email protected]>
* Fix duplicate words in zpool.8 (Dominik Honnef, 2013-01-07, 1 file, -1/+1)
| | | | | | | Remove the duplicate words 'cannot be' from the zpool.8 man page. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1177
* Allow fake mounts to succeed on non-legacy filesystems. (Will Rouesnel, 2013-01-07, 1 file, -1/+2)
| | | | | | | | | | | | | mountall in Debian depends on being able to pass the -f parameter to mount, which specifies a fake mount and just updates the mtab. Currently mount.zfs will fail such a request if it is not passed with -o zfsutil. This patch allows a fake mount on a non-legacy filesystem to succeed in the same manner as a -o remount does, thus enabling mountall to work correctly. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1167
* Fix gcc array subscript above bounds warning (Ned Bass, 2013-01-07, 1 file, -3/+4)
| In a debug build, certain GCC versions flag an array bounds warning in the below code from dnode_sync.c
|
|     } else {
|         int i;
|         ASSERT(dn->dn_next_nblkptr[txgoff] < dnp->dn_nblkptr);
|         /* the blkptrs we are losing better be unallocated */
|         for (i = dn->dn_next_nblkptr[txgoff]; i < dnp->dn_nblkptr; i++)
|             ASSERT(BP_IS_HOLE(&dnp->dn_blkptr[i]));
|
| This usage is in fact safe, since the ASSERT ensures the index does not exceed the maximum possible number of block pointers. However, gcc can't determine that the assignment 'i = dn->dn_next_nblkptr[txgoff];' falls within the array bounds, so it issues a warning. To avoid this, initialize i to zero to make gcc happy but skip the elements before dn->dn_next_nblkptr[txgoff] in the loop body. Since a dnode contains at most 3 block pointers this overhead should be negligible.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #950
* Merge branch 'io_schedule' (Brian Behlendorf, 2013-01-07, 2 files, -22/+11)
|\ | | | | | | | | | | | | | | | | | | | | Currently ZFS doesn't show any I/O time in eg "top" wait% or in /proc/$pid/stat's blkio_ticks. Using io_schedule() instead of schedule() in zio_wait()'s cv_wait() is the correct way to fix this. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1158 Closes #1175
| * Use cv_wait_io() which will account for iowait (Matt Johnston, 2013-01-07, 2 files, -1/+2)
| | | | | | | | | | | | | | Update zio_wait() to use cv_wait_io() to ensure the iowait time is properly accounted for. Signed-off-by: Brian Behlendorf <[email protected]>
| * Revert part of "Log I/Os longer than zio_delay_max (30s default)" (Matt Johnston, 2013-01-07, 1 file, -22/+10)
|/
| This reverts commit 9dcb97198338ba2d8764dd5604b278118612f74, which was originally introduced to debug occasional slow I/Os. These I/Os would complete eventually but were observed to take several hundred seconds.
|
| The root cause of this issue was the CFQ scheduler which can, under certain conditions, excessively delay an I/O from being issued to the device. This issue was mitigated somewhat by commit 84daaddedbfc9cf4bd1490d8a6f4b2967051e308, which ensures the I/O elevator gets changed even for DM style devices.
|
| This change isn't in any way harmful but it does conflict with a required change to properly account for I/O wait time. Because Linux does not export the io_schedule_timeout() function we must instead rely on io_schedule() via cv_wait_io().
|
| The additional debugging information which was added to the delay event has been intentionally left in place.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
* ZFS 0.6.0-rc13 [tag: zfs-0.6.0-rc13] (Brian Behlendorf, 2012-12-20, 1 file, -1/+1)
|
* Fix zpool on zvol lock inversion deadlock (Brian Behlendorf, 2012-12-20, 1 file, -0/+28)
| In all but one case the spa_namespace_lock is taken before the bdev->bd_mutex lock. But the Linux __blkdev_get() function calls fops->open() with the bdev->bd_mutex lock held, and we must somehow still safely acquire the spa_namespace_lock.
|
| To avoid a potential lock inversion deadlock we preemptively try to take the spa_namespace_lock. Normally it will not be contended and this is safe because spa_open_common() handles the case where the caller already holds the spa_namespace_lock.
|
| When it is contended we risk a lock inversion if we were to block waiting for the lock. Luckily, the __blkdev_get() function allows us to return -ERESTARTSYS, which will result in bdev->bd_mutex being dropped, reacquired, and fops->open() being called again. This process can be repeated safely until both locks are acquired.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Signed-off-by: Jorgen Lundman <[email protected]>
| Closes #612
* Revert "Remove TSD zfs_fsyncer_key" (Brian Behlendorf, 2012-12-20, 4 files, -1/+16)
| | | | | | | | This reverts commit 31f2b5abdf95d8426d8bfd66ca7f62ec70215e3c back to the original code until the fsync(2) performance regression can be addressed. Signed-off-by: Brian Behlendorf <[email protected]>
* Refresh AUTHORS (Brian Behlendorf, 2012-12-19, 1 file, -10/+66)
| | | | | | | The AUTHORS file was getting stale. Refresh its contents using the authors listed in the git commit logs. Signed-off-by: Brian Behlendorf <[email protected]>
* Remove the ChangeLog (Brian Behlendorf, 2012-12-19, 2 files, -577/+1)
| | | | | | | The ChangeLog was retired long ago, the git commit logs are authoritative. To avoid any confusion remove the ChangeLog. Signed-off-by: Brian Behlendorf <[email protected]>
* Remove TSD zfs_fsyncer_key (Brian Behlendorf, 2012-12-19, 4 files, -16/+1)
| It's my understanding that the zfs_fsyncer_key TSD was added as a performance optimization to reduce contention on the zl_lock from zil_commit(). This issue manifested itself as very long (100+ms) fsync() system call times for fsync() heavy workloads.
|
| However, under Linux I'm not seeing the same contention that was originally described. Therefore, I'm removing this code in order to wean ourselves off any dependence on TSD. If the original performance issue reappears on Linux we can revisit fixing it without resorting to TSD.
|
| This just leaves one small ZFS TSD consumer. If it can be cleanly removed from the code we'll be able to shed the SPL TSD implementation entirely.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes zfsonlinux/spl#174
* Set elevator for DM devices despite vdev_wholedisk (Prakash Surya, 2012-12-18, 1 file, -2/+9)
| The current state of udev and device-mapper devices makes it difficult to construct a mapping of DM partitions and their underlying DM device. For example, with a /dev directory with the following contents:
|
|     $ ls -d /dev/dm-*
|     /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3
|
| it is not immediately apparent if these are completely separate devices, or partitions and real devices intermixed. In contrast, SCSI devices would appear as so:
|
|     $ ls -d /dev/sd*
|     /dev/sda /dev/sda1 /dev/sdb /dev/sdb1
|
| Here, one can immediately determine that there are two devices (sda and sdb), each containing a single partition.
|
| The lack of a predictable and consistent mapping from DM devices to DM device partitions makes it difficult for user space to process these devices the same way it does SCSI devices. As a result, the ZFS utilities do not partition DM devices, and instead set the "vdev_wholedisk" label to 0 and treat them as partitions.
|
| This has the side effect that, even if ZFS has sole ownership of the device, the IO scheduler will not be modified because it is treated as a partition. This change adds an exception for DM devices in vdev_elevator_switch, allowing the elevator to be modified even though the "vdev_wholedisk" property is not set.
|
| Signed-off-by: Prakash Surya <[email protected]>
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #1149
* Fix using zvol as slog device (Jorgen Lundman, 2012-12-18, 4 files, -14/+31)
| During the original ZoL port the vdev_uses_zvols() function was disabled until it could be properly implemented. This prevented a zpool from using a zvol for its slog device.
|
| This patch implements that missing functionality by adding a zvol_is_zvol() function to zvol.c. Given the full path to a device it will look up the device and verify its major number against the registered zvol major number for the system. If they match we know the device is a zvol.
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #1131
* Fix get/set users/groups in quota props via numeric id (Massimo Maggi, 2012-12-17, 1 file, -7/+7)
| | | | | | | | | | | | | | | Fix setting/getting users/groups in quota properties through numeric identifier. This support was accidentally disabled in the original port by applying the HAVE_IDMAP wrapper macro too broadly. Fix obtained by moving #ifdef HAVE_IDMAP to exclude only the part of code that really needs IDMAP. Now zfs (get|set) (user|group)quota@1000 works as expected. Signed-off-by: Massimo Maggi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1147
* Do not use KERNEL_DIR env var in Makefile.am (Richard Yao, 2012-12-17, 1 file, -3/+3)
| | | | | | | | | | | | | | A Gentoo user reported an issue where the build system would attempt to recurse into the kernel source tree if KERNEL_DIR is set in the environment. KERNEL_DIR is an environment variable that is used when the kernel sources are in a non-standard location, so it is necessary to stop relying on it to prevent this issue. https://bugs.gentoo.org/show_bug.cgi?id=433946 Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Update SAs when an inode is dirtied (Brian Behlendorf, 2012-12-14, 5 files, -23/+96)
| Revert the portion of commit d3aa3ea which always resulted in the SAs being updated when an mmap()'ed file was closed. That change accidentally resulted in unexpected ctime updates which upset tools like git. That was always a horrible hack and I'm happy it will never make it into a tagged release.
|
| The right fix is something I initially resisted doing because I was worried about the additional overhead. However, in hindsight the overhead isn't as bad as I feared.
|
| This patch implements the sops->dirty_inode() callback which is, unsurprisingly, called when an inode is dirtied. We leverage this callback to keep the znode SAs strictly in sync with the inode.
|
| However, for now we're going to go slowly to avoid introducing any new unexpected issues by only updating the atime, mtime, and ctime. This will cover the callpath of most concern to us.
|
|     ->filemap_page_mkwrite->file_update_time->update_time->
|         mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode
|
| Signed-off-by: Brian Behlendorf <[email protected]>
| Closes #764
| Closes #1140
* Update 69-vdev.rules .gitignore (Brian Behlendorf, 2012-12-14, 1 file, -1/+1)
| | | | | | | Commit 2957f38 renamed 60-vdev.rules to 69-vdev.rules but failed to update the .gitignore file to reflect this change. Signed-off-by: Brian Behlendorf <[email protected]>
* Avoid ELOOP on auto-mounted snapshots (Ned Bass, 2012-12-13, 1 file, -0/+7)
| | | | | | | | | | | | Ensure that the path member pointers are associated with the newly-mounted snapshot when zpl_snapdir_automount() returns. Otherwise the follow_automount() function may be called repeatedly, leading to an incorrect ELOOP error return. This problem was observed as a 'Too many levels of symbolic links' error from user-space commands accessing an unmounted snapshot in the .zfs/snapshot directory. Signed-off-by: Brian Behlendorf <[email protected]> Closes #816