openzfs/zfs.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Refresh links to web site	Ned Bass	2013-03-06	1	-1/+1
\| \| \| \| \| \| \| \|	A few files still refer to @behlendorf's private fork on github. Use the primary web site URL instead. Two typos are also corrected. Signed-off-by: Brian Behlendorf <[email protected]>
*	Add KMODDIR to install target	Brian Behlendorf	2013-03-06	1	-8/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Provide a mechanism to control the directory name the modules are installed in. The kernel privdes INSTALL_MOD_DIR for this but it was hardcoded to be 'addon/zfs'. Add a KMODDIR variable which can be passed to 'make install' to override the default directory name. While we're here change the default from 'addon/zfs' to 'extra' which is the kernel.org default. Signed-off-by: Brian Behlendorf <[email protected]>
*	Add snapdev=[hidden\|visible] dataset property	Eric Dillmann	2013-03-05	3	-3/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new snapdev dataset property may be set to control the visibility of zvol snapshot devices. By default this value is set to 'hidden' which will prevent zvol snapshots from appearing under /dev/zvol/ and /dev/<dataset>/. When set to 'visible' all zvol snapshots for the dataset will be visible. This functionality was largely added because when automatic snapshoting is enabled large numbers of read-only zvol snapshots will be created. When creating these devices the kernel will attempt to read their partition tables, and blkid will attempt to identify any filesystems on those partitions. This leads to a variety of issues: 1) The zvol partition tables will be read in the context of the `modprobe zfs` for automatically imported pools. This is undesirable and should be done asynchronously, but for now reducing the number of visible devices helps. 2) Udev expects to be able to complete its work for a new block devices fairly quickly. When many zvol devices are added at the same time this is no longer be true. It can lead to udev timeouts and missing /dev/zvol links. 3) Simply having lots of devices in /dev/ can be aukward from a management standpoint. Hidding the devices your unlikely to ever use helps with this. Any snapshot device which is needed can be made visible by changing the snapdev property. NOTE: This patch changes the default behavior for zvols which was effectively 'snapdev=visible'. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1235 Closes #945 Issue #956 Issue #756
*	Merge zvol.c changes from PSARC 2010/306 Read-only ZFS pools	George Wilson	2013-03-04	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The changes to zvol.c were never merged from the last onnv_147 bulk update. This was because zvol.c was largely rewritten for Linux making it fairly easy to miss these sorts of changes. This causes a regression when importing a zpool with zvols read-only. This does not impact pool which only contain filesystem datasets. References: illumos/illumos-gate@f9af39b Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1332 Closes #1333
*	Constify structures containing function pointers	Richard Yao	2013-03-04	5	-43/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PaX team modified the kernel's modpost to report writeable function pointers as section mismatches because they are potential exploit targets. We could ignore the warnings, but their presence can obscure actual issues. Proper const correctness can also catch programming mistakes. Building the kernel modules against a PaX/GrSecurity patched Linux 3.4.2 kernel reports 133 section mismatches prior to this patch. This patch eliminates 130 of them. The quantity of writeable function pointers eliminated by constifying each structure is as follows: vdev_opts_t 52 zil_replay_func_t 24 zio_compress_info_t 24 zio_checksum_info_t 9 space_map_ops_t 7 arc_byteswap_func_t 5 The remaining 3 writeable function pointers cannot be addressed by this patch. 2 of them are in zpl_fs_type. The kernel's sget function requires that this be non-const. The final writeable function pointer is created by SPL_SHRINKER_DECLARE. The kernel's set_shrinker() and remove_shrinker() functions also require that this be non-const. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1300
*	Fix hot spares	Brian Behlendorf	2013-03-01	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The issue with hot spares in ZoL is because it opens all leaf vdevs exclusively (O_EXCL). On Linux, exclusive opens cause subsequent exclusive opens to fail with EBUSY. This could be resolved by not opening any of the devices exclusively, which is what Illumos does, but the additional protection offered by exclusive opens is desirable. It cleanly prevents you from accidentally adding an in-use non-ZFS device to your pool. To fix this we very slightly relaxed the usage of O_EXCL in the following ways. 1) Functions which open the device but only read had the O_EXCL flag removed and were updated to use O_RDONLY. 2) A common holder was added to the vdev disk code. This allow the ZFS code to internally open the device multiple times but non-ZFS callers may not. 3) An exception was added to make_disks() for hot spare when creating partition tables. For hot spare devices which are already opened exclusively we skip creating the partition table because this must already have been done when the disk was originally added as a hot spare. Additional minor changes include fixing check_in_use() to use a partition instead of a slice suffix. And is_spare() was moved above make_disks() to avoid adding a forward reference. Signed-off-by: Brian Behlendorf <[email protected]> Closes #250
*	Remove wholedisk check from vdev_disk_open()	Brian Behlendorf	2013-02-28	1	-14/+0
\| \| \| \| \| \| \| \| \|	As described by the comment and enforced the by assertion the v->vdev_wholedisk will never be -1. The wholedisk handling is performed by the user space utilities. To prevent confusion this dead code is being removed. Signed-off-by: Brian Behlendorf <[email protected]>
*	Leaf vdevs should not be reopened	Brian Behlendorf	2013-02-28	1	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When vdev_disk.c was implemented for Linux we failed to handle the reopen case. According to the vdev_reopen() comment leaf vdevs should not be closed or opened when v->vdev_reopening is set. Under Linux we would always close and open the device. This issue was only noticed when a 'zpool scrub' command was run while the leaf vdev device names in /dev/disk/by-vdev were missing. The scrub command calls vdev_reopen() which caused the vdevs to be closed but they couldn't be reopened due to the missing links. The result was that all the vdevs were marked unavailable and the pool was halted due to failmode=wait. This patch adds the missing functionality in a similiar fashion to to the Illumos code. Signed-off-by: Brian Behlendorf <[email protected]>
*	Remove the bio_empty_barrier() check.	Etienne Dechamps	2013-02-24	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To determine whether the kernel is capable of handling empty barrier BIOs, we check for the presence of the bio_empty_barrier() macro, which was introduced in 2.6.24. If this macro is defined, then we can flush disk vdevs; if it isn't, then flushing is disabled. Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37, even though the kernel is still capable of handling empty barrier BIOs. As a result, flushing is effectively disabled on kernels >= 2.6.37, meaning that starting from this kernel version, zfs doesn't use barriers to guarantee on-disk data consistency. This is quite bad and can lead to potential data corruption on power failures. This patch fixes the issue by removing the configure check for bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore. Thanks to Richard Kojedzinszky for catching this nasty bug. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1318
*	Enable zfs_arc_memory_throttle_disable by default	Brian Behlendorf	2013-02-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The zfs_arc_memory_throttle_disable module option was introduced by commit 0c5493d47059f25ce9dbf20c9fe87655f55102a1 to resolve a memory miscalculation which could result in the txg_sync thread spinning. When this was first introduced the default behavior was left unchanged until enough real world usage confirmed there were no unexpected issues. We've now reached that point. Linux's direct reclaim is working as expected so we're enabling this behavior by default. This helps pave the way to retire the spl_kmem_availrmem() functionality in the SPL layer. This was the only caller. Signed-off-by: Brian Behlendorf <[email protected]> Issue #938
*	Make spa.c assertions catch unsupported pre-feature flag pool versions	Richard Yao	2013-02-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	A couple of assertions in spa.c were designed to prevent the use of invalid pool versions. They were written under the assumption that all valid pools are less than SPA_VERSION. Since feature flags jumped from 28 to 5000, any numbers in the range 28 to 5000 non-inclusive will fail to trigger them. We switch to the new SPA_VERSION_IS_SUPPORTED macro to correct this. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1282
*	Add explicit MAXNAMELEN check	Brian Behlendorf	2013-02-12	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that the Linux VFS doesn't strictly handle all cases where a component path name exceeds MAXNAMELEN. It does however appear to correctly handle MAXPATHLEN for us. The right way to handle this appears to be to add an explicit check to the zpl_lookup() function. Several in-tree filesystems handle this case the same way. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1279
*	Switch KM_SLEEP to KM_PUSHPAGE	Ned Bass	2013-02-06	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Two more locations where KM_SLEEP was used in a call which must use KM_PUSHPAGE were found while using the zpool upgrade command. See commit b8d06fc for additional details. Also make a small correction to the comment block above dsl_dir_open_spa(). Signed-off-by: Brian Behlendorf <[email protected]> Closes #1268
*	Cast 'zfs bad bloc' to ULL for x86	Brian Behlendorf	2013-02-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Explicitly case this value to an unsigned long long for 32-bit systems to inform the compiler that a long type should not be used. Otherwise we get the following compiler error: dmu_send.c:376: error: integer constant is too large for ‘long’ type Signed-off-by: Brian Behlendorf <[email protected]>
*	Add zfs_arc_memory_throttle_disable module option	Brian Behlendorf	2013-02-01	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The way in which virtual box ab(uses) memory can throw off the free memory calculation in arc_memory_throttle(). The result is the txg_sync thread will effectively spin waiting for memory to be released even though there's lots of memory on the system. To handle this case I'm adding a zfs_arc_memory_throttle_disable module option largely for virtual box users. Setting this option disables free memory checks which allows the txg_sync thread to make progress. By default this option is disabled to preserve the current behavior. However, because Linux supports direct memory reclaim it's doubtful throttling due to perceived memory pressure is ever a good idea. We should enable this option by default once we've done enough real world testing to convince ourselve there aren't any unexpected side effects. Signed-off-by: Brian Behlendorf <[email protected]> Closes #938
*	Add zfs_disable_dup_eviction module option	Brian Behlendorf	2013-02-01	1	-0/+3
\| \| \| \| \| \| \| \|	Commit 1eb5bfa introduced a new zfs_disable_dup_eviction tunable. It should have been made available as a module option in the original patch but was overlooked. Signed-off-by: Brian Behlendorf <[email protected]>
*	Fix mismatch between SA header size and layout	Ned Bass	2013-01-31	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a system attribute layout is created an inconsistency may occur between the system attribute header (sa_hdr_phys_t) size and the variable-sized attribute count stored in the layout. The inconsistency results in the following failed assertion when SA_HDR_SIZE_MATCH_LAYOUT returns false: SPLError: 11315:0:(sa.c:1541:sa_find_idx_tab()) ASSERTION((IS_SA_BONUSTYPE(bonustype) && SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) \|\| !IS_SA_BONUSTYPE(bonustype) \|\| (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed The bug originates in this snippet from sa_find_sizes(). if (is_var_sz && var_size > 1) { if (P2ROUNDUP(hdrsize + sizeof (uint16_t), *total < full_space) { hdrsize += sizeof (uint16_t); This assumes that the current variable-sized attribute will be stored in the current buffer and accounts for the space needed to store its size in the sa_hdr_phys_t. However if the next attribute spills over we need to store a blkptr_t at the end of the bonus buffer to point to the spill block. If the current attribute is in the way of the blkptr_t then it too will be relocated into the spill block. But since we've already accounted for it in the header size we get the inconsistency described above. To avoid this, record the index of the last variable-sized attribute that prompted a hdrsize increase, and reverse the increase if we later determine that that attribute will be relocated to the spill block. Signed-off-by: Matthew Ahrens <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1250
*	Fix rounding discrepancy in sa_find_sizes()	Ned Bass	2013-01-31	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A rounding discrepancy exists between how sa_build_layouts() and sa_find_sizes() calculate when the spill block needs to be kicked in. This results in a narrow size range where sa_build_layouts() believes there must be a spill block allocated but due to the discrepancy there isn't. A panic then occurs when the hdl->sa_spill NULL pointer is dereferenced. The following reproducer for this bug was isolated: truncate -s 128m /tmp/tank zpool create tank /tmp/tank zfs create -o xattr=sa tank/fish ln -s `perl -e 'print "z" x 41'` /tank/fish/z setfattr -hn trusted.foo -v`perl -e 'print "z"x45'` /tank/fish/z This test results in roughly the following system attribute (SA) layout: 176 bytes - "standard" SA's 41 bytes - name of symbolic link target 100 bytes - XDR encoded nvlist for xattr --- 317 bytes - total Because 317 is less than DN_MAX_BONUSLEN (320), sa_find_sizes() decides no spill block is needed. But sa_build_layouts() rounds 41 up to 48 when computing the space requirements so it tries to switch to the spill block. Note that we were only able to reproduce this bug using a combination of symbolic links and the Linux-specific xattr=sa dataset property. So while this issue is not technically Linux-specific, it may be difficult or impossible to hit the narrow size range needed to reproduce it on other platforms. To fix the discrepancy, round the running total in sa_find_sizes() up to an 8-byte boundary before accounting for each SA, since this is how they will be stored in the bonus and (possibly) spill buffers. To make the intent of the code more clear, explicitly assert key assumptions about expected alignment of data and whether spill-over will occur. Signed-off-by: Matthew Ahrens <[email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #1240
*	Illumos #3447 improve the comment in txg.c	Adam H. Leventhal	2013-01-30	1	-2/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3447 improve the comment in txg.c Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Lowe <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@adbbcfface63b3a71922d5a25d34a2018c0435de https://www.illumos.org/issues/3447 Ported-by: Brian Behlendorf <[email protected]>
*	Illumos #3035 LZ4 compression support in ZFS and GRUB	Eric Dillmann	2013-01-29	7	-1/+1128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3035 LZ4 compression support in ZFS and GRUB Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Christopher Siden <[email protected]> References: illumos/illumos-gate@a6f561b4aee75d0d028e7b36b151c8ed8a86bc76 https://www.illumos.org/issues/3035 http://wiki.illumos.org/display/illumos/LZ4+Compression+In+ZFS This patch has been slightly modified from the upstream Illumos version to be compatible with Linux. Due to the very limited stack space in the kernel a lz4 workspace kmem cache is used. Since we are using gcc we are also able to take advantage of the gcc optimized __builtin_ctz functions. Support for GRUB has been dropped from this patch. That code is available but those changes will need to made to the upstream GRUB package. Lastly, several hunks of dead code were dropped for clarity. They include the functions real_LZ4_uncompress(), LZ4_compressBound() and the Visual Studio specific hunks wrapped in _MSC_VER. Ported-by: Eric Dillmann <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1217
*	Avoid gcc -Werror=maybe-uninitialized warnings	Chris Wedgwood	2013-01-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Explicitly set acl details to zero to silence gcc (zfs_acl_node_read can't be sure zfs_acl_znode_info will set acl_count and aclsize). Normally suppressing these warnings by setting this to zero at declaration time is a bad idea but in this instance it's hard to avoid and should be fairly safe. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1244
*	Use dsl_dataset_snap_lookup()	Brian Behlendorf	2013-01-25	3	-34/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Retire the dmu_snapshot_id() function which was introduced in the initial .zfs control directory implementation. There is already an existing dsl_dataset_snap_lookup() which does exactly what we need, and the dmu_snapshot_id() function as implemented is racy. https://github.com/zfsonlinux/zfs/issues/1215#issuecomment-12579879 Signed-off-by: Brian Behlendorf <[email protected]> Closes #1238
*	Add d_clear_d_op() compatibility	Brian Behlendorf	2013-01-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Added d_clear_d_op() helper function which clears some flags and the registered dentry->d_op table. This is required because d_set_d_op() issues a warning when the dentry operations table is already set. For the .zfs control directory to work properly we must be able to override the default operations table and register custom .d_automount and .d_revalidate callbacks. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #1230
*	fzap_cursor_move_to_key() should drop l_rwlock	Ned Bass	2013-01-23	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Callers of zap_deref_leaf() must be careful to drop leaf->l_rwlock since that function returns with the lock held on success. All other callers drop the lock correctly but it seems fzap_cursor_move_to_key() does not. This may block writers or cause VERIFY failures when the lock is freed. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1215 Closes zfsonlinux/spl#143 Closes zfsonlinux/spl#97
*	Fix zpl_revalidate() NULL deref	Brian Behlendorf	2013-01-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	In zpl_revalidate() it's possible for the nameidata to be NULL for kernels which still accept the parameter. In particular, lookup_one_len() calls d_revalidate() with a NULL nameidata. Resolve the issue by checking for a NULL nameidata in which case just set the flags to 0. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1226
*	Use sb->s_d_op default dentry operations	Brian Behlendorf	2013-01-18	2	-11/+9
\| \| \| \| \| \| \| \| \| \|	As of Linux 2.6.37 the right way to register custom dentry operations is to use the super block's ->s_d_op field. For older kernels they should be registered as part of the lookup operation. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1223
*	Fix zpool on zvol deadlock	Massimo Maggi	2013-01-18	1	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 65d56083b4617a4cade0cff68cbbaf68114169d6 fixes the lock inversion between spa_namespace_lock and bdev->bd_mutex but only for the first user of spa_namespace_lock: dmu_objset_own(). Later spa_namespace_lock gets acquired by dsl_prop_get_integer() though dsl_prop_get()->dsl_dataset_hold()->dsl_dir_open_spa()-> spa_open()->spa_open_common() without this "protection". By moving the mutex release after this second use, even this acquisition of the lock is "protected" by the ERESTARTSYS trick. Signed-off-by: Massimo Maggi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1220
*	Revert "Revert "Fix unlink/xattr deadlock""	Brian Behlendorf	2013-01-17	2	-55/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 53c7411919a64d6f0889aa0d6974610f6cd35744 effectively reinstating the asynchronous xattr cleanup code. These Linux changes were reverted because after testing and careful contemplation I was convinced that due to the 89260a1c8851ce05ea04b23606ba438b271d890 commit they were no longer required. Unfortunately, the deadlock described in #1176 was a case which wasn't considered. At mount zfs_unlinked_drain() can occur which will unlink a list of znodes in effectively a random order which isn't safe. The only reason it was safe to originally revert this change was the we could guarantee that the VFS would always prune the xattr leaves before the parents. Therefore, until we can cleanly resolve this deadlock for all cases we need to keep this change in spite of the xattr unlink performance penalty associated with it. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1176 Issue #457
*	Fix 'zfs rollback' on mounted file systems	Brian Behlendorf	2013-01-17	5	-41/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rolling back a mounted filesystem with open file handles and cached dentries+inodes never worked properly in ZoL. The major issue was that Linux provides no easy mechanism for modules to invalidate the inode cache for a file system. Because of this it was possible that an inode from the previous filesystem would not get properly dropped from the cache during rolling back. Then a new inode with the same inode number would be create and collide with the existing cached inode. Ideally this would trigger an VERIFY() but in practice the error wasn't handled and it would just NULL reference. Luckily, this issue can be resolved by sprucing up the existing Solaris zfs_rezget() functionality for the Linux VFS. The way it works now is that when a file system is rolled back all the cached inodes will be traversed and refetched from disk. If a version of the cached inode exists on disk the in-core copy will be updated accordingly. If there is no match for that object on disk it will be unhashed from the inode cache and marked as stale. This will effectively make the inode unfindable for lookups allowing the inode number to be immediately recycled. The inode will then only be accessible from the cached dentries. Subsequent dentry lookups which reference a stale inode will result in the dentry being invalidated. Once invalidated the dentry will drop its reference on the inode allowing it to be safely pruned from the cache. Special care is taken for negative dentries since they do not reference any inode. These dentires will be invalidate based on when they were added to the dentry cache. Entries added before the last rollback will be invalidate to prevent them from masking real files in the dataset. Two nice side effects of this fix are: * Removes the dependency on spl_invalidate_inodes(), it can now be safely removed from the SPL when we choose to do so. * zfs_znode_alloc() no longer requires a dentry to be passed. This effectively reverts this portition of the code to its upstream counterpart. The dentry is not instantiated more correctly in the Linux ZPL layer. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #795
*	Fix false ENOENT on snapshot control dentries	Ned Bass	2013-01-16	1	-58/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lookups in the snapshot control directory for an existing snapshot fail with ENOENT if an earlier lookup failed before the snapshot was created. This is because the earlier lookup causes a negative dentry to be cached which is never invalidated. The bug can be reproduced as follows (the second ls should succeed): $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory $ zfs snap tank@s $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory To remedy this, always invalidate cached dentries in the snapshot control directory. Since these entries never exist on disk there is no significant performance penalty for the extra lookups. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1192
*	Fix quoting error in unmount command	Ned Bass	2013-01-16	1	-1/+1
\| \| \| \| \| \| \| \| \|	A misplaced single quote caused the umount command to fail with a syntax error when unmounting snapshots under the .zfs/snapshot control directory. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1210
*	Illumos #3189 kernel panic in test hotspare_onoffline_004_neg	Christopher Siden	2013-01-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Arne Jansen <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@8f0b538d1dc99df23a6a89cfd9ffddc1b9804a00 changeset: 13818:e9ad0a945d45 https://www.illumos.org/issues/3189 Ported-by: Brian Behlendorf <[email protected]>
*	Illumos #1862 incremental zfs receive fails for sparse file > 8PB	Arne Jansen	2013-01-14	1	-15/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1862 incremental zfs receive fails for sparse file > 8PB Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Simon Klinkert <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@31495a1e56860f4575614774a592fe33fc9c71f2 illumos changeset: 13789:f0c17d471b7a https://www.illumos.org/issues/1862 Ported-by: Brian Behlendorf <[email protected]>
*	Illumos #3208 cross-endian incorrect user/group accounting	Matthew Ahrens	2013-01-14	1	-9/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3208 moving zpool cross-endian results in incorrect user/group accounting Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Richard Lowe <[email protected]> References: illumos/illumos-gate@e828a46d29ad418487f50d56b5c19e2a1f9033a7 illumos changeset: 13835:eea81edc4f14 https://www.illumos.org/issues/3208 Ported-by: Brian Behlendorf <[email protected]> Closes #627 Closes #1136
*	Illumos #2618 arc.c mistypes in the comments	Bart Coddens	2013-01-11	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	2618 arc.c mistypes in the comments Reviewed by: Jason King <[email protected]> Reviewed by: Josef Sipek <[email protected]> Approved by: Richard Lowe <[email protected]> References: illumos/illumos-gate@fc98fea58e89224f6f13d7fae246d6cb5dfa35ea illumos changeset: 13721:5b51a16a186f https://www.illumos.org/issues/2618 Ported-by: Brian Behlendorf <[email protected]>
*	call_usermodehelper() should wait for process	Ned Bass	2013-01-09	3	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the process). A number of call sites used the number 1 instead of the constant name, so the behavior was not as expected on kernels with this change. One visible consequence of this change was that processes accessing automounted snapshots received an ELOOP error because they failed to wait for zfs.mount to complete. Signed-off-by: Brian Behlendorf <[email protected]> Closes #816
*	Revert "Avoid ELOOP on auto-mounted snapshots"	Brian Behlendorf	2013-01-09	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 7afcf5b1da83549bfba70a61fae7a00eaa63c2b0 which accidentally introduced a regression with the .zfs snapshot directory. While the updated code still does correctly mount the requested snapshot. It updates the vfsmount such that it references the original dataset vfsmount. The result is that the snapshot itself isn't visible. Signed-off-by: Brian Behlendorf <[email protected]> Issue #816
*	Only reduce __zio_execute() stack usage in kernel space	Brian Behlendorf	2013-01-09	1	-0/+2
\| \| \| \| \| \| \| \| \|	Related to 91579709fccd3e55a21970742b66c388fb1403db we need to be very careful about not overrunning the stack in kernel space. However, in user space we're already allowing slightly larger stacks so this stack usage optimization is not required there. Signed-off-by: Brian Behlendorf <[email protected]>
*	Illumos #3145, #3212	George Wilson	2013-01-08	2	-2/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Justin T. Gibbs <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e illumos changeset: 13840:97fd5cdf328a https://www.illumos.org/issues/3145 https://www.illumos.org/issues/3212 Ported-by: Brian Behlendorf <[email protected]> Closes #989 Closes #1137
*	Illumos #3104: eliminate empty bpobjs	Matthew Ahrens	2013-01-08	6	-11/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3104 eliminate empty bpobjs Reviewed by: George Wilson <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@f17457368189aa911f774c38c1f21875a568bdca illumos changeset: 13782:8f78aae28a63 https://www.illumos.org/issues/3104 Ported-by: Brian Behlendorf <[email protected]>
*	Fix __zio_execute() asynchronous dispatch	Brian Behlendorf	2013-01-08	1	-9/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	To save valuable stack all zio's were made asynchronous when in the tgx_sync_thread context or during pool initialization. See commit 2fac4c2 for the original patch and motivation. Unfortuantely, the changes to dsl_pool_sync_context() made by the feature flags broke this logic causing in __zio_execute() to dispatch itself infinitely when called during pool initialization. This commit refines the existing logic to specificly target only the two cases we care about. Signed-off-by: Brian Behlendorf <[email protected]>
*	Illumos #3349: zpool upgrade -V bumps the on disk version number	George Wilson	2013-01-08	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@25345e466695fbe736faa53b8f3413d8e8f81981 https://www.illumos.org/issues/3349 Ported-by: Brian Behlendorf <[email protected]>
*	Illumos #3086: unnecessarily setting DS_FLAG_INCONSISTENT on async	Matthew Ahrens	2013-01-08	7	-113/+192
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets Reviewed by: Christopher Siden <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@ce636f8b38e8c9ff484e880d9abb27251a882860 illumos changeset: 13776:cd512c80fd75 https://www.illumos.org/issues/3086 Ported-by: Brian Behlendorf <[email protected]>
*	Illumos #2762: zpool command should have better support for feature flags	Christopher Siden	2013-01-08	2	-10/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	2762 zpool command should have better support for feature flags Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@57221772c3fc05faba04bf48ddff45abf2bbf2bd https://www.illumos.org/issues/2762 Ported-by: Brian Behlendorf <[email protected]>
*	Illumos #3090 and #3102	George Wilson	2013-01-08	4	-51/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@dfbb943217bf8ab22a1a9d2e9dca01d4da95ee0b illumos changeset: 13777:b1e53580146d https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 Ported-by: Brian Behlendorf <[email protected]> Closes #939
*	Illumos #2619 and #2747	Christopher Siden	2013-01-08	30	-365/+2454
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Richard Lowe <[email protected]> Reviewed by: Dan Kruchinin <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@53089ab7c84db6fb76c16ca50076c147cda11757 illumos/illumos-gate@ad135b5d644628e791c3188a6ecbd9c257961ef8 illumos changeset: 13700:2889e2596bd6 https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 NOTE: The grub specific changes were not ported. This change must be made to the Linux grub packages. Ported-by: Brian Behlendorf <[email protected]>
*	Fix gcc array subscript above bounds warning	Ned Bass	2013-01-07	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a debug build, certain GCC versions flag an array bounds warning in the below code from dnode_sync.c } else { int i; ASSERT(dn->dn_next_nblkptr[txgoff] < dnp->dn_nblkptr); /* the blkptrs we are losing better be unallocated */ for (i = dn->dn_next_nblkptr[txgoff]; i < dnp->dn_nblkptr; i++) ASSERT(BP_IS_HOLE(&dnp->dn_blkptr[i])); This usage is in fact safe, since the ASSERT ensures the index does not exceed to maximum possible number of block pointers. However gcc can't determine that the assignment 'i = dn->dn_next_nblkptr[txgoff];' falls within the array bounds so it issues a warning. To avoid this, initialize i to zero to make gcc happy but skip the elements before dn->dn_next_nblkptr[txgoff] in the loop body. Since a dnode contains at most 3 block pointers this overhead should be negligible. Signed-off-by: Brian Behlendorf <[email protected]> Closes #950
*	Use cv_wait_io() which will will account for iowait	Matt Johnston	2013-01-07	1	-1/+1
\| \| \| \| \| \| \|	Update zio_wait() to use cv_wait_io() to ensure the iowait time is properly accounted for. Signed-off-by: Brian Behlendorf <[email protected]>
*	Revert part of "Log I/Os longer than zio_delay_max (30s default)"	Matt Johnston	2013-01-07	1	-22/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 9dcb97198338ba2d8764dd5604b278118612f74 which was originally introduced to debug occasional slow I/Os. These I/Os would complete eventually but were observed to take several 100 seconds. The root cause of this issue was the CFQ scheduler which can, under certain conditions, excessively delay an I/O from being issued to the device. This issue was mitigated somewhat by commit 84daaddedbfc9cf4bd1490d8a6f4b2967051e308 which ensures the I/O elevator gets changed even for DM style devices. This change isn't in any way harmful but it does conflict with a required change to properly account from I/O wait time. Because Linux does not export the io_schedule_timeout() function we must instead rely on io_schedule() via cv_wait_io(). The additional debugging information which was added to the delay event has been intentionally left in place. Signed-off-by: Brian Behlendorf <[email protected]>
*	Fix zpool on zvol lock inversion deadlock	Brian Behlendorf	2012-12-20	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In all but one case the spa_namespace_lock is taken before the bdev->bd_mutex lock. But Linux __blkdev_get() function calls fops->open() with the bdev->bd_mutex lock held and we must somehow still safely acquire the spa_namespace_lock. To avoid a potential lock inversion deadlock we preemptively try to take the spa_namespace_lock(). Normally it will not be contended and this is safe because spa_open_common() handles the case where the caller already holds the spa_namespace_lock. When it is contended we risk a lock inversion if we were to block waiting for the lock. Luckily, the __blkdev_get() function allows us to return -ERESTARTSYS which will result in bdev->bd_mutex being dropped, reacquired, and fops->open() being called again. This process can be repeated safely until both locks are acquired. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Jorgen Lundman <[email protected]> Closes #612