Commit message | Author | Age | Files | Lines
* Eliminate runtime function pointer mods in autotools checks | Richard Yao | 2013-03-04 | 10 | -150/+113

  PaX/GrSecurity patched kernels implement a dialect of C that relies on a GCC plugin for enforcement. A basic idea in this dialect is that function pointers in structures should not change during runtime. This causes code that modifies function pointers at runtime to fail to compile in many instances.

  The autotools checks rely on whether or not small test cases compile against a given kernel. Some autotools checks assume some default case if other cases fail. When one of these autotools checks tests a PaX/GrSecurity patched kernel by modifying a function pointer at runtime, the default case will be used.

  Early detection of such situations is possible by relying on compiler warnings, which are compiler errors when --enable-debug is used. Unfortunately, very few people build ZFS with --enable-debug. The more common situation is that these issues manifest themselves as runtime failures in the form of NULL pointer exceptions.

  Previous patches that addressed such issues with PaX/GrSecurity compatibility largely relied on rewriting autotools checks to avoid runtime function pointer modification or the addition of PaX/GrSecurity specific checks. This patch takes the previous work to its logical conclusion by eliminating the use of runtime function pointer modification. This permits the removal of PaX-specific autotools checks in favor of ones that work across all supported kernels.

  This should resolve issues that were reported to occur with PaX/GrSecurity-patched Linux 3.7.5 kernels on Gentoo Linux.

  https://bugs.gentoo.org/show_bug.cgi?id=457176

  We should be able to prevent future regressions in PaX/GrSecurity compatibility by ensuring that all changes to ZFSOnLinux avoid runtime function pointer modification. At the same time, this does not solve the issue of silent failures triggering default cases in the autotools check, which is what permitted these regressions to become runtime failures in the first place. This will need to be addressed in a future patch.

  Reported-by: Marcin Mirosław <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1300
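The difference between modifying a function pointer at runtime and binding it once at definition time can be sketched in plain C. This is only an illustration: `struct demo_file_ops`, `demo_fsync()` and `demo_call_fsync()` are hypothetical names standing in for a kernel operations table, and the block is not the test code used by the actual autotools checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel operations table; the names
 * here are illustrative only. */
struct demo_file_ops {
	int (*fsync)(int fd);
};

static int demo_fsync(int fd)
{
	return fd >= 0 ? 0 : -1;
}

/*
 * Pattern that a constifying compiler plugin may reject:
 *
 *	struct demo_file_ops ops;
 *	ops.fsync = demo_fsync;
 *
 * Pattern that stays valid: bind the pointer once, at definition
 * time, in a const object.
 */
static const struct demo_file_ops demo_ops = {
	.fsync = demo_fsync,
};

static int demo_call_fsync(int fd)
{
	assert(demo_ops.fsync != NULL);
	return demo_ops.fsync(fd);
}
```

A configure test written in the second style checks only whether the member exists with a compatible type, so it should compile the same way on patched and unpatched kernels.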
* Fix hot spares | Brian Behlendorf | 2013-03-01 | 3 | -82/+115

  Hot spares fail in ZoL because all leaf vdevs are opened exclusively (O_EXCL). On Linux, an exclusive open causes subsequent exclusive opens of the same device to fail with EBUSY.

  This could be resolved by not opening any of the devices exclusively, which is what Illumos does, but the additional protection offered by exclusive opens is desirable. It cleanly prevents you from accidentally adding an in-use non-ZFS device to your pool.

  To fix this we very slightly relaxed the usage of O_EXCL in the following ways.

  1) Functions which open the device but only read had the O_EXCL flag removed and were updated to use O_RDONLY.

  2) A common holder was added to the vdev disk code. This allows the ZFS code to internally open the device multiple times, while non-ZFS callers still may not.

  3) An exception was added to make_disks() for hot spares when creating partition tables. For hot spare devices which are already opened exclusively we skip creating the partition table because this must already have been done when the disk was originally added as a hot spare.

  Additional minor changes include fixing check_in_use() to use a partition instead of a slice suffix, and moving is_spare() above make_disks() to avoid adding a forward reference.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #250
* Remove wholedisk check from vdev_disk_open() | Brian Behlendorf | 2013-02-28 | 1 | -14/+0

  As described by the comment and enforced by the assertion, v->vdev_wholedisk will never be -1. The wholedisk handling is performed by the user space utilities. To prevent confusion this dead code is being removed.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Leaf vdevs should not be reopened | Brian Behlendorf | 2013-02-28 | 1 | -3/+16

  When vdev_disk.c was implemented for Linux we failed to handle the reopen case. According to the vdev_reopen() comment, leaf vdevs should not be closed or opened when v->vdev_reopening is set. Under Linux we would always close and open the device.

  This issue was only noticed when a 'zpool scrub' command was run while the leaf vdev device names in /dev/disk/by-vdev were missing. The scrub command calls vdev_reopen(), which caused the vdevs to be closed, but they couldn't be reopened due to the missing links. The result was that all the vdevs were marked unavailable and the pool was halted due to failmode=wait.

  This patch adds the missing functionality in a similar fashion to the Illumos code.

  Signed-off-by: Brian Behlendorf <[email protected]>
* -x shouldn't warn about old on-disk format or unavailable featuresTim Connors2013-02-282-2/+5
| | | | | | | | | `zpool status -x` should only flag errors or where the pool is unavailable. If it imported fine but isn't using the latest features available in the code, that's not an error. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1319
* Remove the bio_empty_barrier() check.Etienne Dechamps2013-02-243-29/+0
| | | | | | | | | | | | | | | | | | | | | | | To determine whether the kernel is capable of handling empty barrier BIOs, we check for the presence of the bio_empty_barrier() macro, which was introduced in 2.6.24. If this macro is defined, then we can flush disk vdevs; if it isn't, then flushing is disabled. Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37, even though the kernel is still capable of handling empty barrier BIOs. As a result, flushing is effectively disabled on kernels >= 2.6.37, meaning that starting from this kernel version, zfs doesn't use barriers to guarantee on-disk data consistency. This is quite bad and can lead to potential data corruption on power failures. This patch fixes the issue by removing the configure check for bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore. Thanks to Richard Kojedzinszky for catching this nasty bug. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1318
* Use -Werror for all kernel configure tests. | Etienne Dechamps | 2013-02-24 | 4 | -11/+1

  As a matter of fact, we're already using -Werror for most tests because of a bug in kernel-bio-empty-barrier.m4 which sets -Werror without reverting it afterwards. This meant that all tests which ran after this one were using -Werror.

  This patch simply makes it clear that we're using -Werror and makes the code more readable and more predictable.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1317
* Enable zfs_arc_memory_throttle_disable by defaultBrian Behlendorf2013-02-211-1/+1
| | | | | | | | | | | | | | | | | | | The zfs_arc_memory_throttle_disable module option was introduced by commit 0c5493d47059f25ce9dbf20c9fe87655f55102a1 to resolve a memory miscalculation which could result in the txg_sync thread spinning. When this was first introduced the default behavior was left unchanged until enough real world usage confirmed there were no unexpected issues. We've now reached that point. Linux's direct reclaim is working as expected so we're enabling this behavior by default. This helps pave the way to retire the spl_kmem_availrmem() functionality in the SPL layer. This was the only caller. Signed-off-by: Brian Behlendorf <[email protected]> Issue #938
* Fix broken RPATH in spec file | Brian Behlendorf | 2013-02-12 | 1 | -5/+2

  Rather than setting _prefix=/ and having to override all the default install locations, it's cleaner and more understandable to leave prefix=/usr and only override _sbindir and _libdir. This fixes three issues:

  * The commands no longer get built with an incorrect rpath for the libraries. This is good because fixing this sort of thing is required by the Fedora packaging guidelines.

    http://fedoraproject.org/wiki/Packaging:Guidelines#Beware_of_Rpath

  * The various AUTHORS, COPYRIGHT, etc. files are now correctly installed under /usr/share/doc instead of /share/doc.

  * _libexecdir is now handled properly for each distribution. Fedora/RHEL=/usr/libexec, OpenSUSE/SLES=/usr/lib, Debian=/usr/lib/rpm

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1058
* Make spa.c assertions catch unsupported pre-feature flag pool versionsRichard Yao2013-02-121-2/+2
| | | | | | | | | | | | | A couple of assertions in spa.c were designed to prevent the use of invalid pool versions. They were written under the assumption that all valid pools are less than SPA_VERSION. Since feature flags jumped from 28 to 5000, any numbers in the range 28 to 5000 non-inclusive will fail to trigger them. We switch to the new SPA_VERSION_IS_SUPPORTED macro to correct this. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1282
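The gap between the two kinds of check can be sketched as follows. The version numbers 28 and 5000 come from the commit text; the `DEMO_*` macro names and both helper functions are assumptions made for illustration and are not the definitions used in spa.c.

```c
#include <stdint.h>

/* Values taken from the commit text: legacy versions end at 28 and
 * feature flags use 5000. The macro names are assumptions. */
#define DEMO_VERSION_INITIAL		1ULL
#define DEMO_VERSION_BEFORE_FEATURES	28ULL
#define DEMO_VERSION_FEATURES		5000ULL

/* Old-style check: anything not above the newest version passes,
 * including the unused numbers between 28 and 5000. */
static int demo_old_check(uint64_t v)
{
	return v <= DEMO_VERSION_FEATURES;
}

/* Range-aware check: the legacy range or the feature-flags version. */
static int demo_version_is_supported(uint64_t v)
{
	return (v >= DEMO_VERSION_INITIAL &&
	    v <= DEMO_VERSION_BEFORE_FEATURES) ||
	    v == DEMO_VERSION_FEATURES;
}
```

With these definitions a value such as 3000 passes the old-style check but is rejected by the range-aware one.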
* Add explicit MAXNAMELEN checkBrian Behlendorf2013-02-121-0/+3
| | | | | | | | | | | | | It turns out that the Linux VFS doesn't strictly handle all cases where a component path name exceeds MAXNAMELEN. It does however appear to correctly handle MAXPATHLEN for us. The right way to handle this appears to be to add an explicit check to the zpl_lookup() function. Several in-tree filesystems handle this case the same way. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1279
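An explicit length check of this kind might look like the userspace sketch below. `DEMO_MAXNAMELEN` and `demo_check_component()` are hypothetical; the real limit comes from the ZFS headers and the real check operates on a dentry inside zpl_lookup(), which is not reproduced here.

```c
#include <errno.h>
#include <string.h>

/* Assumed per-component limit for this sketch; the real constant is
 * defined by the ZFS headers. */
#define DEMO_MAXNAMELEN	256

/* Returns 0 if the component name is acceptable, or -ENAMETOOLONG
 * when it is longer than the limit. */
static int demo_check_component(const char *name)
{
	if (strlen(name) > DEMO_MAXNAMELEN)
		return -ENAMETOOLONG;
	return 0;
}
```

Rejecting the name before doing any lookup work means the caller sees ENAMETOOLONG rather than a less specific failure further down the stack.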
* Update the zfs.8 "ZFS Volumes as Swap" sectionBrian Behlendorf2013-02-071-1/+4
| | | | | | | | As of 0.6.0-rc11 using ZFS volumes as Linux swap devices is supported. Swapping to files in ZFS filesystems is not. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1189
* Switch KM_SLEEP to KM_PUSHPAGENed Bass2013-02-062-3/+3
| | | | | | | | | | | | Two more locations where KM_SLEEP was used in a call which must use KM_PUSHPAGE were found while using the zpool upgrade command. See commit b8d06fc for additional details. Also make a small correction to the comment block above dsl_dir_open_spa(). Signed-off-by: Brian Behlendorf <[email protected]> Closes #1268
* Remove unused machelf.h headerBrian Behlendorf2013-02-052-181/+0
| | | | | | | | | The machelf.h header is never included by anything in the zfs build process. It is all effectively dead code which can be safely removed. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1265
* Fix function relocations in libzpoolRichard Yao2013-02-051-1/+2
| | | | | | | | | | | binutils 2.23.1 fails in situations that generate function relocations on PowerPC and possibly other architectures. This causes linking of libzpool to fail because it depends on libnvpair. We add a dependency on libnvpair to lib/libzpool/Makefile.am to correct that. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1267
* Cast 'zfs bad bloc' to ULL for x86 | Brian Behlendorf | 2013-02-04 | 1 | -1/+1

  Explicitly cast this value to an unsigned long long on 32-bit systems to inform the compiler that a long type should not be used. Otherwise we get the following compiler error:

    dmu_send.c:376: error: integer constant is too large for ‘long’ type

  Signed-off-by: Brian Behlendorf <[email protected]>
* Fix 1M references in zpool-features.5Brian Behlendorf2013-02-041-5/+5
| | | | | | | | The zpool-features(5) man page should reference the Linux zfs(8) and zpool(8) man pages. The 1M convention isn't used on Linux. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1184
* Add zpool-features(5) man page | Brian Behlendorf | 2013-02-04 | 1 | -1/+1

  The zpool-features(5) man page was accidentally omitted from the build target when feature flags were merged. As a result it doesn't get installed as part of 'make install', so none of the packages include this man page.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1262
* ZFS 0.6.0-rc14zfs-0.6.0-rc14Brian Behlendorf2013-02-011-1/+1
|
* Add zfs_arc_memory_throttle_disable module option | Brian Behlendorf | 2013-02-01 | 1 | -0/+7

  The way in which VirtualBox (ab)uses memory can throw off the free memory calculation in arc_memory_throttle(). The result is that the txg_sync thread will effectively spin waiting for memory to be released even though there's lots of memory on the system.

  To handle this case I'm adding a zfs_arc_memory_throttle_disable module option, largely for VirtualBox users. Setting this option disables free memory checks, which allows the txg_sync thread to make progress.

  By default this option is disabled to preserve the current behavior. However, because Linux supports direct memory reclaim it's doubtful throttling due to perceived memory pressure is ever a good idea. We should enable this option by default once we've done enough real world testing to convince ourselves there aren't any unexpected side effects.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #938
* Add zfs_disable_dup_eviction module optionBrian Behlendorf2013-02-011-0/+3
| | | | | | | | Commit 1eb5bfa introduced a new zfs_disable_dup_eviction tunable. It should have been made available as a module option in the original patch but was overlooked. Signed-off-by: Brian Behlendorf <[email protected]>
* Honor 80 character limit in 'zpool status'Brian Behlendorf2013-01-311-2/+2
| | | | | | | | | This is a minor nit, but the second line of the 'action' message when you need to upgrade your pool to support feature flags exceeds the standard 80 character limit. Fix it by moving the word 'feature' on to the third line. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix mismatch between SA header size and layout | Ned Bass | 2013-01-31 | 1 | -0/+16

  When a system attribute layout is created, an inconsistency may occur between the system attribute header (sa_hdr_phys_t) size and the variable-sized attribute count stored in the layout. The inconsistency results in the following failed assertion when SA_HDR_SIZE_MATCH_LAYOUT returns false:

    SPLError: 11315:0:(sa.c:1541:sa_find_idx_tab())
    ASSERTION((IS_SA_BONUSTYPE(bonustype) &&
    SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) || !IS_SA_BONUSTYPE(bonustype) ||
    (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed

  The bug originates in this snippet from sa_find_sizes():

    if (is_var_sz && var_size > 1) {
            if (P2ROUNDUP(hdrsize + sizeof (uint16_t), 8) +
                *total < full_space) {
                    hdrsize += sizeof (uint16_t);

  This assumes that the current variable-sized attribute will be stored in the current buffer and accounts for the space needed to store its size in the sa_hdr_phys_t. However, if the next attribute spills over we need to store a blkptr_t at the end of the bonus buffer to point to the spill block. If the current attribute is in the way of the blkptr_t then it too will be relocated into the spill block. But since we've already accounted for it in the header size we get the inconsistency described above.

  To avoid this, record the index of the last variable-sized attribute that prompted a hdrsize increase, and reverse the increase if we later determine that that attribute will be relocated to the spill block.

  Signed-off-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1250
* Fix rounding discrepancy in sa_find_sizes() | Ned Bass | 2013-01-31 | 1 | -0/+5

  A rounding discrepancy exists between how sa_build_layouts() and sa_find_sizes() calculate when the spill block needs to be kicked in. This results in a narrow size range where sa_build_layouts() believes there must be a spill block allocated but due to the discrepancy there isn't. A panic then occurs when the hdl->sa_spill NULL pointer is dereferenced.

  The following reproducer for this bug was isolated:

    truncate -s 128m /tmp/tank
    zpool create tank /tmp/tank
    zfs create -o xattr=sa tank/fish
    ln -s `perl -e 'print "z" x 41'` /tank/fish/z
    setfattr -hn trusted.foo -v`perl -e 'print "z"x45'` /tank/fish/z

  This test results in roughly the following system attribute (SA) layout:

    176 bytes - "standard" SA's
     41 bytes - name of symbolic link target
    100 bytes - XDR encoded nvlist for xattr
    ---
    317 bytes - total

  Because 317 is less than DN_MAX_BONUSLEN (320), sa_find_sizes() decides no spill block is needed. But sa_build_layouts() rounds 41 up to 48 when computing the space requirements, so it tries to switch to the spill block.

  Note that we were only able to reproduce this bug using a combination of symbolic links and the Linux-specific xattr=sa dataset property. So while this issue is not technically Linux-specific, it may be difficult or impossible to hit the narrow size range needed to reproduce it on other platforms.

  To fix the discrepancy, round the running total in sa_find_sizes() up to an 8-byte boundary before accounting for each SA, since this is how they will be stored in the bonus and (possibly) spill buffers.

  To make the intent of the code more clear, explicitly assert key assumptions about expected alignment of data and whether spill-over will occur.

  Signed-off-by: Matthew Ahrens <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1240
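The arithmetic behind the discrepancy can be worked through with the sizes quoted in the commit message (176, 41 and 100 bytes against a 320-byte limit). The helper names below are hypothetical and the code is a sketch of the two accounting styles, not the sa.c implementation.

```c
#include <stddef.h>

#define DEMO_MAX_BONUSLEN	320	/* DN_MAX_BONUSLEN from the text */

/* Round x up to the next multiple of 8. */
static size_t demo_roundup8(size_t x)
{
	return (x + 7) & ~(size_t)7;
}

/* Sum the attribute sizes with no alignment at all. */
static size_t demo_total_unaligned(const size_t *sizes, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	return total;
}

/* Round the running total up to 8 bytes before adding each size. */
static size_t demo_total_aligned(const size_t *sizes, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total = demo_roundup8(total) + sizes[i];
	return total;
}
```

For the sizes 176, 41, 100 the unaligned sum is 317, which fits under 320, while the aligned running total is 176, then 217, then 224 + 100 = 324, which does not. That 7-byte difference is the narrow window described above.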
* Illumos #3447 improve the comment in txg.cAdam H. Leventhal2013-01-301-2/+71
| | | | | | | | | | | | | | | | 3447 improve the comment in txg.c Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Richard Lowe <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@adbbcfface63b3a71922d5a25d34a2018c0435de https://www.illumos.org/issues/3447 Ported-by: Brian Behlendorf <[email protected]>
* Retire zpool_id infrastructureBrian Behlendorf2013-01-2927-851/+16
| | | | | | | | | | | | | | | | | | | In the interest of maintaining only one udev helper to give vdevs user friendly names, the zpool_id and zpool_layout infrastructure is being retired. They are superseded by vdev_id which incorporates all the previous functionality. Documentation for the new vdev_id(8) helper and its configuration file, vdev_id.conf(5), can be found in their respective man pages. Several useful example files are installed under /etc/zfs/. /etc/zfs/vdev_id.conf.alias.example /etc/zfs/vdev_id.conf.multipath.example /etc/zfs/vdev_id.conf.sas_direct.example /etc/zfs/vdev_id.conf.sas_switch.example Signed-off-by: Brian Behlendorf <[email protected]> Closes #981
* Remove NPTL_GUARD_WITHIN_STACKBrian Behlendorf2013-01-294-65/+1
| | | | | | | | | | | | | | | | | Commit 4b2f65b253952c5103311cc8bb4b8cdc6836fd7e increased the user space stack by 4x to resolve certain stack overflows. As such it no longer makes sense to worry about a single extra page which might or might not be part of the process stack. There is now ample headroom for normal usage. By eliminating this configure check we are also resolving the following segfault which intentionally occurs at configure time and may be logged in dmesg. conftest[22156]: segfault at 7fbf18a47e48 ip 00000000004007fe sp 00007fbf18a4be50 error 6 in conftest[400000+1000] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #3035 LZ4 compression support in ZFS and GRUB | Eric Dillmann | 2013-01-29 | 13 | -2/+1188

  3035 LZ4 compression support in ZFS and GRUB

  Reviewed by: Matthew Ahrens <[email protected]>
  Reviewed by: Christopher Siden <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Approved by: Christopher Siden <[email protected]>

  References:
    illumos/illumos-gate@a6f561b4aee75d0d028e7b36b151c8ed8a86bc76
    https://www.illumos.org/issues/3035
    http://wiki.illumos.org/display/illumos/LZ4+Compression+In+ZFS

  This patch has been slightly modified from the upstream Illumos version to be compatible with Linux. Due to the very limited stack space in the kernel a lz4 workspace kmem cache is used. Since we are using gcc we are also able to take advantage of the gcc optimized __builtin_ctz functions.

  Support for GRUB has been dropped from this patch. That code is available but those changes will need to be made to the upstream GRUB package.

  Lastly, several hunks of dead code were dropped for clarity. They include the functions real_LZ4_uncompress(), LZ4_compressBound() and the Visual Studio specific hunks wrapped in _MSC_VER.

  Ported-by: Eric Dillmann <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1217
* Quiet mkfs.ext2 outputBrian Behlendorf2013-01-281-1/+1
| | | | | | | | | The -q option should quiet the mkfs.ext2 output but certain versions of e2fsprogs appear to ignore it. This can result in an extra 'done' message in the test output. To keep this noise from distracting just direct stdout to /dev/null. Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 2.6.26 compat, lookup_bdev() | Brian Behlendorf | 2013-01-28 | 3 | -0/+27

  It's doubtful many people were impacted by this, but commit 6c28567 accidentally broke ZFS builds for 2.6.26 and earlier kernels. This commit depends on the lookup_bdev() function, which exists in 2.6.26 but wasn't exported until 2.6.27.

  The availability of the function isn't critical, so a wrapper is introduced which returns ERR_PTR(-ENOTSUP) when the function isn't defined. This will have the effect of causing zvol_is_zvol() to always fail for 2.6.26 kernels.

  This in turn means vdevs will always get opened concurrently, which is good for normal usage. This will only become an issue if you're using a zvol as a vdev in another pool, in which case you really should be using a newer kernel anyway.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1205
* Stop using /bin/ as a source in zconfig.sh | Brian Behlendorf | 2013-01-28 | 2 | -10/+33

  Test 5, 6, 7, and 7 in zconfig.sh use /bin/ as a source of random directories and files for their test. This has led to unexpected test failures because the total size of /bin/ on the test system isn't checked and it is entirely possible for it to be larger than the target filesystem.

  To resolve this issue we create a somewhat random collection of files and directories in /var/tmp to use. On average we expect about 5MB of data with the worst case being 20MB. This is large enough to be interesting and small enough to always fit in the default test datasets.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1113
* Use strerror() not strerror_r() | Brian Behlendorf | 2013-01-28 | 1 | -1/+1

  The differ() function used strerror_r() instead of strerror() because it allowed the error message to be directly copied into a buffer. This causes two issues under Linux.

  * There are two versions of strerror_r() available: an XSI-compliant version which returns an 'int' error code, and a GNU-specific version which returns a 'char *' to the resulting error string.

      int strerror_r(int errnum, char *buf, size_t buflen);   /* XSI */
      char *strerror_r(int errnum, char *buf, size_t buflen); /* GNU */

  * The most recent versions of strerror_r() are annotated with the warn_unused_result attribute. This causes the following warning since the upstream implementation casts the result to void.

      warning: ignoring return value of 'strerror_r', declared with attribute warn_unused_result [-Wunused-result]

  The cleanest way to resolve both of these problems is just to use strerror() and make a copy of the result into the buffer. This resolves both issues, and this is the only instance of strerror_r() in the code base.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1231
* Avoid gcc -Werror=maybe-uninitialized warningsChris Wedgwood2013-01-281-2/+2
| | | | | | | | | | | Explicitly set acl details to zero to silence gcc (zfs_acl_node_read can't be sure zfs_acl_znode_info will set acl_count and aclsize). Normally suppressing these warnings by setting this to zero at declaration time is a bad idea but in this instance it's hard to avoid and should be fairly safe. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1244
* Use dsl_dataset_snap_lookup()Brian Behlendorf2013-01-255-35/+8
| | | | | | | | | | | | Retire the dmu_snapshot_id() function which was introduced in the initial .zfs control directory implementation. There is already an existing dsl_dataset_snap_lookup() which does exactly what we need, and the dmu_snapshot_id() function as implemented is racy. https://github.com/zfsonlinux/zfs/issues/1215#issuecomment-12579879 Signed-off-by: Brian Behlendorf <[email protected]> Closes #1238
* vdev_id: improve keyword parsing flexibilityNed Bass2013-01-251-9/+11
| | | | | | | | | | | The vdev_id udev helper strictly requires configuration file keywords to always be anchored at the beginning of the line and to be followed by a space character. However, users may prefer to use indentation or tab delimitation. Improve flexibility by simply requiring a keyword to be the first field on the line. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1239
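The relaxed matching rule, "the keyword must be the first field on the line", is sketched below in C purely as an illustration of the rule; the names are hypothetical and this is not how the vdev_id helper itself implements it. The `alias` keyword is assumed from the vdev_id.conf.alias.example file name, and the sample lines in the usage note are made up.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the first whitespace-delimited field of line equals
 * keyword, regardless of leading spaces or tabs; otherwise 0. */
static int demo_first_field_is(const char *line, const char *keyword)
{
	char field[64];

	/* %63s skips leading whitespace and stops at the next blank. */
	if (sscanf(line, "%63s", field) != 1)
		return 0;
	return strcmp(field, keyword) == 0;
}
```

Under this rule both `alias d1 /dev/sda` and a tab-indented, tab-delimited variant of the same line match, while a commented-out `# alias d1` line does not.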
* Fix zconfig.sh partitioning errorBrian Behlendorf2013-01-241-2/+2
| | | | | | | | | | | Parted version 3.0 doesn't allow us to specify the start and end percentages as 50% and 100% respectively. This results in: Error: The location 100% is outside the device /dev/zd0 Therefore we change the syntax to 51% and -1 for end of device. Signed-off-by: Brian Behlendorf <[email protected]>
* Fix test script error codes | Brian Behlendorf | 2013-01-24 | 1 | -1/+1

  The 'exit $?' command in the INT TERM EXIT trap was overwriting the expected error code with the error code from mv. Fix the issue by removing the 'exit $?'. It's important that we preserve the original error code so failures are easily noticed.

  Signed-off-by: Brian Behlendorf <[email protected]>
* Add d_clear_d_op() compatibilityBrian Behlendorf2013-01-232-0/+22
| | | | | | | | | | | | | Added d_clear_d_op() helper function which clears some flags and the registered dentry->d_op table. This is required because d_set_d_op() issues a warning when the dentry operations table is already set. For the .zfs control directory to work properly we must be able to override the default operations table and register custom .d_automount and .d_revalidate callbacks. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #1230
* fzap_cursor_move_to_key() should drop l_rwlockNed Bass2013-01-231-6/+6
| | | | | | | | | | | | | Callers of zap_deref_leaf() must be careful to drop leaf->l_rwlock since that function returns with the lock held on success. All other callers drop the lock correctly but it seems fzap_cursor_move_to_key() does not. This may block writers or cause VERIFY failures when the lock is freed. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1215 Closes zfsonlinux/spl#143 Closes zfsonlinux/spl#97
* Fix zpl_revalidate() NULL derefBrian Behlendorf2013-01-221-1/+1
| | | | | | | | | | | | In zpl_revalidate() it's possible for the nameidata to be NULL for kernels which still accept the parameter. In particular, lookup_one_len() calls d_revalidate() with a NULL nameidata. Resolve the issue by checking for a NULL nameidata in which case just set the flags to 0. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1226
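The guard described here amounts to a NULL check before reading the flags. In the sketch below `struct demo_nameidata` is a hypothetical stand-in for the kernel's nameidata, and `demo_lookup_flags()` is not the actual zpl_revalidate() code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct nameidata. */
struct demo_nameidata {
	unsigned int flags;
};

/* Use the caller's flags when available, otherwise fall back to 0. */
static unsigned int demo_lookup_flags(const struct demo_nameidata *nd)
{
	return nd != NULL ? nd->flags : 0;
}
```

Falling back to 0 treats a missing nameidata as "no special lookup flags", which matches the behavior the commit message describes.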
* Use sb->s_d_op default dentry operationsBrian Behlendorf2013-01-184-11/+29
| | | | | | | | | | As of Linux 2.6.37 the right way to register custom dentry operations is to use the super block's ->s_d_op field. For older kernels they should be registered as part of the lookup operation. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1223
* Fix zpool on zvol deadlock | Massimo Maggi | 2013-01-18 | 1 | -13/+13

  Commit 65d56083b4617a4cade0cff68cbbaf68114169d6 fixes the lock inversion between spa_namespace_lock and bdev->bd_mutex but only for the first user of spa_namespace_lock: dmu_objset_own(). Later spa_namespace_lock gets acquired by dsl_prop_get_integer() through dsl_prop_get()->dsl_dataset_hold()->dsl_dir_open_spa()->spa_open()->spa_open_common() without this "protection". By moving the mutex release after this second use, even this acquisition of the lock is "protected" by the ERESTARTSYS trick.

  Signed-off-by: Massimo Maggi <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1220
* Revert "Revert "Fix unlink/xattr deadlock"" | Brian Behlendorf | 2013-01-17 | 2 | -55/+90

  This reverts commit 53c7411919a64d6f0889aa0d6974610f6cd35744, effectively reinstating the asynchronous xattr cleanup code. These Linux changes were reverted because after testing and careful contemplation I was convinced that due to the 89260a1c8851ce05ea04b23606ba438b271d890 commit they were no longer required.

  Unfortunately, the deadlock described in #1176 was a case which wasn't considered. At mount, zfs_unlinked_drain() can occur, which will unlink a list of znodes in effectively a random order, which isn't safe. The only reason it was safe to originally revert this change was that we could guarantee that the VFS would always prune the xattr leaves before the parents.

  Therefore, until we can cleanly resolve this deadlock for all cases we need to keep this change in spite of the xattr unlink performance penalty associated with it.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1176
  Issue #457
* Fix 'zfs rollback' on mounted file systems | Brian Behlendorf | 2013-01-17 | 9 | -42/+144

  Rolling back a mounted filesystem with open file handles and cached dentries+inodes never worked properly in ZoL. The major issue was that Linux provides no easy mechanism for modules to invalidate the inode cache for a file system.

  Because of this it was possible that an inode from the previous filesystem would not get properly dropped from the cache during rollback. Then a new inode with the same inode number would be created and collide with the existing cached inode. Ideally this would trigger a VERIFY(), but in practice the error wasn't handled and it would just result in a NULL dereference.

  Luckily, this issue can be resolved by sprucing up the existing Solaris zfs_rezget() functionality for the Linux VFS.

  The way it works now is that when a file system is rolled back all the cached inodes will be traversed and refetched from disk. If a version of the cached inode exists on disk the in-core copy will be updated accordingly. If there is no match for that object on disk it will be unhashed from the inode cache and marked as stale.

  This will effectively make the inode unfindable for lookups, allowing the inode number to be immediately recycled. The inode will then only be accessible from the cached dentries. Subsequent dentry lookups which reference a stale inode will result in the dentry being invalidated. Once invalidated the dentry will drop its reference on the inode, allowing it to be safely pruned from the cache.

  Special care is taken for negative dentries since they do not reference any inode. These dentries will be invalidated based on when they were added to the dentry cache. Entries added before the last rollback will be invalidated to prevent them from masking real files in the dataset.

  Two nice side effects of this fix are:

  * Removes the dependency on spl_invalidate_inodes(); it can now be safely removed from the SPL when we choose to do so.

  * zfs_znode_alloc() no longer requires a dentry to be passed. This effectively reverts this portion of the code to its upstream counterpart. The dentry is now instantiated more correctly in the Linux ZPL layer.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ned Bass <[email protected]>
  Closes #795
* Fix false ENOENT on snapshot control dentries | Ned Bass | 2013-01-16 | 4 | -58/+158

  Lookups in the snapshot control directory for an existing snapshot fail with ENOENT if an earlier lookup failed before the snapshot was created. This is because the earlier lookup causes a negative dentry to be cached which is never invalidated.

  The bug can be reproduced as follows (the second ls should succeed):

    $ ls /tank/.zfs/snapshot/s
    ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
    $ zfs snap tank@s
    $ ls /tank/.zfs/snapshot/s
    ls: cannot access /tank/.zfs/snapshot/s: No such file or directory

  To remedy this, always invalidate cached dentries in the snapshot control directory. Since these entries never exist on disk there is no significant performance penalty for the extra lookups.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1192
* Fix quoting error in unmount commandNed Bass2013-01-161-1/+1
| | | | | | | | | A misplaced single quote caused the umount command to fail with a syntax error when unmounting snapshots under the .zfs/snapshot control directory. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1210
* Ensure that zfs diff prints unicode safely.Darik Horn2013-01-161-1/+1
| | | | | | | | | In the stream_bytes() library function used by `zfs diff`, explicitly cast each byte in the input string to an unsigned character so that the Linux fprintf() correctly escapes to octal and does not mangle the output. Signed-off-by: Brian Behlendorf <[email protected]> Closes #1172
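Why the cast matters can be seen in a small sketch that escapes bytes into a buffer. `demo_escape_bytes()` is a hypothetical helper that writes to memory rather than to a FILE, so it is not the stream_bytes() function itself; the choice of which bytes to escape is an assumption of this sketch.

```c
#include <stdio.h>

/* Write src into dst, escaping every byte outside the printable
 * ASCII range (and the backslash itself) as a backslash followed by
 * three octal digits. */
static void demo_escape_bytes(const char *src, char *dst, size_t dstlen)
{
	size_t used = 0;

	if (dstlen == 0)
		return;
	dst[0] = '\0';
	for (; *src != '\0'; src++) {
		/* Without this cast a byte such as 0xc3 may be a negative
		 * char and would be printed as a huge octal number. */
		unsigned char c = (unsigned char)*src;
		int n;

		if (c > ' ' && c != '\\' && c < 0x7f)
			n = snprintf(dst + used, dstlen - used, "%c", c);
		else
			n = snprintf(dst + used, dstlen - used, "\\%03o",
			    (unsigned int)c);
		if (n < 0 || (size_t)n >= dstlen - used) {
			dst[used] = '\0';	/* out of room: drop partial output */
			break;
		}
		used += (size_t)n;
	}
}
```

For the UTF-8 encoding of "é" (bytes 0xc3 0xa9) this produces the two escapes \303 and \251, each exactly three octal digits wide.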
* Illumos #3189 kernel panic in test hotspare_onoffline_004_negChristopher Siden2013-01-141-1/+1
| | | | | | | | | | | | | | | 3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Arne Jansen <[email protected]> Approved by: Dan McDonald <[email protected]> References: illumos/illumos-gate@8f0b538d1dc99df23a6a89cfd9ffddc1b9804a00 changeset: 13818:e9ad0a945d45 https://www.illumos.org/issues/3189 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #1862 incremental zfs receive fails for sparse file > 8PBArne Jansen2013-01-141-15/+27
| | | | | | | | | | | | | | | 1862 incremental zfs receive fails for sparse file > 8PB Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Simon Klinkert <[email protected]> Approved by: Eric Schrock <[email protected]> References: illumos/illumos-gate@31495a1e56860f4575614774a592fe33fc9c71f2 illumos changeset: 13789:f0c17d471b7a https://www.illumos.org/issues/1862 Ported-by: Brian Behlendorf <[email protected]>
* Illumos #3208 cross-endian incorrect user/group accountingMatthew Ahrens2013-01-142-10/+27
| | | | | | | | | | | | | | | | | | 3208 moving zpool cross-endian results in incorrect user/group accounting Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Richard Lowe <[email protected]> References: illumos/illumos-gate@e828a46d29ad418487f50d56b5c19e2a1f9033a7 illumos changeset: 13835:eea81edc4f14 https://www.illumos.org/issues/3208 Ported-by: Brian Behlendorf <[email protected]> Closes #627 Closes #1136