summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Disable direct reclaim on zvolsRichard Yao2012-04-301-1/+7
| | | | | | | | | | | Previously, it was possible for the direct reclaim path to be invoked when a write to a zvol was made. When a zvol is used as a swap device, this often causes swap requests to depend on additional swap requests, which deadlocks. We address this by disabling the direct reclaim path on zvols. Signed-off-by: Brian Behlendorf <[email protected]> Closes #342
* Update ARC memory limits to account for SLUB internal fragmentationRichard Yao2012-04-301-5/+1
| | | | | | | | | | | | | | | | 23bdb07d4e4c435205d25d3efdb5fef2d089ce5e updated the ARC memory limits to be 1/2 of memory or all but 4GB. Unfortunately, these values assume zero internal fragmentation in the SLUB allocator, when in reality, the internal fragmentation could be as high as 50%, effectively doubling memory usage. This poses clear safety issues, because it permits the size of ARC to exceed system memory. This patch changes this so that the default value of arc_c_max is always 1/2 of system memory. This effectively limits the ARC to the memory that the system has physically installed. Signed-off-by: Brian Behlendorf <[email protected]> Closes #660
* Integrate ARC more tightly with LinuxBrian Behlendorf2012-04-301-155/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under Solaris the ARC was designed to stay one step ahead of the VM subsystem. It would attempt to recognize low memory situtions before they occured and evict data from the cache. It would also make assessments about if there was enough free memory to perform a specific operation. This was all possible because Solaris exposes a fairly decent view of the memory state of the system to other kernel threads. Linux on the other hand does not make this information easily available. To avoid extensive modifications to the ARC the SPL attempts to provide these same interfaces. While this works it is not ideal and problems can arise when the ARC and Linux have different ideas about when your out of memory. This has manifested itself in the past as a spinning arc_reclaim_thread. This patch abandons the emulated Solaris interfaces in favor of the prefered Linux interface. That means moving the bulk of the memory reclaim logic out of the arc_reclaim_thread and in to the evict driven shrinker callback. The Linux VM will call this function when it needs memory. The ARC is then responsible for attempting to free the requested amount of memory if possible. Several interfaces have been modified to accomidate this approach, however the basic user space implementation remains the same. The following changes almost exclusively just apply to the kernel implementation. * Removed the hdr_recl() reclaim callback which is redundant with the broader arc_shrinker_func(). * Reduced arc_grow_retry to 5 seconds from 60. This is now used internally in the ARC with arc_no_grow to indicate that direct reclaim was recently performed. This typically indicates a rapid change in memory demands which the kswapd threads were unable to keep ahead of. As long as direct reclaim is happening once every 5 seconds arc growth will be paused to avoid further contributing to the existing memory pressure. The more common indirect reclaim paths will not set arc_no_grow. * arc_shrink() has been extended to take the number of bytes by which arc_c should be reduced. This allows for a more granual reduction of the arc target. Since the kernel provides a reclaim value to the arc_shrinker_func() this value is used instead of 1<<arc_shrink_shift. * arc_reclaim_needed() has been removed. It was used to determine if the system was under memory pressure and relied extensively on Solaris specific VM interfaces. In most case the new code just checks arc_no_grow which indicates that within the last arc_grow_retry seconds direct memory reclaim occurred. * arc_memory_throttle() has been updated to always include the amount of evictable memory (arc and page cache) in its free space calculations. This space is largely available in most call paths due to direct memory reclaim. * The Solaris pageout code was also removed to avoid confusion. It has always been disabled due to proc_pageout being defined as NULL in the Linux port. Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Add zfs_mdcomp_disable module optionBrian Behlendorf2012-04-271-0/+3
| | | | | | | | | | Expose the zfs_mdcomp_disable variable as a module option. This can be used to disable compression of zfs meta data which is enabled by default. This shouldn't need to be tuned but for most workloads, however there may be very specific instances where it makes sense to trade disk capacity for extra cpu cycles. Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #2067: uninitialized variables in zfs(1M) may make snapshots ↵Richard Lowe2012-04-271-16/+16
| | | | | | | | | | | | | | | | | | undestroyable Reviewed by: Joshua M. Clulow <[email protected]> Reviewed by: Milan Jurik <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Steve Gonczi <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/2067 Ported by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #1946: incorrect formatting when listing output of multiple pools ↵Frederik Wessels2012-04-271-1/+5
| | | | | | | | | | | | | | with zpool iostat -v Reviewed by: Richard Elling <[email protected]> Reviewed by: Joshua M. Clulow <[email protected]> Approved by: Richard Lowe <[email protected]> Reference to Illumos issue: https://www.illumos.org/issues/1946 Ported by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos #952: separate intent logs should be obvious in 'zpool iostat' outputMike Harsch2012-04-271-3/+33
| | | | | | | | | | | | | | | | Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Eric Schrock <[email protected]> Refererce to Illumos issue: https://www.illumos.org/issues/952 Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #607
* Illumos #1909: disk sync write perf regression when slog is used post oi_148Brian Behlendorf2012-04-192-7/+16
| | | | | | | | | | | | | | | | | | | Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Robert Mustacchi <[email protected]> Reviewed by: Bill Pijewski <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Steve Gonczi <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Albert Lee <[email protected]> Approved by: Eric Schrock <[email protected]> Refererces to Illumos issue: https://www.illumos.org/issues/1909 Signed-off-by: Brian Behlendorf <[email protected]> Closes #680
* Use KM_PUSHPAGE in l2arc_write_buffersPrakash Surya2012-04-171-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is potential for deadlock in the l2arc_feed thread if KM_PUSHPAGE is not used for the allocations made in l2arc_write_buffers. Specifically, if KM_PUSHPAGE is not used for these allocations, it is possible for reclaim to be triggered which can cause the l2arc_feed thread to deadlock itself on the ARC_mru mutex. An example of this is demonstrated in the following backtrace of the l2arc_feed thread: crash> bt 4123 PID: 4123 TASK: ffff88062f8c1500 CPU: 6 COMMAND: "l2arc_feed" 0 [ffff88062511d610] schedule at ffffffff814eeee0 1 [ffff88062511d6d8] __mutex_lock_slowpath at ffffffff814f057e 2 [ffff88062511d748] mutex_lock at ffffffff814f041b 3 [ffff88062511d768] arc_evict at ffffffffa05130ca [zfs] 4 [ffff88062511d858] arc_adjust at ffffffffa05139a9 [zfs] 5 [ffff88062511d878] arc_shrink at ffffffffa0513a95 [zfs] 6 [ffff88062511d898] arc_kmem_reap_now at ffffffffa0513be8 [zfs] 7 [ffff88062511d8c8] arc_shrinker_func at ffffffffa0513ccc [zfs] 8 [ffff88062511d8f8] shrink_slab at ffffffff8112a17a 9 [ffff88062511d958] do_try_to_free_pages at ffffffff8112bfdf 10 [ffff88062511d9e8] try_to_free_pages at ffffffff8112c3ed 11 [ffff88062511da98] __alloc_pages_nodemask at ffffffff8112431d 12 [ffff88062511dbb8] kmem_getpages at ffffffff8115e632 13 [ffff88062511dbe8] fallback_alloc at ffffffff8115f24a 14 [ffff88062511dc68] ____cache_alloc_node at ffffffff8115efc9 15 [ffff88062511dcc8] __kmalloc at ffffffff8115fbf9 16 [ffff88062511dd18] kmem_alloc_debug at ffffffffa047b8cb [spl] 17 [ffff88062511dda8] l2arc_feed_thread at ffffffffa0511e71 [zfs] 18 [ffff88062511dea8] thread_generic_wrapper at ffffffffa047d1a1 [spl] 19 [ffff88062511dee8] kthread at ffffffff81090a86 20 [ffff88062511df48] kernel_thread at ffffffff8100c14a Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* ZFS list snapshot property aliasP.SCH2012-04-112-5/+6
| | | | | | | | Add support for the `zfs list -t snap` alias which is available under Oracle Solaris 11. Signed-off-by: Brian Behlendorf <[email protected]> Closes #640
* ZFS snapshot aliasP.SCH2012-04-112-5/+11
| | | | | | | | | For consistency, and because it's handy, add the 'zfs snap' alias which was introduced by Oracle Solaris 11. This includes an update to the man page to reflect all the available alias (snap, umount, and recv). Signed-off-by: Brian Behlendorf <[email protected]> Closes #640
* Illumos #1346: zfs incremental receive may leave behind temporary clonesMartin Matuska2012-04-111-2/+4
| | | | | | | | | | | | | | | 1356 zfs dataset prefetch code not working Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Gordon Ross <[email protected]> References to Illumos issue: https://www.illumos.org/issues/1346 https://www.illumos.org/issues/1356 Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #647
* Illumos #1475: zfs spill block hold can access invalid spill blkptrAlbert Lee2012-04-111-10/+14
| | | | | | | | | | | | | | | Reviewed by: Dan McDonald <[email protected]> Reviewed by: Gordon Ross <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Garrett D'Amore <[email protected]> References to Illumos issue: https://www.illumos.org/issues/1475 Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #648
* Illumos #1951: leaking a vdev when removing an l2cache deviceGeorge Wilson2012-04-114-9/+21
| | | | | | | | | | | | | | | | | | | | | 1952 memory leak when adding a file-based l2arc device 1954 leak in ZFS from metaslab_group_create and zfs_ereport_checksum Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Eric Schrock <[email protected]> Reviewed by: Bill Pijewski <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Eric Schrock <[email protected]> References to Illumos issues: https://www.illumos.org/issues/1951 https://www.illumos.org/issues/1952 https://www.illumos.org/issues/1954 Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #650
* OS-926: zfs panic in zfs_fill_zplprops_impl()Martin Matuska2012-04-111-6/+12
| | | | | | | | | | | | | This change appears to be exclusive to SmartOS. It is not present in illumos-gate but it just adds some needed error handling. This is clearly preferable to simply ASSERTING which is what would occur prior to the patch. Reviewed by: Jerry Jelinek <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #652
* Illumos #1680: zfs vdev_file_io_start: validate vdev before using vdev_tsdAndriy Gapon2012-04-111-7/+8
| | | | | | | | | | | | vdev_tsd can be NULL for certain vdev states. At least in userland testing with ztest. References to Illumos issue: https://www.illumos.org/issues/1680 Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #655
* Improve error message consistencyRichard Laager2012-04-111-5/+5
| | | | | Signed-off-by: Richard Laager <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Document the zle compression algorithmRichard Laager2012-04-111-2/+6
| | | | | Signed-off-by: Richard Laager <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Export additional dsl symbolsBrian Behlendorf2012-04-112-1/+12
| | | | | | | | | Principly these symbols were exported to get access to the dsl_prop_register/dsl_prop_unregister functions. They allow us to cleanly register a callback which is called when a dataset property is modified. Signed-off-by: Brian Behlendorf <[email protected]>
* Fixed a NULL pointer dereference bug in zfs_preumountGunnar Beutner2012-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When zpl_fill_super -> zfs_domount fails (e.g. because the dataset was destroyed before it could be successfully mounted) the subsequent call to zpl_kill_sb -> zfs_preumount would derefence a NULL pointer. This bug can be reproduced using this shell script: #!/bin/sh ( while true; do zfs create -o mountpoint=legacz tank/bar zfs destroy tank/bar done ) & ( while true; do mount -t zfs tank/bar /mnt umount /mnt done ) & Signed-off-by: Brian Behlendorf <[email protected]> Closes #639
* Make Gentoo initscript use modinfoRichard Yao2012-04-031-1/+1
| | | | | | | | | The -l parameter to modprobe has been removed from the latest upstream code and this change has entered Gentoo. Using modinfo as a substitute addresses this. Signed-off-by: Brian Behlendorf <[email protected]> Closes #636
* Print human readable error message for ENOENTRichard Yao2012-04-031-0/+4
| | | | | | | | | | | | | | | A cryptic error code is printed when mounting a legacy dataset to a non-existent mountpoint. This patch changes this behavior to print "mount point '%s' does not exist", which is similar to the error message printed when mounting procfs. The single quotes were added to be consistent with the existing EBUSY error message, which is the only difference between this error message and the one that is printed when the same condition occurs when mounting procfs. Signed-off-by: Brian Behlendorf <[email protected]> Closes #633
* Properly expose the mfu ghost list kstatsBrian Behlendorf2012-03-271-1/+1
| | | | | | | | Due to a typo the mru ghost lists stats were accidentally being exposed as the mfu ghost list stats. This was harmless but confusing since memory usage could be over reported. Signed-off-by: Brian Behlendorf <[email protected]>
* Remove hard-coded 80 column outputCraig Sanders2012-03-271-4/+25
| | | | | | | | When stdout is detected to be a tty use the number of columns specified by the terminal. If that fails fall back to a default 80 column width. In the non-tty case allow for 999 column lines. Signed-off-by: Brian Behlendorf <[email protected]>
* ZFS 0.6.0-rc8zfs-0.6.0-rc8Brian Behlendorf2012-03-261-1/+1
|
* Fix executable permissionsBrian Behlendorf2012-03-263-0/+1
| | | | | | | | | | | | Caught by lint, this permission change was accidentally introduced by commit 42cb3819f1a1f536105faac81ffc150f3da90a80. Restore the correct permissions and while I'm at it add a missing whack-bang to config/ltmain.sh. lint: executable-not-elf-or-script: zpool_main.c zfs_main.c Signed-off-by: Brian Behlendorf <[email protected]> Closes #620
* Add --enable-debug-dmu-tx configure optionBrian Behlendorf2012-03-2362-25/+154
| | | | | | | | | | | | | | | | | | Allow rigorous (and expensive) tx validation to be enabled/disabled indepentantly from the standard zfs debugging. When enabled these checks ensure that all txs are constructed properly and that a dbuf is never dirtied without taking the correct tx hold. This checking is particularly helpful when adding new dmu consumers like Lustre. However, for established consumers such as the zpl with no known outstanding tx construction problems this is just overhead. --enable-debug-dmu-tx - Enable/disable validation of each tx as --disable-debug-dmu-tx it is constructed. By default validation is disabled due to performance concerns. Signed-off-by: Brian Behlendorf <[email protected]>
* Enhance a dmu_tx_dirty_buf() assertionBrian Behlendorf2012-03-231-1/+2
| | | | | | | | | | | | The following assertion is good to validate the correctness of new DMU consumers, but it doesn't quite provide enough information. Slightly rework the assertion so that when it is hit the actual offending values will be included in the output. SPLError: 4787:0:(dmu_tx.c:828:dmu_tx_dirty_buf()) ASSERTION(dn == NULL || dn->dn_assigned_txg == tx->tx_txg) failed Signed-off-by: Brian Behlendorf <[email protected]>
* Add ZFS_META_RELEASE to module load/unload messagesBrian Behlendorf2012-03-231-6/+7
| | | | | | | | Include the ZFS_META_RELEASE in the module load/unload messages to more clearly indidcate exactly what version of ZFS has been loaded. Signed-off-by: Brian Behlendorf <[email protected]>
* Account for .zfs ctldir inodesBrian Behlendorf2012-03-221-0/+1
| | | | | | | | | | | | | | Because the .zfs ctldir inodes are not backed by physical storage they use a different create path which was not properly accounting for them as used. This could result in ->nr_cached_objects() returning 0 and cause a divide by zero error in prune_super(). In my option there's a kernel bug here too which allows this to happen. They should either be checking for 0 or adding +1 like they correctly do earlier in the function. Signed-off-by: Brian Behlendorf <[email protected]> Closes #617
* Add .zfs control directoryBrian Behlendorf2012-03-2279-101/+2037
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as posible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations. *) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches. *) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished. * mkdir - create a snapshot * rmdir - destroy a snapshot * mv - rename a snapshot The following issues are known defeciences, but we expect them to be addressed by future commits. *) Add automount support for kernels older the 2.6.37. This should be possible using follow_link() which is what Linux did before. *) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the ground work for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server. *) The .zfs/shares directory exists but no futher smb functionality has yet been implemented. Contributions-by: Rohan Puri <[email protected]> Contributiobs-by: Andrew Barnes <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #173
* Add zio constructor/destructorBrian Behlendorf2012-03-211-15/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add a standard zio constructor and destructor. Normally, this is done to reduce to cost of allocating a new structure by reducing expensive operations such as memory allocations. However, in this case none of the operations moved out of zio_create() were really very expensive. This change was principly made as a debug patch (and workaround) for a zio_destroy() race. The is good evidence that zio_create() is reinitializing a mutex which is really still in use by another thread. This would completely explain the observed symptoms in the issue report. This patch doesn't fix the root cause of the race, but it should make it less likely by only initializing the mutex once in the constructor. Also, this particular flaw might have gone unnoticed in other zfs implementations due to the specific implementation details of Linux ticket spinlocks. Once the real root cause is determined and resolved this change can be safely reverted. Until then this should help workaround the issue. Signed-off-by: Brian Behlendorf <[email protected]> Issue #496
* Revert "Add zio constructor/destructor"Brian Behlendorf2012-03-211-62/+15
| | | | | | | | | | | | | | | | | | | This patch was slightly flawed and allowed for zio->io_logical to potentially not be reinitialized for a new zio. This could lead to assertion failures in specific cases when debugging is enabled (--enable-debug) and I/O errors are encountered. It may also have caused problems when issues logical I/Os. Since we want to make sure this workaround can be easily removed in the future (when we have the real fix). I'm reverting this change and applying a new version of the patch which includes the zio->io_logical fix. This reverts commit 2c6d0b1e07b0265f0661ed7851d3aa8d3e75e7a9. Signed-off-by: Brian Behlendorf <[email protected]> Issue #602 Issue #604
* ZFS 0.6.0-rc7zfs-0.6.0-rc7Brian Behlendorf2012-03-161-1/+1
|
* Add missing NULL in zpl_xattr_handlersBrian Behlendorf2012-03-151-0/+1
| | | | | | | | | | | | | The xattr_resolve_name() helper function expects the registered list of xattr handlers to be NULL terminated. This NULL was accidentally missing which could result in a NULL dereference. Interestingly this issue only manifested itself on certain 32-bit systems. Presumably on 64-bit kernels we just always happen to get lucky and the memory following the structure is zeroed. Signed-off-by: Brian Behlendorf <[email protected]> Issue #594
* Use stderr for 'no pools/datasets available' errorGregor Kopka2012-03-152-6/+6
| | | | | | | | | | | The 'zfs list' and 'zpool list' commands output the message 'no datasets/pools available' to stdout. This should go to stderr and only the available datasets/pools should go to stdout. Returning nothing to stdout is expected behavior when there is nothing to list. Signed-off-by: Brian Behlendorf <[email protected]> Closes #581
* Add sa_spill_rele() interfaceBrian Behlendorf2012-03-072-0/+15
| | | | | | | | | | | Add a SA interface which allows us to release the spill block from a SA handle without destroying the handle. This is useful because we can then ensure that a copy of the dirty spill block is not made at sync time due to the extra hold. Susequent calls to sa_update() or sa_lookup() with transparently refetch the spill block dbuf from the ARC hash. Signed-off-by: Brian Behlendorf <[email protected]>
* Add zio constructor/destructorBrian Behlendorf2012-03-071-15/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add a standard zio constructor and destructor. Normally, this is done to reduce to cost of allocating a new structure by reducing expensive operations such as memory allocations. However, in this case none of the operations moved out of zio_create() were really very expensive. This change was principly made as a debug patch (and workaround) for a zio_destroy() race. The is good evidence that zio_create() is reinitializing a mutex which is really still in use by another thread. This would completely explain the observed symptoms in the issue report. This patch doesn't fix the root cause of the race, but it should make it less likely by only initializing the mutex once in the constructor. Also, this particular flaw might have gone unnoticed in other zfs implementations due to the specific implementation details of Linux ticket spinlocks. Once the real root cause is determined and resolved this change can be safely reverted. Until then this should help workaround the issue. Signed-off-by: Brian Behlendorf <[email protected]> Issue #496
* Fix distribution detectionRichard Yao2012-03-052-50/+58
| | | | | | | | | | | | | | Improve the distribution detection by moving the tests for distribution specific files first. The Ubuntu and Debian checks are left for last because they are the least likely to be unique. This is particularly true in the case of Debian since so many distributions are based on Debian. Since this is currently only used to identify the correct packaging method for this system the result in many instances is simply cosmetic. Signed-off-by: Brian Behlendorf <[email protected]>
* Align parition end on 1 MiB boundaryNed Bass2012-03-051-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | Some devices have exhibited sensitivity to the ending alignment of partitions. In particular, even if the first partition begins at 1 MiB, we have seen many sd driver task abort errors with certain SSDs if the first partition doesn't end on a 1 MiB boundary. This occurs when the vdev label is read during pool creation or importation and causes a delay of about 30 seconds per device. It can also be simulated with dd when the pool isn't imported: dd if=/dev/sda1 of=/dev/null bs=262144 count=1 For the record, this problem was observed with SMARTMOD SG9XCA2E200GE01 200GB SSDs. Unfortunately I don't have a good explanation for this behavior. It seems to have something to do with highly fragmented single-sector requests being issued to the device, which it may not support. With end-aligned partitions at least page-sized requests were queued and issued to the driver according to blktrace. In any case, aligning the partition end is a fairly innocuous work-around, wasting at most 1 MiB of space. Signed-off-by: Brian Behlendorf <[email protected]> Closes #574
* Use SA_HDL_PRIVATE for SA xattrsBrian Behlendorf2012-03-021-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A private SA handle must be used to ensure we can drop the dbuf hold on the spill block prior to calling dmu_tx_commit(). If we call dmu_tx_commit() before sa_handle_destroy(), then our hold will trigger a copy of the dbuf to be made. This is done to prevent data from leaking in to the syncing txg. As a result the original dirty spill block will remain cached. Additionally, relying on the shared zp->z_sa_hdl is unsafe in the xattr context because the znode may be asynchronously dropped from the cache. It's far safer and simpler just to use a private handle for xattrs. Plus any additional overhead is offset by the avoidance of the previously mentioned memory copy. These forever dirty buffers can be noticed in the arcstats under the anon_size. On a quiescent system the value should be zero. Without this fix and a SA xattr write workload you will see anon_size increase. Eventually, if enough dirty data builds up your system it will appear to hang. This occurs because the dmu won't allow new txs to be assigned until that dirty data is flushed, and it won't be because it's not part of an assigned tx. As an aside, I typically see anon_size lurk around 16k so I think there is another place in the code which needs a similar fix. However, this value doesn't grow over time so it isn't critical. Signed-off-by: Brian Behlendorf <[email protected]> Issue #503 Issue #513
* Cleanly support debug packagesBrian Behlendorf2012-02-2760-16/+99
| | | | | | | | | | | | | | | | | | | Allow a source rpm to be rebuilt with debugging enabled. This avoids the need to have to manually modify the spec file. By default debugging is still largely disabled. To enable specific debugging features use the following options with rpmbuild. '--with debug' - Enables ASSERTs # For example: $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm Additionally, ZFS_CONFIG has been added to zfs_config.h for packages which build against these headers. This is critical to ensure both zfs and the dependant package are using the same prototype and structure definitions. Signed-off-by: Brian Behlendorf <[email protected]>
* Add 'dmu_tx' kstats entryBrian Behlendorf2012-02-277-4/+93
| | | | | | | | | Keep counters for the various reasons that a thread may end up in txg_wait_open() waiting on a new txg. This can be useful when attempting to determine why a particular workload is under performing. Signed-off-by: Brian Behlendorf <[email protected]>
* Add arc_state_t stats to arcstatsBrian Behlendorf2012-02-271-0/+73
| | | | | | | | | | To ensure the arc is behaving properly we need greater visibility in to exactly how it's managing the systems memory. This patch takes one step in that direction be adding the current arc_state_t for the anon, mru, mru_ghost, mfu, and mfs_ghost lists. The l2 arc_state_t is already well represented in the arcstats. Signed-off-by: Brian Behlendorf <[email protected]>
* Return success from check_slice() if device doesn't existNed Bass2012-02-271-9/+0
| | | | | | | | | | | | | | | When creating a new pool, make_root_vdev() calls check_in_use() to ensure that none of the consituent disks are in use. If the disk contains a valid vdev label it is read to retrieve the list of its child vdevs and these are checked recursively. However, the partitions stored in the vdev label my no longer exist, for example if the partition table has since been altered. In any such case we would want the pool creation to proceed, so this change removes the check from check_slice() that returns an error if the device doesn't exist. As an added assurance, the Solaris implementation also returns sucess on ENOENT. Signed-off-by: Brian Behlendorf <[email protected]>
* Export symbols for zero-copyAlex Zhuravlev2012-02-171-0/+2
| | | | | | | | Export additional symbols to make use of the DMU's zero-copy API. This allows external modules to move data in to and out of the ARC without incurring the cost of a memory copy. Signed-off-by: Brian Behlendorf <[email protected]>
* Support ashift=13 for 8KB SSD block sizesRichard Yao2012-02-132-2/+2
| | | | | | | | | | | | | | New SSDs are now available which use an internal 8k block size. To make sure ZFS can get the maximum performance out of these devices we're increasing the maximum ashift to 13 (8KB). This value is still small enough that we can fit 16 uberblocks in the vdev ring label. However, I don't want to increase this any futher or it will limit the ability the safely roll back a pool to recover it. Signed-off-by: Brian Behlendorf <[email protected]> Closes #565
* Add 'fsid' mount option to allowed options.Turbo Fredriksson2012-02-131-1/+1
| | | | | | | | | | Resolves nfs-utils-1.0.x compatibility issue which requires that the fsid be set in the export options. exportfs: Warning: /tank/dir requires fsid= for NFS export Signed-off-by: Brian Behlendorf <[email protected]> Closes #570
* Export symbols for zero-copyBrian Behlendorf2012-02-101-4/+4
| | | | | | | | Exported the required symbols to make use of the DMU's zero-copy API. This allows external modules to move data in to and out of the ARC without incurring the cost of a memory copy. Signed-off-by: Brian Behlendorf <[email protected]>
* Use spl_debug_* helpersBrian Behlendorf2012-02-091-2/+2
| | | | | | | | When configuring the spl debug log support use the provided wrapper functions. This ensures that if --disable-debug-log was used when buiding the spl the functions will have no effect. Signed-off-by: Brian Behlendorf <[email protected]>