summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Illumos 5117 - spacemap reallocation can cause corruptionGeorge Wilson2014-09-081-5/+5
| | | | | | | | | | | | | | | | 5117 space map reallocation can cause corruption Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/projects/illumos-gate/issues/5117 https://github.com/illumos/illumos-gate/commit/e503a68 Ported by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2662
* Add object type checking to zap_lockdir()Brian Behlendorf2014-09-081-7/+4
| | | | | | | | | | | | | | | | | | | | | | If a non-ZAP object is passed to zap_lockdir() it will be treated as a valid ZAP object. This can result in zap_lockdir() attempting to read what it believes are leaf blocks from invalid disk locations. The SCSI layer will eventually generate errors for these bogus IOs but the caller will hang in zap_get_leaf_byblk(). The good news is that is a situation which can not occur unless the pool has been damaged. The bad news is that there are reports from both FreeBSD and Solaris of damaged pools. Specifically, there are normal files in the filesystem which reference another normal file as their parent. Since pools like this are known to exist the zap_lockdir() function has been updated to verify the type of the object. If a non-ZAP object has been passed it EINVAL will be returned immediately. Signed-off-by: Brian Behlendorf <[email protected]> Issue #2597 Issue #2602
* Linux AIO SupportRichard Yao2014-09-054-39/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfsd uses do_readv_writev() to implement fops->read and fops->write. do_readv_writev() will attempt to read/write using fops->aio_read and fops->aio_write, but it will fallback to fops->read and fops->write when AIO is not available. However, the fallback will perform a call for each individual data page. Since our default recordsize is 128KB, sequential operations on NFS will generate 32 DMU transactions where only 1 transaction was needed. That was unnecessary overhead and we implement fops->aio_read and fops->aio_write to eliminate it. ZFS originated in OpenSolaris, where the AIO API is entirely implemented in userland's libc by intelligently mapping them to VOP_WRITE, VOP_READ and VOP_FSYNC. Linux implements AIO inside the kernel itself. Linux filesystems therefore must implement their own AIO logic and nearly all of them implement fops->aio_write synchronously. Consequently, they do not implement aio_fsync(). However, since the ZPL works by mapping Linux's VFS calls to the functions implementing Illumos' VFS operations, we instead implement AIO in the kernel by mapping the operations to the VOP_READ, VOP_WRITE and VOP_FSYNC equivalents. We therefore implement fops->aio_fsync. One might be inclined to make our fops->aio_write implementation synchronous to make software that expects this behavior safe. However, there are several reasons not to do this: 1. Other platforms do not implement aio_write() synchronously and since the majority of userland software using AIO should be cross platform, expectations of synchronous behavior should not be a problem. 2. We would hurt the performance of programs that use POSIX interfaces properly while simultaneously encouraging the creation of more non-compliant software. 3. The broader community concluded that userland software should be patched to properly use POSIX interfaces instead of implementing hacks in filesystems to cater to broken software. This concept is best described as the O_PONIES debate. 4. Making an asynchronous write synchronous is non sequitur. Any software dependent on synchronous aio_write behavior will suffer data loss on ZFSOnLinux in a kernel panic / system failure of at most zfs_txg_timeout seconds, which by default is 5 seconds. This seems like a reasonable consequence of using non-compliant software. It should be noted that this is also a problem in the kernel itself where nfsd does not pass O_SYNC on files opened with it and instead relies on a open()/write()/close() to enforce synchronous behavior when the flush is only guarenteed on last close. Exporting any filesystem that does not implement AIO via NFS risks data loss in the event of a kernel panic / system failure when something else is also accessing the file. Exporting any file system that implements AIO the way this patch does bears similar risk. However, it seems reasonable to forgo crippling our AIO implementation in favor of developing patches to fix this problem in Linux's nfsd for the reasons stated earlier. In the interim, the risk will remain. Failing to implement AIO will not change the problem that nfsd created, so there is no reason for nfsd's mistake to block our implementation of AIO. It also should be noted that `aio_cancel()` will always return `AIO_NOTCANCELED` under this implementation. It is possible to implement aio_cancel by deferring work to taskqs and use `kiocb_set_cancel_fn()` to set a callback function for cancelling work sent to taskqs, but the simpler approach is allowed by the specification: ``` Which operations are cancelable is implementation-defined. ``` http://pubs.opengroup.org/onlinepubs/009695399/functions/aio_cancel.html The only programs on my system that are capable of using `aio_cancel()` are QEMU, beecrypt and fio use it according to a recursive grep of my system's `/usr/src/debug`. That suggests that `aio_cancel()` users are rare. Implementing aio_cancel() is left to a future date when it is clear that there are consumers that benefit from its implementation to justify the work. Lastly, it is important to know that handling of the iovec updates differs between Illumos and Linux in the implementation of read/write. On Linux, it is the VFS' responsibility whle on Illumos, it is the filesystem's responsibility. We take the intermediate solution of copying the iovec so that the ZFS code can update it like on Solaris while leaving the originals alone. This imposes some overhead. We could always revisit this should profiling show that the allocations are a problem. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #223 Closes #2373
* Fragmentation should display as '-' if spacemap_histogram=disabledilovezfs2014-09-051-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | When com.delphix:spacemap_histogram is disabled, the value of fragmentation was printing as 18446744073709551615 (UINT64_MAX), when it should print as '-'. The issue was caused by a small mistake during the merge of "4980 metaslabs should have a fragmentation metric." upstream: https://github.com/illumos/illumos-gate/commit/2e4c998 ZoL: https://github.com/zfsonlinux/zfs/commit/f3a7f66 The problem is in zpool_get_prop_literal, where the handling of the pool property ZPOOL_PROP_FRAGMENTATION was added to wrong the section. In particular, ZPOOL_PROP_FRAGMENTATION should not be in the section where zpool_get_state(zhp) == POOL_STATE_UNAVAIL, but lower down after it's already been determined that the pool is in fact available, which is where upstream illumos correctly has had it. Thanks to lundman for helping to track down this bug. Signed-off-by: Jorgen Lundman <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2664
* Illumos 5049 - panic when removing log deviceAlex Reece2014-09-051-1/+2
| | | | | | | | | | | | | | | | Reviewed by: George Wilson <[email protected]> Reviewed by: Mattew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Rich Lowe <[email protected]> References: https://www.illumos.org/issues/5049 https://github.com/illumos/illumos-gate/commit/2986efa Ported-by: Brian Behlendorf <[email protected]> Closes #2636
* Fix invalid locking order in rename operationStanislav Seletskiy2014-09-041-17/+20
| | | | | | | | | | | | | This commit should prevent a deadlock on dp_config_rwlock when running `zfs rename` by ensuring zvol_rename_minors() is not called under this lock. Signed-off-by: Stanislav Seletskiy <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2652. Closes #2525.
* Import zfs pools after cryptsetupalteriks2014-09-042-0/+2
| | | | | | | | | | The zfs-import-cache.service and zfs-import-scan.service should should be started after cryptsetup to ensure all LUKS devices have been opened. Signed-off-by: alteriks <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1474
* Change the default 'zfs_dedup_prefetch' value to '0'Alexey Smirnoff2014-09-042-2/+2
| | | | | | | | | This gives a huge performance improvement in operations with deduped datasets especially when the bottleneck is the amount of ram available for zfs. Signed-off-by: Brian Behlendorf <[email protected]> Closes #2639
* Improve handling of filesystem versionsDan Swartzendruber2014-09-033-21/+75
| | | | | | | | | | | | | Change mount code to diagnose filesystem versions that are not supported by the current implementation. Change upgrade code to do likewise and refuse to upgrade a pool if any filesystems on it are a version which is not supported by the current implementation. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Dan Swartzendruber <[email protected]> Closes: #2616
* Change delimiter for ZED email scriptslouwrentius2014-09-022-9/+9
| | | | | | | | | | | | | | | | When the ZED_EMAIL_INTERVAL_SECS="3600" option is set in zed.rc configuration file then notification emails should be rate limited. Rate limiting is accomplished by maintaining a colon delimited state file which includes the device name. Unfortunately there are valid device names which include a colon and therefore prevent the rate limiting for working properly. For this reason the delimiter has been changed to a semi-colon. Signed-off-by: louwrentius <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlap <[email protected]> Closes #2645
* Change startup mode of ZEDRalf Ertzinger2014-09-021-5/+1
| | | | | | | | | | | | Change the startup mode of ZED to non-forking. While systemd can track processes that detach from the terminal just fine, running processes in non-forking mode is the preferred mode of operation. Also remove user/group definitions as root/root is the default. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2252
* Cleanup zed loggingChris Dunlap2014-09-024-62/+58
| | | | | | | | | | | | | | This is a set of minor cleanup changes related to zed logging: - Remove the program identity prefix from messages written to stderr since systemd already prepends this output with the program name. - Replace the copy of the program identity string with a ptr reference. - Replace "pid" with "PID" for consistency in comments & strings. - Rename the zed_log.c struct _ctx component "level" to "priority". - Add the LOG_PID option for messages written to syslog. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2252
* Fix race condition with zed pidfile creationChris Dunlap2014-09-024-21/+183
| | | | | | | | | | | | | | | | | | | When the zed is started as a forking daemon (by default), a race-condition exists where the parent process can terminate before the pidfile has been created by the grandchild process. When invoked as a Type=forking systemd service, this can result in the following: systemd[1]: Starting ZFS Event Daemon (zed)... systemd[1]: PID file /var/run/zed.pid not readable (yet?) after start. This commit adds a daemonize pipe to allow the grandchild process to signal the parent process that initialization is complete (and the pidfile has been created). The parent process will wait for this notification before exiting. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #2252
* Add a missing > to AUTHORSBrian Behlendorf2014-09-021-1/+1
| | | | | | | An email address in the AUTHORS file was missing its trailing >. This patch fixes that typo. Signed-off-by: Brian Behlendorf <[email protected]>
* Add a pkgconfig fileTurbo Fredriksson2014-08-286-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | Providing a pkg-config file makes is easy for 3rd party applications to link against the libzfs libraries. It also allows the libzfs developers to modify the list of required libraries and cflags without breaking existing applications. The following example illustrates how pkg-config can be used: cc `pkg-config --cflags --libs libzfs` -o myapp myapp.c /* * myapp.c */ void main() { libzfs_handle_t *hdl; hdl = libzfs_init(); if (hdl) libzfs_fini(hdl); } Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes: #585
* Retire HAVE_IOCTL_* configure checksBrian Behlendorf2014-08-283-44/+0
| | | | | | | | | | | The HAVE_IOCTL_* configure checks were originally added for compatibility with an ancient version of glibc. This support and additional complexity is no longer needed and is therefore being removed. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Turbo Fredriksson <[email protected]> Closes #585
* Illumos 4970-4974 - extreme rewind enhancementsMatthew Ahrens2014-08-263-15/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | 4970 need controls on i/o issued by zpool import -XF 4971 zpool import -T should accept hex values 4972 zpool import -T implies extreme rewind, and thus a scrub 4973 spa_load_retry retries the same txg 4974 spa_load_verify() reads all data twice Reviewed by: Christopher Siden <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/4970 https://www.illumos.org/issues/4971 https://www.illumos.org/issues/4972 https://www.illumos.org/issues/4973 https://www.illumos.org/issues/4974 https://github.com/illumos/illumos-gate/commit/e42d205 Notes: This set of patches adds a set of tunable parameters for the "extreme rewind" mode of pool import which allows control over the traversal performed during such an import. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2598
* Illumos 5034 - ARC's buf_hash_table is too smallMatthew Ahrens2014-08-262-3/+26
| | | | | | | | | | | | | | | | | 5034 ARC's buf_hash_table is too small Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/5034 https://github.com/illumos/illumos-gate/commit/63e911b Ported-by: Brian Behlendorf <[email protected]> Closes #2615
* 2493 change efi_rescan() to wait longerAndrew Hamilton2014-08-261-2/+3
| | | | | | | | | | | Change efi_rescan() to loop 10 times instead of 5 on EBUSY and to sleep at the end of each loop. This helps with some instances where the kernel does not reload the partition table fast enough for ZFS to detect. Signed-off-by: Andrew Hamilton <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2493
* Fixed memory leaks in zevent handlingIsaac Huang2014-08-203-19/+45
| | | | | | | | | | Some nvlist_t could be leaked in error handling paths. Also make sure cb argument to zfs_zevent_post() cannnot be NULL. Signed-off-by: Isaac Huang <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2158
* Illumos 4631 - zvol_get_stats triggering too many readsMatthew Ahrens2014-08-203-79/+64
| | | | | | | | | | | | | | | | | | 4631 zvol_get_stats triggering too many reads Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4631 https://github.com/illumos/illumos-gate/commit/bbfa8ea Ported-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2612 Closes #2480
* Drive database updateRichard Yao2014-08-181-14/+22
| | | | | | | | | | | | | | | | | | | The Intel DC S3500 and Intel DC S3700 are optimized to handle 4KB sectors well despite of their 8KB page sizes, so we move them to a new category for enterprise drives where they will receive ashift=12. They are joined by the Intel 730 series, which uses the same disk controller, as well as a San Disk enterprise drive. The drive IDs for these two were obtained by myself with the drive_id utility. The drive ID for the 240GB Intel 730 model was extrapolated from the drive ID for the 480GB model. Lastly, we also add some Western Digital mobile drives. ryuo in \#zfsonlinux on freenode obtained "ATA WDC WD2500BEVT-0" from running drive_id on his own hardware. The additional drives in that family were extrapolated from that identifer. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2601
* Don't upgrade a metaslab when the pool is not writableTim Chase2014-08-181-5/+8
| | | | | | | | | | | | | | Illumos 4982 added code to metaslab_fragmentation() to proactively update space maps when the spacemap_histogram feature is enabled. This should only happen when the pool is writeable. References: https://www.illumos.org/issues/4982 https://github.com/illumos/illumos-gate/commit/2e4c998 Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2595
* Illumos 4976-4984 - metaslab improvementsGeorge Wilson2014-08-1818-243/+836
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 4976 zfs should only avoid writing to a failing non-redundant top-level vdev 4978 ztest fails in get_metaslab_refcount() 4979 extend free space histogram to device and pool 4980 metaslabs should have a fragmentation metric 4981 remove fragmented ops vector from block allocator 4982 space_map object should proactively upgrade when feature is enabled 4983 need to collect metaslab information via mdb 4984 device selection should use fragmentation metric Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/4976 https://www.illumos.org/issues/4978 https://www.illumos.org/issues/4979 https://www.illumos.org/issues/4980 https://www.illumos.org/issues/4981 https://www.illumos.org/issues/4982 https://www.illumos.org/issues/4983 https://www.illumos.org/issues/4984 https://github.com/illumos/illumos-gate/commit/2e4c998 Notes: The "zdb -M" option has been re-tasked to display the new metaslab fragmentation metric and the new "zdb -I" option is used to control the maximum number of in-flight I/Os. The new fragmentation metric is derived from the space map histogram which has been rolled up to the vdev and pool level and is presented to the user via "zpool list". Add a number of module parameters related to the new metaslab weighting logic. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2595
* Create an 'overlay' propertyTurbo Fredriksson2014-08-154-0/+28
| | | | | | | | | | | | | | | Add a new 'overlay' property (default 'off') that controls whether the filesystem should be mounted even if the mountpoint is busy or if it should fail with a 'mountpoint not empty'. Doing overlay mounts is the default mount behavior on Linux, but not in ZFS. It have been decided that following the ZFS behavior should be the default, but this overlay allows for site administrator to override this decision on a per-dataset basis. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes: #2503
* Include sys/taskq.h in linux/vfs_compat.hRichard Yao2014-08-141-0/+2
| | | | | | | | | | | | We should have included sys/taskq.h directly because we use the taskq code here, but we instead had files that included sys/taskq.h also include sys/kmem.h, which happened to include sys/taskq.h. sys/kmem.h no longer does this, so we must define the include as we should have done in the first place. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2411
* Correct autodetection of bootfs propertyEvan Susarret2014-08-131-2/+2
| | | | | | | | | | | | | Remove lines that contain only a hyphen (match '^-$' instead of '-'). I had a root fs with a hyphen in the name (fedora/ROOT/Fedora20-Dev), it was not detected because sed eliminated that line of output from 'zpool list -Ho bootfs'. Signed-off-by: Evan Susarret <[email protected]> Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2196
* Avoid PAGESIZE redefinitionAlec Salazar2014-08-131-0/+2
| | | | | | | | | Add #ifndef PAGESIZE to avoid redefinition warning on platforms where this value is already provided. Signed-off-by: Alec Salazar <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2588
* Replace __va_list with va_listAlec Salazar2014-08-133-7/+3
| | | | | | | | | | Most of the code base already uses va_list, which is specified by iso-c. gcc/glibc provides 'typedef __gnuc_va_list va_list'. and when not using gcc/glibc we can't expect to find __gnuc_va_list. Signed-off-by: Alec Salazar <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2588
* Revert "Revert "Revert "Fix unlink/xattr deadlock"""Brian Behlendorf2014-08-111-85/+57
| | | | | | | | | | | | | | | | | | This reverts commit 7973e46 which brings the basic flow of the code back in line with the other ZFS implementations. This was possible due to the following related changes. e89260a Directory xattr znodes hold a reference on their parent 6f9548c Fix deadlock in zfs_zget() 0a50679 Add zfs_iput_async() interface 4dd1893 Avoid 128K kmem allocations in mzap_upgrade() Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #457 Closes #2058 Closes #2128 Closes #2240
* Add zfs_iput_async() interfaceBrian Behlendorf2014-08-114-11/+17
| | | | | | | | | | | Handle all iputs in zfs_purgedir() and zfs_inode_destroy() asynchronously to prevent deadlocks. When the iputs are allowed to run synchronously in the destroy call path deadlocks between xattr directory inodes and their parent file inodes are possible. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #457
* Avoid 128K kmem allocations in mzap_upgrade()Brian Behlendorf2014-08-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | As originally implemented the mzap_upgrade() function will perform up to SPA_MAXBLOCKSIZE allocations using kmem_alloc(). These large allocations can potentially block indefinitely if contiguous memory is not available. Since this allocation is done under the zap->zap_rwlock it can appear as if there is a deadlock in zap_lockdir(). This is shown below. The optimal fix for this would be to rework mzap_upgrade() such that no large allocations are required. This could be done but it would result in us diverging further from the other implementations. Therefore I've opted against doing this unless it becomes absolutely necessary. Instead mzap_upgrade() has been updated to use zio_buf_alloc() which can reliably provide buffers of up to SPA_MAXBLOCKSIZE. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Close #2580
* Avoid dynamic allocation of 'search zio'Brian Behlendorf2014-08-112-6/+5
| | | | | | | | | | | | | | | | As part of commit e8b96c6 the search zio used by the vdev_queue_io_to_issue() function was moved to the heap to minimize stack usage. Functionally this is fine, but to maximize performance it's best to minimize the number of dynamic allocations. To avoid this allocation temporary space for the search zio has been reserved in the vdev_queue structure. All access must be serialized through the vq_lock. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Ned Bass <[email protected]> Closes #2572
* Use KM_PUSHPAGE in dsl_dataset_rollback_check()Brian Behlendorf2014-08-061-2/+2
| | | | | | | | | | | The dsl_dataset_rollback_check() function is executed in the txg_sync context. To prevent a potential deadlock due to direct memory reclaim it must use KM_PUSHPAGE. This was introduced by the recent 'zfs bookmark' features, commit da53684. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Eric Dillmann <[email protected]> Closes #2569
* Add bash completions by Aneurin Price.Turbo Fredriksson2014-08-065-1/+401
| | | | | | | | | | These can be manually installed as needed by end users. They have been added to the repository so they can be kept up to date with the latest code. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1588
* Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_tMatthew Ahrens2014-08-0627-132/+137
| | | | | | | | | | | | | | | | | | | | | | | | 4914 zfs on-disk bookmark structure should be named *_phys_t Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Richard Lowe <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/4914 https://github.com/illumos/illumos-gate/commit/7802d7b Porting notes: There were a number of zfsonlinux-specific uses of zbookmark_t which needed to be updated. This should reduce the likelihood of further problems like issue #2094 from occurring. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2558
* Illumos 4881 - zfs send performance regression with embedded dataMatthew Ahrens2014-08-061-15/+21
| | | | | | | | | | | | | | | | | 4881 zfs send performance degradation when embedded block pointers are encountered Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4881 https://github.com/illumos/illumos-gate/commit/06315b7 Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2547
* Illumos 4897 - Space accounting mismatch in L2ARC/zpoolSaso Kiselkov2014-08-061-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | 4897 Space accounting mismatch in L2ARC/zpool Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Boris Protopopov <[email protected]> Approved by: Dan McDonald <[email protected]> From the illumos issue tracker: L2ARC vdev space usage statistics are calculated as the delta between the maximum and minimum vdev offset ever written to by the L2ARC fill thread, but do not inform the user of how much space in between these two offsets is actually taken up by cached buffers. This fix changes that so that vdev space usage stats on L2ARC devices accurately track the volume of buffers stored on them, allowing users to see the exact L2ARC usage in "zpool iostat -v". References: https://www.illumos.org/issues/4897 https://github.com/illumos/illumos-gate/commit/3038a2b Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2555
* Illumos 4390 - I/O errors can corrupt space map when deleting fs/volMatthew Ahrens2014-08-0417-157/+339
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 4390 i/o errors when deleting filesystem/zvol can lead to space map corruption Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4390 https://github.com/illumos/illumos-gate/commit/7fd05ac Porting notes: Previous stack-reduction efforts in traverse_visitb() caused a fair number of un-mergable pieces of code. This patch should reduce its stack footprint a bit more. The new local bptree_entry_phys_t in bptree_add() is dynamically-allocated using kmem_zalloc() for the purpose of stack reduction. The new global zfs_free_leak_on_eio has been defined as an integer rather than a boolean_t as was the case with the related zfs_recover global. Also, zfs_free_leak_on_eio's definition has been inserted into zfs_debug.c for consistency with the existing definition of zfs_recover. Illumos placed it in spa_misc.c. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2545
* Illumos 4757, 4913Matthew Ahrens2014-08-0146-259/+1196
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Max Grossman <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4757 https://www.illumos.org/issues/4913 https://github.com/illumos/illumos-gate/commit/5d7b4d4 Porting notes: For compatibility with the fastpath code the zio_done() function needed to be updated. Because embedded-data block pointers do not require DVAs to be allocated the associated vdevs will not be marked and therefore should not be unmarked. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2544
* Illumos 3835 zfs need not store 2 copies of all metadataMatthew Ahrens2014-07-317-23/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed by: George Wilson <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Richard Lowe <[email protected]> Description from Matt Ahrens's bug report at Delphix: Add a new zfs property, "redundant_metadata" which can have values "all" or "most". The default will be "all", which is the current behavior. Setting to "most" will cause us to only store 1 copy of level-1 indirect blocks of user data files. Additional notes: The new man page section for this property states "The exact behavior of which metadata blocks are stored redundantly may change in future releases." and: "When set to most, ZFS stores an extra copy of most types of metadata. This can improve performance of random writes, because less metadata must be written." The current implementation is as described above in Matt's blog. It is controlled by a new global integer "zfs_redundant_metadata_most_ditto_level", currently initialized to 2. When "redundant_metadata" is set to "most", only indirect blocks of the specified level and higher will have additional ditto blocks created. Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2542
* zed needs libzfs_coreTim Chase2014-07-311-1/+2
| | | | | | | | | | As of a recent group of Illumos/Delphix updates, zed needs libzfs_core in order to resolve lzc_get_bookmarks() and likely other functions going forward. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2534
* Illumos 4754, 4755George Wilson2014-07-303-52/+7
| | | | | | | | | | | | | | | | | | | 4754 io issued to near-full luns even after setting noalloc threshold 4755 mg_alloc_failures is no longer needed Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4754 https://www.illumos.org/issues/4755 https://github.com/illumos/illumos-gate/commit/b6240e8 Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2533
* Illumos #4374Matthew Ahrens2014-07-3019-174/+128
| | | | | | | | | | | | | | | | | | | 4374 dn_free_ranges should use range_tree_t Reviewed by: George Wilson <[email protected]> Reviewed by: Max Grossman <[email protected]> Reviewed by: Christopher Siden <[email protected] Reviewed by: Garrett D'Amore <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4374 https://github.com/illumos/illumos-gate/commit/bf16b11 Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2531
* Illumos 4368, 4369.Matthew Ahrens2014-07-2933-264/+1671
| | | | | | | | | | | | | | | | | 4369 implement zfs bookmarks 4368 zfs send filesystems from readonly pools Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/4369 https://www.illumos.org/issues/4368 https://github.com/illumos/illumos-gate/commit/78f1710 Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2530
* Illumos 4370, 4371Max Grossman2014-07-2832-384/+689
| | | | | | | | | | | | | | | | | | | | 4370 avoid transmitting holes during zfs send 4371 DMU code clean up Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: Josef 'Jeff' Sipek <[email protected]> Approved by: Garrett D'Amore <[email protected]>a References: https://www.illumos.org/issues/4370 https://www.illumos.org/issues/4371 https://github.com/illumos/illumos-gate/commit/43466aa Ported by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2529
* Illumos 4171, 4172Matthew Ahrens2014-07-2527-276/+369
| | | | | | | | | | | | | | | | | | | | 4171 clean up spa_feature_*() interfaces 4172 implement extensible_dataset feature for use by other zpool features Reviewed by: Max Grossman <[email protected]> Reviewed by: Christopher Siden <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Jerry Jelinek <[email protected]> Approved by: Garrett D'Amore <[email protected]>a References: https://www.illumos.org/issues/4171 https://www.illumos.org/issues/4172 https://github.com/illumos/illumos-gate/commit/2acef22 Ported-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2528
* Remove patches directoryBrian Behlendorf2014-07-252-63/+0
| | | | | | | | Support for ZFS has now been merged in to both blkid and grub. Therefore, there is no longer a need to carry these stale patches in the ZFS source tree. Signed-off-by: Brian Behlendorf <[email protected]>
* Support '-H' (scripted mode) to 'zpool get'Turbo Fredriksson2014-07-252-3/+18
| | | | | | | | | This functionality is already available in 'zfs get'. Providing it for 'zpool get' is useful and good for consistency. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes: #2522
* Initial attempt to document events and payloads.Turbo Fredriksson2014-07-252-1/+750
| | | | | | | | | | | | In no way complete - most have been trial and error and some deducing what they could mean. It needs more information from someone that knows the code better. But this is a start and it lays the basic structure for adding this additional detail. Signed-off-by: Turbo Fredriksson <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2357