aboutsummaryrefslogtreecommitdiffstats
path: root/module
Commit message (Collapse)AuthorAgeFilesLines
* Change KM_SLEEP to TQ_SLEEP in spa_deadman()Tim Chase2016-03-091-1/+1
| | | | | | | | | | Since they both evaluate to zero, this is a semi-cosmetic change but the latter is the proper value to use as an argument to taskq_dispatch_delay(). Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4393
* Make zvol update volsize operation synchronous.ab-oe2016-02-291-0/+4
| | | | | | | | | | | | | There is a race condition when new transaction group is added to dp->dp_dirty_datasets list by the zap_update in the zvol_update_volsize. Meanwhile, before these dirty data are synchronized, the receive process can cause that dmu_recv_end_sync is executed. Then finally dirty data are going to be synchronized but the synchronization ends with the NULL pointer dereference error. Signed-off-by: ab-oe <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4116
* FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and ↵smh2016-02-262-111/+240
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | locality information. The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being under utilized. The new algorithm takes into the following additional factors: * Load of the vdevs (number outstanding I/O requests) * The locality of last queued I/O vs the new I/O request. Within the locality calculation additional knowledge about the underlying vdev is considered such as; is the device backing the vdev a rotating media device. This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices. The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parrallel dd's. With pre-fetch disabled (vfs.zfs.prefetch_disable=1): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s With pre-fetch enabled (vfs.zfs.prefetch_disable=0): == Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdevs loads are only calculated from the set of valid candidates. The following additional sysctls where added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc These changes where based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487 Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay References: https://github.com/freebsd/freebsd@5c7a6f5d https://github.com/freebsd/freebsd@31b7f68d https://github.com/freebsd/freebsd@e186f564 Performance Testing: https://github.com/zfsonlinux/zfs/pull/4334#issuecomment-189057141 Porting notes: - The tunables were adjusted to have ZoL-style names. - The code was modified to use ZoL's vd_nonrot. - Fixes were done to make cstyle.pl happy - Merge conflicts were handled manually - freebsd/freebsd@e186f564bc946f82c76e0b34c2f0370ed9aea022 by my collegue Andriy Gapon has been included. It applied perfectly, but added a cstyle regression. - This replaces 556011dbec2d10579819078559a77630fc559112 entirely. - A typo "IO'a" has been corrected to say "IO's" - Descriptions of new tunables were added to man/man5/zfs-module-parameters.5. Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4334
* Add l2arc_max_block_size tunableBrian Behlendorf2016-02-251-1/+33
| | | | | | | | | | | Set a limit for the largest compressed block which can be written to an L2ARC device. By default this limit is set to 16M so there is no change in behavior. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Elling <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #4323
* Make zvol minor functionality more robustBoris Protopopov2016-02-241-16/+77
| | | | | | | | | | | | | | | | | | | | | | | | Close the race window in zvol_open() to prevent removal of zvol_state in the 'first open' code path. Move the call to check_disk_change() under zvol_state_lock to make sure the zvol_media_changed() and zvol_revalidate_disk() called by check_disk_change() are invoked with positive zv_open_count. Skip opened zvols when removing minors and set private_data to NULL for zvols that are not in use whose minors are being removed, to indicate to zvol_open() that the state is gone. Skip opened zvols when renaming minors to avoid modifying zv_name that might be in use, e.g. in zvol_ioctl(). Drop zvol_state_lock before calling add_disk() when creating minors to avoid deadlocks with zvol_open(). Wrap dmu_objset_find() with spl_fstran_mark()/unmark(). Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #4344
* Call dmu_read_uio_dbuf() in zvol_read()Richard Yao2016-02-181-1/+1
| | | | | | | | | | | | | | | | | The difference between `dmu_read_uio()` and `dmu_read_uio_dbuf()` is that the former takes a hold while the latter uses an existing hold. `zfs_read()` in the ZPL will use `dmu_read_uio_dbuf()` while our analogous `zvol_write()` will use `dmu_write_uio_dbuf()`, but for no apparent reason, we inherited a `zvol_read()` function from OpenSolaris that does `dmu_read_uio()`. illumos-gate also still uses `dmu_read_uio()` to this day. Lets switch to `dmu_read_uio_dbuf()`, which is more performant. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4316
* Clean up zvol request processing to pass uio and fix porting regressionsRichard Yao2016-02-181-76/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In illumos-gate, `zvol_read` and `zvol_write` are both passed uio_t rather than bio_t. Since we are translating from bio to uio for both, we might as well unify the logic and have code more similar to its illumos counterpart. At the same time, we can fix some regressions that occurred versus the original code from illumos-gate. We refactor zvol_write to take uio and also correct the following problems: 1. We did `dnode_hold()` on each IO when we already had a hold. 2. We would attempt to send writes that exceeded `DMU_MAX_ACCESS` to the DMU. 3. We could call `zil_commit()` twice. In this case, this is because Linux uses the `->write` function to send flushes and can aggregate the flush with a write. If a synchronous write occurred with the flush, we effectively flushed twice when there is no need to do that. zvol_read also suffers from the first two problems. Other platforms suffer from the first, so we leave that for a second patch so that there is a discrete patch for them to cherry-pick. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4316
* Remove wrong ASSERT in annotate_ecksumChunwei Chen2016-02-171-2/+2
| | | | | | | | | | When using large blocks like 1M, there will be more than UINT16_MAX qwords in one block, so this ASSERT would go off. Also, it is possible for the histogram to overflow. We cap them to UINT16_MAX to prevent this. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4257
* Illumos 5809 - Blowaway full receive in v1 pool causes kernel panicPaul Dagnelie2016-02-081-1/+2
| | | | | | | | | | | | | | 5809 Blowaway full receive in v1 pool causes kernel panic Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Will Andrews <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/5809 https://github.com/illumos/illumos-gate/commit/f40b29c Ported-by: Brian Behlendorf <[email protected]>
* Illumos 6537 - Panic on zpool scrub with DEBUG kernelGary Mills2016-02-051-2/+4
| | | | | | | | | | | | | | | 6537 Panic on zpool scrub with DEBUG kernel Reviewed by: Steve Gonczi <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Matthew Ahrens <[email protected]> References: https://www.illumos.org/issues/6537 https://github.com/illumos/illumos-gate/commit/8c04a1f Ported-by: Brian Behlendorf <[email protected]>
* Illumos 6096 - ZFS_SMB_ACL_RENAME needs to cleanup betterDan McDonald2016-02-051-0/+1
| | | | | | | | | | | | | | 6096 ZFS_SMB_ACL_RENAME needs to cleanup better Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Gordon Ross <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/6096 https://github.com/illumos/illumos-gate/commit/8f5190a5 Ported-by: Brian Behlendorf <[email protected]>
* Illumos 6450 - scrub/resilver unnecessarily traverses snapshotsMatthew Ahrens2016-02-021-4/+50
| | | | | | | | | | | | | | | | 6450 scrub/resilver unnecessarily traverses snapshots created after the scrub started Reviewed by: George Wilson <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/6450 https://github.com/illumos/illumos-gate/commit/38d6103 Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Handling negative dentries in a CI file system.Richard Sharpe2016-02-021-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a Case Insensitive file system we must avoid creating negative entries in the dentry cache. We must also pass the FIGNORECASE into zfs_lookup so that special files are handled correctly. We must also prevent negative dentries from being created when files are unlinked. Tested by running fsstress from LTP (10 loops, 10 processes, 10,000 ops.) Also tested with printks (now removed) to ensure that lookups come to zpl_lookup when negative should not exist. Tests: 1. ls Some-file.txt; touch some-file.txt; ls Some-file.txt and ensure no errors. 2. touch Some-file.txt; rm some-file.txt; ls Some-file.txt and ensure that the last ls shows log messages showing the lookup went all the way to zpl_lookup. Thanks to tuxoko for helping me get this correct. Signed-off-by: Richard Sharpe <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4243
* Illumos 6527 - Possible access beyond end of string in zpool commentJorgen Lundman2016-01-281-1/+0
| | | | | | | | | | | | | | | 6527 Possible access beyond end of string in zpool comment Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Dan McDonald <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/6527 https://github.com/illumos/illumos-gate/commit/2bd7a8d Ported-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
* Illumos 6495 - Fix mutex leak in dmu_objset_find_dpBrian Behlendorf2016-01-281-0/+3
| | | | | | | | | | | | | | 6495 Fix mutex leak in dmu_objset_find_dp Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Albert Lee <[email protected]> References: https://www.illumos.org/issues/6495 https://github.com/illumos/illumos-gate/commit/2bad225 Ported-by: Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
* Illumos 6414 - vdev_config_sync could be simplerBrian Behlendorf2016-01-282-14/+15
| | | | | | | | | | | | | | 6414 vdev_config_sync could be simpler Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/6414 https://github.com/illumos/illumos-gate/commit/eb5bb58 Ported-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
* llumos 6334 - Cannot unlink files when over quotaSimon Klinkert2016-01-261-5/+2
| | | | | | | | | | | | | 6334 Cannot unlink files when over quota Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Toomas Soome <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/6334 https://github.com/illumos/illumos-gate/commit/6575bca Ported-by: Brian Behlendorf <[email protected]>
* Reintroduce zfs_remove() synchronous deleteskernelOfTruth2016-01-261-12/+66
| | | | | | | | | | | | | | | | | | | | | | | | Reintroduce a slightly adapted version of the Illumos logic for synchronous unlinks. The basic idea here is that only files smaller than zfs_delete_blocks (20480) blocks should be deleted synchronously. Unlinking larger files should be handled asynchronously to minimize impact to the caller. To accomplish this iput() which is responsible for calling zfs_znode_delete() on Linux is only called in the delete_now path. Otherwise zfs_async_iput() is used which allows the last reference to be dropped by a taskq thread effectively making the removal asynchronous. Porting notes: - Add zfs_delete_blocks module option for performance analysis. The default value is DMU_MAX_DELETEBLKCNT which is the same as upstream. Reducing this value means that smaller files will be unlinked asynchronously like large files. - All occurrences of zfsvfs changes to zsb. Ported-by: KernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Log zvol truncate/discard operationsDan McDonald2016-01-261-8/+59
| | | | | | | | | | | | | | As the comments in zvol_discard() suggested, the discard operation could be logged to the zil. This is a port of the relevant code from Nexenta as it was added in "701 UNMAP support for COMSTAR" and has been attributed to the author of that commit. References: https://github.com/Nexenta/illumos-nexenta/commit/b77b923 https://github.com/zfsonlinux/zfs/blob/089fa91b/module/zfs/zvol.c#L637 Ported-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6251 - add tunable to disable free_bpobj processingGeorge Wilson2016-01-251-2/+11
| | | | | | | | | | | | | | | | | | | | | 6251 - add tunable to disable free_bpobj processing Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Simon Klinkert <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Albert Lee <[email protected]> Reviewed by: Xin Li <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/6251 https://github.com/illumos/illumos-gate/commit/139510f Porting notes: - Added as module option declaration. - Added to zfs-module-parameters.5 man page. Ported-by: Signed-off-by: Brian Behlendorf <[email protected]>
* Set arc_c_min properly in userland buildsTim Chase2016-01-251-4/+4
| | | | | | | | | | Since it's set to arc_c_max / 2, it must be set after arc_c_max is set. Also added protection against it falling below 2 * maxblocksize in userland builds. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4268
* Prevent arc_c collapseTim Chase2016-01-251-8/+8
| | | | | | | | | | | | | | | Adjusting arc_c directly is racy because it can happen in the context of multiple threads. It should always be >= 2 * maxblocksize. Set it to a known valid value rather than adjusting it directly. In addition refactor arc_shrink() to a simpler structure, protect against underflow in the calculation of the new arc_c value. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Reverts: 935434ef Closes: #3904 Closes: #4161
* Illumos 4950 - files sometimes can't be removed from a full filesystemMatthew Ahrens2016-01-216-7/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 4950 files sometimes can't be removed from a full filesystem Reviewed by: Adam Leventhal <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Boris Protopopov <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4950 https://github.com/illumos/illumos-gate/commit/4bb7380 Porting notes: - ZoL currently does not log discards to zvols, so the portion of this patch that modifies the discard logging to mark it as freeing space has been discarded. 2. may_delete_now had been removed from zfs_remove() in ZoL. It has been reintroduced. 3. We do not try to emulate vnodes, so the following lines are not valid on Linux: mutex_enter(&vp->v_lock); may_delete_now = vp->v_count == 1 && !vn_has_cached_data(vp); mutex_exit(&vp->v_lock); This has been replaced with: mutex_enter(&zp->z_lock); may_delete_now = atomic_read(&ip->i_count) == 1 && !(zp->z_is_mapped); mutex_exit(&zp->z_lock); Ported-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Close possible zfs_znode_held() raceBrian Behlendorf2016-01-201-4/+3
| | | | | | | | | | | | Check if the lock is held while holding the z_hold_locks() lock. This prevents a possible use-after-free bug for callers which are not holding the lock. There currently are no such callers so this can't cause a problem today but it has been fixed regardless. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #4244 Issue #4124
* Linux 4.5 compat: xattr list handlerBrian Behlendorf2016-01-201-197/+255
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The registered xattr .list handler was simplified in the 4.5 kernel to only perform a permission check. Given a dentry for the file it must return a boolean indicating if the name is visible. This differs slightly from the previous APIs which also required the function to copy the name in to the provided list and return its size. That is now all the responsibility of the caller. This should be straight forward change to make to ZoL since we've always required the caller to make the copy. However, this was slightly complicated by the need to support 3 older APIs. Yes, between 2.6.32 and 4.5 there are 4 versions of this interface! Therefore, while the functional change in this patch is small it includes significant cleanup to make the code understandable and maintainable. These changes include: - Improved configure checks for .list, .get, and .set interfaces. - Interfaces checked from newest to oldest. - Strict checking for each possible known interface. - Configure fails when no known interface is available. - HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency with similar iops and fops configure checks. - POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}. Compatibility wrapper were added for old kernels. - ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing ZPL_XATTR_{GET|SET} WRAPPERs. Only the inode is guaranteed to be a valid pointer, passing NULL for the 'list' and 'name' variables is allowed and must be checked for. All .list functions were updated to use the wrapper to aid readability. - zpl_xattr_filldir() updated to use the .list function for its permission check which is consistent with the updated Linux 4.5 interface. If a .list function is registered it should return 0 to indicate a name should be skipped, if there is no registered function the name will be added. - Additional documentation from xattr(7) describing the correct behavior for each namespace was added before the relevant handlers. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Issue #4228
* Linux 4.5 compat: get_link() / put_link()Brian Behlendorf2016-01-201-33/+92
| | | | | | | | | | | | | | | | | | | | The follow_link() interface was retired in favor of get_link(). In the process of phasing in get_link() the Linux kernel went through two different versions. The first of which depended on put_link() and the final version on a delayed done function. - Improved configure checks for .follow_link, .get_link, .put_link. - Interfaces checked from newest to oldest. - Strict checking for each possible known interface. - Configure fails when no known interface is available. - Both versions .get_link are detected and supported as well two previous versions of .follow_link. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Issue #4228
* Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_*Josef 'Jeff' Sipek2016-01-156-71/+68
| | | | | | | | | | | | | | | | | | 5045 use atomic_{inc,dec}_* instead of atomic_add_* Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/5045 https://github.com/illumos/illumos-gate/commit/1a5e258 Porting notes: - All changes to non-ZFS files dropped. - Changes to zfs_vfsops.c dropped because they were Illumos specific. Ported-by: Brian Behlendorf <[email protected]> Closes #4220
* Illumos 4039 - zfs_rename()/zfs_link() needs stronger test for XDEVMarcel Telka2016-01-151-5/+14
| | | | | | | | | | | | | | | | | | | | | 4039 zfs_rename()/zfs_link() needs stronger test for XDEV Reviewed by: Gordon Ross <[email protected]> Reviewed by: Kevin Crowe <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4039 https://github.com/illumos/illumos-gate/commit/18e6497 Porting notes: - This check was updated in Linux in a similar fashion early on in the port. Therefore, this patch just reorders the function and updates the comment so it flows the same way as the upstream code. Ported-by: Brian Behlendorf <[email protected]> Closes #4218
* Illumos 3557, 3558, 3559, 3560George Wilson2016-01-151-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | 3557 dumpvp_size is not updated correctly when a dump zvol's size is changed 3558 setting the volsize on a dump device does not return back ENOSPC 3559 setting a volsize larger than the space available sometimes succeeds 3560 dumpadm should be able to remove a dump device Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Christopher Siden <[email protected]> Approved by: Albert Lee <[email protected]> References: https://www.illumos.org/issues/3559 https://github.com/illumos/illumos-gate/commit/c61ea56 Porting notes: - Internal zvol.c changes not applied due to implementation differences. The external interface and behavior was already consistent with the latest upstream code. - Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check. All supported kernels (2.6.32 and newer) provide this interface. Ported-by: Brian Behlendorf <[email protected]> Closes #4217
* Prevent duplicated xattr between SA and dirChunwei Chen2016-01-151-5/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When replacing an xattr would cause overflowing in SA, we would fallback to xattr dir. However, current implementation don't clear the one in SA, so we would end up with duplicated SA. For example, running the following script on an xattr=sa filesystem would cause duplicated "user.1". -- dup_xattr.sh begin -- randbase64() { dd if=/dev/urandom bs=1 count=$1 2>/dev/null | openssl enc -a -A } file=$1 touch $file setfattr -h -n user.1 -v `randbase64 5000` $file setfattr -h -n user.2 -v `randbase64 20000` $file setfattr -h -n user.3 -v `randbase64 20000` $file setfattr -h -n user.1 -v `randbase64 20000` $file getfattr -m. -d $file -- dup_xattr.sh end -- Also, when a filesystem is switch from xattr=sa to xattr=on, it will never modify those in SA. This would cause strange behavior like, you cannot delete an xattr, or setxattr would cause duplicate and the result would not match when you getxattr. For example, the following shell sequence. -- shell begin -- $ sudo zfs set xattr=sa pp/fs0 $ touch zzz $ setfattr -n user.test -v asdf zzz $ sudo zfs set xattr=on pp/fs0 $ setfattr -x user.test zzz setfattr: zzz: No such attribute $ getfattr -d zzz user.test="asdf" $ setfattr -n user.test -v zxcv zzz $ getfattr -d zzz user.test="asdf" user.test="asdf" -- shell end -- We fix this behavior, by first finding where the xattr resides before setxattr. Then, after we successfully updated the xattr in one location, we will clear the other location. Note that, because update and clear are not in single tx, we could still end up with duplicated xattr. But by doing setxattr again, it can be fixed. Signed-off-by: Chunwei Chen <[email protected]> Closes #3472 Closes #4153
* Remove fastwrite mutexRichard Yao2016-01-151-9/+0
| | | | | | | | | | | | | | | | The fast write mutex is intended to protect accounting, but it is redundant because all accounting is performed through atomic operations. It also serializes all metaslab IO behind a mutex, which introduces a theoretical scaling regression that the Illumos developers did not like when we showed this to them. Removing it makes the selection of the metaslab_group lock free as it is on Illumos. The selection is not quite the same without the lock because the loop races with IO completions, but any imbalances caused by this are likely to be corrected by subsequent metaslab group selections. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3643
* Fix zsb->z_hold_mtx deadlockBrian Behlendorf2016-01-152-47/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zfs_znode_hold_enter() / zfs_znode_hold_exit() functions are used to serialize access to a znode and its SA buffer while the object is being created or destroyed. This kind of locking would normally reside in the znode itself but in this case that's impossible because the znode and SA buffer may not yet exist. Therefore the locking is handled externally with an array of mutexs and AVLs trees which contain per-object locks. In zfs_znode_hold_enter() a per-object lock is created as needed, inserted in to the correct AVL tree and finally the per-object lock is held. In zfs_znode_hold_exit() the process is reversed. The per-object lock is released, removed from the AVL tree and destroyed if there are no waiters. This scheme has two important properties: 1) No memory allocations are performed while holding one of the z_hold_locks. This ensures evict(), which can be called from direct memory reclaim, will never block waiting on a z_hold_locks which just happens to have hashed to the same index. 2) All locks used to serialize access to an object are per-object and never shared. This minimizes lock contention without creating a large number of dedicated locks. On the downside it does require znode_lock_t structures to be frequently allocated and freed. However, because these are backed by a kmem cache and very short lived this cost is minimal. Signed-off-by: Brian Behlendorf <[email protected]> Issue #4106
* Add zfs_object_mutex_size module optionBrian Behlendorf2016-01-152-9/+17
| | | | | | | | | | | Add a zfs_object_mutex_size module option to facilitate resizing the the per-dataset znode mutex array. Increasing this value may help make the deadlock described in #4106 less common, but this is not a proper fix. This patch is primarily to aid debugging and analysis. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Issue #4106
* Increase default user space stack sizeBrian Behlendorf2016-01-131-1/+1
| | | | | | | | | | | | | | | | | | | | Under RHEL6/CentOS6 the default stack size must be increased to 32K to prevent overflowing the stack when running ztest. This isn't an issue for other distributions due to either the version of pthreads or perhaps the compiler. Doubling the stack size resolves the issue safely for all distribution and leaves us some headroom. $ sudo -E ztest -V -T 300 -f /var/tmp/ 5 vdevs, 7 datasets, 23 threads, 300 seconds... loading space map for vdev 0 of 1, metaslab 0 of 30 ... ... loading space map for vdev 0 of 1, metaslab 14 of 30 ... child died with signal 11 Exited ztest with error 3 Signed-off-by: Brian Behlendorf <[email protected]> Closes #4215
* Illumos 3749 - zfs event processing should work on R/O root filesystemsWill Andrews2016-01-122-20/+75
| | | | | | | | | | | | | | | | | | | | | | | | | 3749 zfs event processing should work on R/O root filesystems Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Eric Schrock <[email protected]> Approved by: Christopher Siden <[email protected]> References: https://www.illumos.org/issues/3749 https://github.com/illumos/illumos-gate/commit/3cb69f7 Porting notes: - [include/sys/spa_impl.h] - ffe9d38 Add generic errata infrastructure - 1421c89 Add visibility in to arc_read - [include/sys/fm/fs/zfs.h] - 2668527 Add linux events - 6283f55 Support custom build directories and move includes - [module/zfs/spa_config.c] - Updated spa_config_sync() to match illumos with the exception of a Linux specific block. Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5438 - zfs_blkptr_verify should continue after zfs_panic_recoverJustin Gibbs2016-01-121-1/+3
| | | | | | | | | | | | | | | 5438 zfs_blkptr_verify should continue after zfs_panic_recover Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Xin LI <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/5438 https://github.com/illumos/illumos-gate/commit/5897eb4 Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 5515 - dataset user hold doesn't reject empty tagsJosef 'Jeff' Sipek2016-01-121-2/+15
| | | | | | | | | | | | | | | 5515 dataset user hold doesn't reject empty tags Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Approved by: Matthew Ahrens <[email protected]> References: https://www.illumos.org/issues/5515 https://github.com/illumos/illumos-gate/commit/752fd8d Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6281 - prefetching should apply to 1MB readsGeorge Wilson2016-01-122-3/+3
| | | | | | | | | | | | | | | | | | 6281 prefetching should apply to 1MB reads Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Alexander Motin <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Justin Gibbs <[email protected]> Reviewed by: Xin Li <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/6281 https://github.com/illumos/illumos-gate/commit/6328027 Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6367 - spa_config_tryenter incorrectly handles the multiple-lock caseSaso Kiselkov2016-01-121-3/+5
| | | | | | | | | | | | | | | | | | 6367 spa_config_tryenter incorrectly handles the multiple-lock case Reviewed by: Alek Pinchuk <[email protected]> Reviewed by: Josef 'Jeff' Sipek <[email protected]> Reviewed by: Prashanth Sreenivasa <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Steven Hartland <[email protected]> Approved by: Matthew Ahrens <[email protected]> References: https://www.illumos.org/issues/6367 https://github.com/illumos/illumos-gate/commit/e495b6e Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6295 - metaslab_condense's dbgmsg should include vdev idJoe Stein2016-01-121-5/+6
| | | | | | | | | | | | | | | | | 6295 metaslab_condense's dbgmsg should include vdev id Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Xin Li <[email protected]> Reviewed by: Justin Gibbs <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/6295 https://github.com/illumos/illumos-gate/commit/daec38e Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6171 - dsl_prop_unregister() slows down dataset eviction.Justin T. Gibbs2016-01-125-187/+170
| | | | | | | | | | | | | | | | | | | | | | | | 6171 dsl_prop_unregister() slows down dataset eviction. Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Prakash Surya <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/6171 https://github.com/illumos/illumos-gate/commit/03bad06 Porting notes: - Conflicts - 3558fd7 Prototype/structure update for Linux - 2cf7f52 Linux compat 2.6.39: mount_nodev() - 13fe019 Illumos #3464 - 241b541 Illumos 5959 - clean up per-dataset feature count code - dsl_prop_unregister() preserved until out of tree consumers like Lustre can transition to dsl_prop_unregister_all(). - Fixing 'space or tab at end of line' in include/sys/dsl_dataset.h Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6288 - dmu_buf_will_dirty could be fasterMatthew Ahrens2016-01-121-10/+52
| | | | | | | | | | | | | | | | | | | | | 6288 dmu_buf_will_dirty could be faster Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Justin Gibbs <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/6288 https://github.com/illumos/illumos-gate/commit/0f2e7d0 Porting notes: - [module/zfs/dbuf.c] - Fix 'warning: ISO C90 forbids mixed declarations and code' by moving 'dbuf_dirty_record_t *dr' to start of code block. Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6292 - exporting a pool while an async destroyGeorge Wilson2016-01-121-2/+15
| | | | | | | | | | | | | | | | | 6292 exporting a pool while an async destroy is running can leave entries in the deferred tree Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Fabian Keil <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/6292 https://github.com/illumos/illumos-gate/commit/a443cc8 Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6319 - assertion failed in zio_ddt_write: bp->blk_birth == txgMatthew Ahrens2016-01-121-0/+2
| | | | | | | | | | | | | | | | | 6319 assertion failed in zio_ddt_write: bp->blk_birth == txg Reviewed by: George Wilson <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/6319 https://github.com/illumos/illumos-gate/commit/b39b744 Porting notes: - Re-enabled ztest for CentOS test slaves. Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]> Closes #3449
* Illumos 5987 - zfs prefetch code needs workMatthew Ahrens2016-01-125-619/+238
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5987 zfs prefetch code needs work Reviewed by: Adam Leventhal <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Approved by: Gordon Ross <[email protected]> References: https://www.illumos.org/issues/5987 zfs prefetch code needs work illumos/illumos-gate@cf6106c 5987 zfs prefetch code needs work Porting notes: - [module/zfs/dbuf.c] - 5f6d0b6 Handle block pointers with a corrupt logical size - [module/zfs/dmu_zfetch.c] - c65aa5b Fix gcc missing parenthesis warnings - 428870f Update core ZFS code from build 121 to build 141. - 79c76d5 Change KM_PUSHPAGE -> KM_SLEEP - b8d06fc Switch KM_SLEEP to KM_PUSHPAGE - Account for ISO C90 - mixed declarations and code - warnings - Module parameters (new/changed): - Replaced zfetch_block_cap with zfetch_max_distance (Max bytes to prefetch per stream (default 8MB; 8 * 1024 * 1024)) - Preserved zfs_prefetch_disable as 'int' for consistency with existing Linux module options. - [include/sys/trace_arc.h] - Added new tracepoints - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__sync__wait__for__async); - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__demand__hit__predictive__prefetch); - [man/man5/zfs-module-parameters.5] - Updated man page Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 6293 - ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign()Brian Behlendorf2016-01-111-1/+11
| | | | | | | | | | | | | | 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign() Reviewed by: George Wilson <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/6293 https://github.com/illumos/illumos-gate/commit/8fe00bf Ported-by: Brian Behlendorf <[email protected]>
* Revert "Illumos 3749 - zfs event processing should work on R/O root filesystems"Brian Behlendorf2016-01-112-76/+20
| | | | | | | | | | This reverts commit b47637ecdc7b647ec5bd9dfca888179eecfaa72d which introduced a regression in ztest. $ ./cmd/ztest/ztest -V 5 vdevs, 7 datasets, 23 threads, 300 seconds... *** Error in `/rpool/home/behlendo/src/git/zfs/cmd/ztest/.libs/lt-ztest': double free or corruption (fasttop): 0x0000000000d339f0 ***
* Fix 'prevsnap property' build failureBrian Behlendorf2016-01-111-1/+1
| | | | | | | | | | Fix build failure accidentally introduced by 1715493. This only results in a failure when debugging is disabled. dsl_dataset.c: In function 'dsl_dataset_stats': dsl_dataset.c:1698:45: error: 'dp' undeclared (first use in this function) Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 4929 - want prevsnap propertyMatthew Ahrens2016-01-112-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | 4929 want prevsnap property Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Matt Amdur <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Boris Protopopov <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Dan McDonald <[email protected]> References: https://www.illumos.org/issues/4929 https://github.com/illumos/illumos-gate/commit/b461c74 Porting notes: - [include/sys/fs/zfs.h] - f67d70 Create an 'overlay' property - 11b9ec Add full SELinux support - [fs/zfs/dsl_dataset.c] - This increases the stack size of dsl_dataset_stats() but nothing has been changed until this is shown to be an issue. Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>
* Illumos 4638 - Panic in ZFS via rfs3_setattr()/rfs3_write(): dirtying snapshot!Marcel Telka2016-01-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | 4638 Panic in ZFS via rfs3_setattr()/rfs3_write(): dirtying snapshot! Reviewed by: Alek Pinchuk <[email protected]> Reviewed by: Ilya Usvyatsky <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Garrett D'Amore <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/4638 https://github.com/illumos/illumos-gate/commit/2144b12 Porting notes: - [module/zfs/zfs_vnops.c] - 3558fd7 Prototype/structure update for Linux - 2cf7f52 Linux compat 2.6.39: mount_nodev() - Use zfs_is_readonly() wrapper - Remove first line of comment which doesn't apply Ported-by: kernelOfTruth [email protected] Signed-off-by: Brian Behlendorf <[email protected]>