Commit message (Author, Age, Files, Lines)
* Dbuf hash table should be sized as is the arc hash table (Tim Chase, 2015-09-02, 2 files, -3/+7)
| | | | | | | | | | | Commit 49ddb315066e372f31bda29a5c546a9eccc8b418 added the zfs_arc_average_blocksize parameter to allow control over the size of the arc hash table. The dbuf hash table's size should be determined similarly. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3721
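As a rough illustration of sizing a hash table from an average block size, the sketch below picks the smallest power-of-two bucket count that covers a given amount of memory. This is a minimal sketch, not the committed code: the function name, the min_buckets parameter, and the idea of passing memory in bytes are assumptions made for the example; avg_blocksize stands in for zfs_arc_average_blocksize.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest power-of-two multiple of
 * min_buckets such that buckets * avg_blocksize >= mem_bytes.
 * min_buckets is assumed to be a non-zero power of two.
 * Overflow of buckets * avg_blocksize is not handled in this sketch.
 */
uint64_t
hash_table_buckets(uint64_t mem_bytes, uint64_t avg_blocksize,
    uint64_t min_buckets)
{
	uint64_t buckets = min_buckets;

	/* Guard against arguments that would make the loop spin forever. */
	if (avg_blocksize == 0 || min_buckets == 0)
		return (min_buckets);

	while (buckets * avg_blocksize < mem_bytes)
		buckets <<= 1;

	return (buckets);
}
```

With 1 GiB of memory and an 8 KiB average block size the loop stops at 131072 buckets; a larger average block size yields a smaller table.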
* Add spa_slop_shift module option (Brian Behlendorf, 2015-09-02, 2 files, -0/+19)
  Allow for easy tuning of a pool's reserved free space. Previous versions of ZFS (v0.6.4 and earlier) held 1/64 of the pool's capacity in reserve. Commits 3d45fdd and 0c60cc3 increased this to 1/32. Setting spa_slop_shift=6 will restore the previous default setting.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3724
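The relationship between the shift value and the reserved fraction can be shown with a one-line calculation. This is only a sketch of the arithmetic described above (1/32 for a shift of 5, 1/64 for a shift of 6); the function name is hypothetical and any minimum or maximum the real code may apply is deliberately left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes held in reserve for a pool of the given
 * size, where the reserve is 1/2^slop_shift of the capacity.
 */
uint64_t
slop_space_sketch(uint64_t pool_bytes, int slop_shift)
{
	return (pool_bytes >> slop_shift);
}
```

For a 64 GiB pool this gives 2 GiB with a shift of 5 and 1 GiB with a shift of 6.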
* Reorder zfs-* services to allow /var on separate dataset (James Lee, 2015-09-02, 5 files, -46/+45)
  ZED depends on /var. When /var is a separate dataset, it must be mounted before starting ZED. This change moves the zfs-zed service from starting first to starting after zfs-mount, but before zfs-share.

  As discussed in issue #3513, ZED does not need to start first in order to consume events made during the zfs-import and zfs-mount services. The events will be queued and can be handled later in the boot process. ZED may, however, handle sharing in the future, so it should be started before the zfs-share service.

  This commit also stops the zfs-import service from writing temp files to /var/tmp on shutdown and it corrects the return code for the OpenRC service. Other OpenRC-specific changes noted in issue #3513 were reiterated in issue #3715 and committed in da619f3.

  Signed-off-by: James Lee <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3513
* Disable LBA weighting on files and SSDs (Richard Yao, 2015-09-01, 5 files, -2/+17)
| | | | | | | | | | | | | | The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3712
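To make the effect concrete, here is a minimal sketch of an LBA-style weighting that doubles the weight of the first metaslab and tapers toward 1x for the last one, skipped entirely for non-rotational devices. The function and parameter names are hypothetical and the formula is an assumption chosen to match the "outer tracks have twice the bandwidth" description; it is not copied from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: weight a metaslab by its free space, biased
 * toward low metaslab ids (outer tracks) on rotational media only.
 */
uint64_t
metaslab_weight_sketch(uint64_t free_space, uint64_t ms_id,
    uint64_t ms_count, int nonrot)
{
	uint64_t weight = free_space;

	/* Files and SSDs: leave the weight untouched. */
	if (!nonrot && ms_count > 0)
		weight = 2 * weight - (ms_id * weight) / ms_count;

	return (weight);
}
```

With 10 metaslabs and equal free space, metaslab 0 gets twice the weight on a rotational disk, metaslab 5 gets 1.5x, and every metaslab gets the same weight when nonrot is set.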
* Check for RW_WRITE_HELD in zfs_inactive (tuxoko, 2015-09-01, 1 file, -3/+10)
  Before read locking z_teardown_inactive_lock, we need to check whether we already hold the write lock on it. Otherwise, we would deadlock on ourselves when doing a rollback:

  zfs_ioc_rollback
  ->zfs_suspend_fs (z_teardown_inactive_lock, RW_WRITER)
  ->zfs_resume_fs->zfs_rezget->zfs_iput_async->iput-> ...
  ->zfs_inactive (z_teardown_inactive_lock, RW_READER)

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #2869
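The pattern can be sketched with a toy lock that only records who holds it. Everything here (toy_rwlock_t, the task id argument, the function names) is hypothetical and single-threaded; it only illustrates "skip the read lock when this task already holds the write lock, and remember whether an unlock is owed".

```c
#include <stdbool.h>

/* Toy stand-in for a reader/writer lock; not a real lock. */
typedef struct toy_rwlock {
	int writer;	/* id of the task holding the write lock, 0 if none */
	int readers;	/* number of read holds */
} toy_rwlock_t;

static bool
toy_write_held(const toy_rwlock_t *l, int task)
{
	return (l->writer != 0 && l->writer == task);
}

/* Returns true if a read hold was taken and must be dropped later. */
bool
inactive_enter(toy_rwlock_t *l, int task)
{
	if (toy_write_held(l, task))
		return (false);	/* already protected; locking again would deadlock */
	l->readers++;
	return (true);
}

void
inactive_exit(toy_rwlock_t *l, bool need_unlock)
{
	if (need_unlock)
		l->readers--;
}
```

The caller keeps the boolean returned by inactive_enter() and passes it to inactive_exit(), so the rollback path, which already holds the write lock, never touches the reader count.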
* Remove blk_queue_io_opt() autotools check (Richard Yao, 2015-09-01, 3 files, -34/+0)
  This check was needed only to support kernels earlier than 2.6.30. Support for those kernels was dropped, so we can safely remove it.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Remove blk_queue_physical_block_size() autotools check (Richard Yao, 2015-09-01, 3 files, -36/+0)
  This check was needed only to support kernels earlier than 2.6.30. Support for those kernels was dropped, so we can safely remove it.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 4.2 compat: misc_deregister() (Brian Behlendorf, 2015-09-01, 2 files, -10/+2)
| | | | | | | | | The misc_deregister() function was changed to a void return type. Rather than add compatibility code to detect this change simply ignore the return code on all kernels. It was only used to log an informational error message of no real value. Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 3.18 compat: Snapshot auto-mounting (Brian Behlendorf, 2015-08-31, 13 files, -419/+496)
  Refactor the .zfs/snapshot auto-mounting code to take into account changes made to the upstream kernels, and to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements.

  * All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot without needing access to the parent dataset.

  * Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need for complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures.

  * References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding.

  * The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows the kernel to unmount idle auto-mounted snapshots if needed, removing the need for the zfsctl_unmount_snapshots() function.

  * Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds the auto-unmount expiration timer will be extended.

  * Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invalidating all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3589
  Closes #3344
  Closes #3295
  Closes #3257
  Closes #3243
  Closes #3030
  Closes #2841
* zfsctl: No need to sync ctldir inodes (Andrey Vesnovaty, 2015-08-31, 1 file, -0/+3)
  There's no metadata to write to disk for ctldir inodes, so we check whether an inode belongs to the ctldir in zpl_commit_metadata and return immediately if it does.

  Signed-off-by: Andrey Vesnovaty <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #2797
* Clear QUEUE_FLAG_ADD_RANDOM on zvols (Richard Yao, 2015-08-30, 1 file, -0/+3)
| | | | | | | | | | | zvols should not be an entropy source for the kernel. Disable it to be consistent with the upstream kernel. torvalds/linux@b277da0a8a594308e17881f4926879bd5fca2a2d Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3713
* Fix small typo (loli10K, 2015-08-30, 1 file, -1/+1)
| | | | | | | | | | Add a missing space to the zfs_vdev_sync_write_min_active module parameter description. Signed-off-by: loli10K <[email protected]> Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3714
* Some OpenRC dependency logic belongs in mount (Richard Yao, 2015-08-30, 2 files, -10/+12)
| | | | | | | | | The dependencies for handling / on ZFS belong in the mount script, not the zed script. Signed-off-by: Richard Yao <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3715
* Initialize the taskq entry embedded within struct vdev (Tim Chase, 2015-08-30, 1 file, -0/+1)
| | | | | | | | | | | | | As part of the stack reduction effort in 50b25b2187134ac7b19cf93bd35a420223f1d343, a zio_t containing a taskq_ent was added to struct vdev_queue which itself is part of struct vdev. The taskq entry should be initialized as is currently done in zio_create() for newly-created bare zio_t object. The rationale is the same as is described in f467b05a265abcfb8e5a3269f79d08f36a58646a. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3709
* Add extra keyword 'slot' to vdev_id.conf (Andreas Buschmann, 2015-08-30, 2 files, -1/+40)
  Add new keyword 'slot' to vdev_id.conf. This selects where the slot number for a SAS/SATA disk is taken from. It is needed to enable access to the physical position of a disk in a Supermicro 2027R-AR24NV.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ned Bass <[email protected]>
  Closes #3693
* Allow recovery from corrupted snapshot maps (Tim Chase, 2015-08-28, 1 file, -0/+6)
| | | | | | | | | | | | | | | | | | If the ZAP object containing a snapshot map is corrupted due to an unrecoverable checksum error or otherwise, dsl_dataset_name() will normally panic the system due to its VERIFY. This patch attempts to allow a recovery avenue from such situations by manufacturing a descriptive snapshot name and then ignoring the error. Scrubbing a pool with this type of corruption will then show the affected object in the error list rather than panicking. The recovery code is only enabled when the zfs_recover module parameter is set. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3705
* Check large block feature flag on volumes (Brian Behlendorf, 2015-08-28, 4 files, -5/+36)
  Since ZoL allows large blocks to be used by volumes, unlike upstream illumos, the feature flag must be checked prior to volume creation. This is critical because unlike filesystems, volumes will create an object which uses large blocks as part of the create. Therefore, it cannot be safely checked in zfs_check_settable() after the dataset has been created.

  In addition this patch updates the relevant error messages to use zfs_nicenum() to print the maximum blocksize.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3591
* Limit max_hw_sectors_kb to 16M (Brian Behlendorf, 2015-08-28, 1 file, -1/+1)
| | | | | | | | | | | | | | | | When support for large blocks was added DMU_MAX_ACCESS was increased to allow for blocks of up to 16M to fit in a transaction handle. This had the side effect of increasing the max_hw_sectors_kb for volumes, which are scaled off DMU_MAX_ACCESS, to 64M from 10M. This is an issue for volumes which by default use an 8K block size because it results in dmu_buf_hold_array_by_dnode() allocating a 64K array for the dbufs. The solution is to restore the maximum size to ~10M. This patch specifically changes it to 16M which is close enough. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3684
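The 64K figure follows from simple arithmetic if one assumes one pointer-sized array slot per block covered by the request. The sketch below is only that arithmetic; the function name is hypothetical and the 8-byte pointer size is an assumption (a 64-bit kernel).

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for an array holding one pointer
 * per block when a single request may span max_access bytes.
 */
uint64_t
dbuf_array_bytes(uint64_t max_access, uint64_t blocksize, uint64_t ptr_size)
{
	return ((max_access / blocksize) * ptr_size);
}
```

With 8K blocks and 8-byte pointers, a 64M limit needs a 64K array while a 16M limit needs 16K.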
* Linux 4.1 compat: loop device on ZFS (Chunwei Chen, 2015-08-24, 5 files, -137/+162)
  Starting from Linux 4.1, an iov_iter with bio_vec can be passed into iter_read/iter_write. Notably, the loop device will pass bio_vec to the backend filesystem. However, the current ZFS code assumes iovec without any check, so it will always crash when using a loop device.

  With the restructured uio_t, we can safely pass bio_vec in uio_t with UIO_BVEC set. The uio* functions are modified to handle the bio_vec case separately.

  The const uio_iov causes some warnings in the xuio-related code, so explicitly convert them to non-const.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3511
  Closes #3640
* Add compatibility layer for {kmap,kunmap}_atomic (Chunwei Chen, 2015-08-24, 4 files, -1/+63)
  Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2. We use zfs_{kmap,kunmap}_atomic as wrappers that always take 2 arguments but ignore the 2nd on newer kernels.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
* Linux 4.2 compat: vfs_rename() (Brian Behlendorf, 2015-08-19, 1 file, -0/+22)
  The spa_config_write() function relies on the classic method of making sure updates to the /etc/zfs/zpool.cache file are atomic. It writes out a temporary version of the file and then uses vn_rename() to switch it into place. This way there can never exist a partial version of the file; it's all or nothing.

  Conceptually this is a good strategy and it makes good sense for platforms where it's easy to do a rename within the kernel. Unfortunately, Linux is not one of those platforms. Even doing basic I/O to a file system from within the kernel is strongly discouraged.

  In order to support this at all the vn_rename() implementation ends up being complex and fragile. So fragile that recent Linux 4.2 changes have broken it. While it is possible to update vn_rename() to work with the latest kernels, a better long term strategy is to stop using vn_rename() entirely. Then all this complex, fragile code can be removed. Achieving this is straightforward because config_write() is the only consumer of vn_rename().

  This patch reworks spa_config_write() to update the cache file in place. The file will be truncated, written out, and then synced to disk. If an error is encountered the file will be unlinked, leaving the system in a consistent state.

  This does expose a tiny window where a system crash at exactly the wrong moment could leave a partially written cache file. However, this is highly unlikely because the cache file is 1) infrequently updated, 2) only a few kilobytes in size, and 3) written with a single vn_rdwr() call.

  If this were to somehow happen it poses no risk to the pool. Simply removing the cache file will allow the pool to be imported cleanly. Going forward this will be even less of an issue as we intend to disable the use of a cache file by default.

  Bottom line: not using vn_rename() allows us to make ZoL more robust against upstream kernel changes.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3653
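The truncate, write, sync, unlink-on-error sequence described above can be sketched in user space with POSIX calls. This is an illustration only: the patch works in the kernel through the vn_* interfaces, and the function name and return convention below are assumptions made for the example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical user-space analogue: rewrite a small file in place.
 * Returns 0 on success; on any failure the file is removed so that no
 * partially written copy is left behind, and -1 is returned.
 */
int
write_cache_in_place(const char *path, const void *buf, size_t len)
{
	int failed = 0;
	ssize_t n;
	int fd;

	/* Truncate the existing file (or create it) instead of renaming. */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	/* Write the whole buffer with a single call. */
	n = write(fd, buf, len);
	if (n < 0 || (size_t)n != len)
		failed = 1;

	/* Push the data to stable storage before declaring success. */
	if (!failed && fsync(fd) != 0)
		failed = 1;

	if (close(fd) != 0)
		failed = 1;

	if (failed) {
		(void) unlink(path);
		return (-1);
	}
	return (0);
}
```

The trade-off is the one the message spells out: a crash between the truncate and the sync can leave a short file, which is tolerable here because the file is small, rarely written, and safe to delete.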
* Handle zap_lookup() failure in ddt_object_load() (Brian Behlendorf, 2015-08-19, 1 file, -3/+4)
| | | | | | | | | Failing to lookup a name in the spa_ddt_stat_object should not result in a panic in ddt_object_load(). The error can be safely returned to the caller for handling resulting in a useful user error message. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3370
* Fix zvol detection (Richard Yao, 2015-08-19, 1 file, -0/+6)
  The zpool create subcommand should not return an error on debug builds of the userland tools when given zvols.

  Signed-off-by: Richard Yao <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3595
* Add parenthesis to the ternary operator (tuxoko, 2015-08-19, 1 file, -1/+1)
| | | | | | | | | Without the parenthesis, this particular ASSERT will evaluate to "(RW_READER == (!zap->zap_ismicro && fatreader)) ? RW_READER : lti" Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3685
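The precedence problem is easy to reproduce outside ZFS. In C, == binds tighter than ?:, so an unparenthesized comparison against a ternary is parsed as a ternary whose condition is the comparison. The two functions below are a generic sketch with hypothetical names, not the ASSERT from the patch.

```c
/*
 * Parsed as (lt == cond) ? a : b, so the result is a or b, not the
 * outcome of a comparison.
 */
int
expr_without_parens(int lt, int cond, int a, int b)
{
	return (lt == cond ? a : b);
}

/* Parsed as intended: compare lt with whichever of a or b is selected. */
int
expr_with_parens(int lt, int cond, int a, int b)
{
	return (lt == (cond ? a : b));
}
```

With lt = 5, cond = 0, a = 0 and b = 7, the first form yields 7 (non-zero, so an assertion on it would pass) while the second correctly yields 0 because 5 is not equal to 7.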
* Fix build failure with Linux 4.1 and FTRACE (Chris Dunlop, 2015-08-18, 1 file, -0/+3)
| | | | | | | | See also #3546, commit c1718e9 Signed-off-by: Chris Dunlop <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3673
* Linux 4.1 compat: configure bdi_setup_and_register() (Chris Dunlop, 2015-08-18, 1 file, -2/+2)
| | | | | | | | | | Pull struct backing_dev_info off the stack: by linux-4.1 it's grown past our 1024 byte stack frame warning limit resulting in an incorrect configure result. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #3671
* ztest: display non-index properties properly at verbose level 6 (Tim Chase, 2015-07-30, 1 file, -3/+10)
| | | | | | | | | | | | At verbosity levels of 6 or greater, ztest_dsl_prop_set_uint64() attempts to display the value of all properties as indexed values regardless of whether the property is an indexed value or simply an un-indexed integer. This patch causes the numeric value of the property to be displayed if zfs_prop_index_to_string() fails. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3649
* Rework zed_notify_email for configurable PROG/OPTS (Chris Dunlap, 2015-07-30, 2 files, -11/+51)
  This commit reworks the zed_notify_email() function to allow configuration of the mail executable and command-line arguments.

  ZED_EMAIL_PROG specifies the name or path of the executable responsible for sending notifications via email. This variable defaults to "mail".

  ZED_EMAIL_OPTS specifies command-line options passed to ZED_EMAIL_PROG. The following keyword substitutions are performed:
  - @ADDRESS@ is replaced with the recipient email address(es)
  - @SUBJECT@ is replaced with the notification subject
  This variable defaults to "-s '@SUBJECT@' @ADDRESS@".

  ZED_EMAIL_ADDR replaces ZED_EMAIL (although the latter is retained for backward compatibility). This variable can contain multiple addresses as long as they are delimited by whitespace.

  Signed-off-by: Chris Dunlap <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3634
  Closes #3631
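The keyword substitution can be illustrated with a small string-expansion routine. Note the assumptions: the real zed_notify_email() lives in ZED's shell scripts, so this C version, its function names, and its buffer-based interface are purely a sketch of how @SUBJECT@ and @ADDRESS@ might be expanded in the options template.

```c
#include <string.h>

/* Append n bytes of src to dst; returns -1 if the result would not fit. */
static int
append_bytes(char *dst, size_t cap, size_t *len, const char *src, size_t n)
{
	if (*len + n >= cap)
		return (-1);
	memcpy(dst + *len, src, n);
	*len += n;
	dst[*len] = '\0';
	return (0);
}

/*
 * Hypothetical helper: copy tmpl into out, replacing @SUBJECT@ and
 * @ADDRESS@ with the given strings. Returns 0 on success, -1 if out
 * is too small.
 */
int
expand_email_opts(const char *tmpl, const char *subject,
    const char *address, char *out, size_t cap)
{
	size_t len = 0;

	if (cap == 0)
		return (-1);
	out[0] = '\0';

	while (*tmpl != '\0') {
		if (strncmp(tmpl, "@SUBJECT@", 9) == 0) {
			if (append_bytes(out, cap, &len, subject,
			    strlen(subject)) != 0)
				return (-1);
			tmpl += 9;
		} else if (strncmp(tmpl, "@ADDRESS@", 9) == 0) {
			if (append_bytes(out, cap, &len, address,
			    strlen(address)) != 0)
				return (-1);
			tmpl += 9;
		} else {
			/* Ordinary character: copy it through unchanged. */
			if (append_bytes(out, cap, &len, tmpl, 1) != 0)
				return (-1);
			tmpl++;
		}
	}
	return (0);
}
```

Expanding the default template "-s '@SUBJECT@' @ADDRESS@" with a subject of "ZFS event" and an address of "root" produces "-s 'ZFS event' root".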
* Fix whitespace in zed_log_err (Chris Dunlap, 2015-07-30, 1 file, -1/+1)
| | | | | | | | This commit fixes the two adjacent spaces that appear in zed_log_err() messages when ZEVENT_EID is undefined. Signed-off-by: Chris Dunlap <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]>
* Update arc_memory_throttle() to check pageout (Brian Behlendorf, 2015-07-30, 2 files, -13/+44)
  This brings the behavior of arc_memory_throttle() back in sync with illumos. The updated memory throttling policy roughly goes like this:

  * Never throttle if more than 10% of memory is free. This threshold is configurable with the zfs_arc_lotsfree_percent module option.

  * Minimize any throttling of kswapd even when free memory is below the set threshold. Allow it to write out pages as quickly as possible to help alleviate the memory pressure.

  * Delay all other threads when free memory is below the set threshold in order to avoid compounding the memory pressure. Buffers will be evicted from the ARC to reduce the issue.

  The Linux specific zfs_arc_memory_throttle_disable module option has been removed in favor of the existing zfs_arc_lotsfree_percent tuning. Setting zfs_arc_lotsfree_percent=0 will have the same effect as zfs_arc_memory_throttle_disable and it was therefore redundant.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3637
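The three-way policy above can be written as a small decision function. This is a sketch under stated assumptions: the enum, the function name, and the use of page counts are hypothetical, and what "minimal" throttling or "delay" means in the real code is not modelled.

```c
#include <stdint.h>

typedef enum {
	THROTTLE_NONE,		/* plenty of free memory */
	THROTTLE_MINIMAL,	/* kswapd: let it write pages out quickly */
	THROTTLE_DELAY		/* everyone else: back off */
} throttle_t;

/*
 * Hypothetical helper: decide how to treat a thread given the free
 * and total page counts and the lotsfree percentage threshold.
 */
throttle_t
memory_throttle_sketch(uint64_t free_pages, uint64_t total_pages,
    unsigned int lotsfree_percent, int is_kswapd)
{
	if (free_pages > total_pages * lotsfree_percent / 100)
		return (THROTTLE_NONE);
	if (is_kswapd)
		return (THROTTLE_MINIMAL);
	return (THROTTLE_DELAY);
}
```

With a threshold of 0 percent, any non-zero amount of free memory skips throttling, which is why a separate "disable" option adds nothing.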
* Update arc_available_memory() to check freemem (Brian Behlendorf, 2015-07-30, 2 files, -31/+46)
  While Linux doesn't provide detailed information about the state of the VM it does provide us total free pages. This information should be incorporated into the arc_available_memory() calculation rather than solely relying on a signal from direct reclaim. Conceptually this brings arc_available_memory() back in sync with illumos.

  It is also desirable that the target amount of free memory be tunable on a system. While the default values are expected to work well for most workloads there may be cases where custom values are needed. The zfs_arc_sys_free module option was added for this purpose.

  zfs_arc_sys_free - The target number of bytes the ARC should leave as free memory on the system. This value can be checked in /proc/spl/kstat/zfs/arcstats and setting this module option will override the default value.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3637
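One way to picture how a free-memory target feeds such a calculation is as a signed difference between what is free and what should stay free. This is an assumption made for illustration, not the committed formula: the function name is hypothetical, and the convention that a negative result means the ARC should give memory back is assumed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes the ARC could still consume before free
 * memory drops below the target; negative when already below it.
 */
int64_t
available_memory_sketch(uint64_t free_bytes, uint64_t sys_free_target)
{
	return ((int64_t)free_bytes - (int64_t)sys_free_target);
}
```

Raising the target (the role zfs_arc_sys_free plays) makes the result go negative sooner, so the cache would start shrinking earlier.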
* Bound zvol_threads module option (Brian Behlendorf, 2015-07-29, 2 files, -4/+5)
  The zvol_threads module option should be bounded to a reasonable range. The taskq must have at least 1 thread and shouldn't have more than 1,024. The default value of 32 is reasonable.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3614
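Bounding the option amounts to a clamp. A minimal sketch, with a hypothetical function name and the limits taken from the message above:

```c
/*
 * Hypothetical helper: clamp a requested thread count to [1, 1024].
 */
int
bounded_zvol_threads(int requested)
{
	if (requested < 1)
		return (1);
	if (requested > 1024)
		return (1024);
	return (requested);
}
```

A request of 0 becomes 1, the default of 32 passes through unchanged, and 5000 is capped at 1024.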
* Fix "BUG: Bad page state" caused by writeback flag (Chunwei Chen, 2015-07-29, 1 file, -0/+50)
  Commit d958324 fixed the deadlock between page lock and range lock by unlocking the page lock before acquiring the range lock. However, this created a new issue, #3075.

  The problem is that if we can't set the write back bit before releasing the page lock, then other processes will be unaware that the page is under active write back. They may therefore truncate the page, invalidate the page, or not honor the sync semantics.

  To work around this problem we re-dirty the page before dropping the page lock. While this doesn't prevent the page from being truncated it does ensure it won't be invalidated. Then the range lock and the page lock are reacquired in the correct deadlock-free order.

  Once both locks are safely held the page state can be rechecked. If all is well and the page is in the expected state the dirty bit can be removed, the write back bit set, and the page removed from the skip count. If not the page will be handled as appropriate.

  Signed-off-by: Chunwei Chen <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3075
* Fix build failure with Linux 4.1 and FTRACE (Frédéric VANNIÈRE, 2015-07-29, 8 files, -0/+24)
| | | | | | Signed-off-by: Frédéric VANNIÈRE <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3546
* Align thread priority with Linux defaults (Brian Behlendorf, 2015-07-28, 12 files, -20/+29)
  Under Linux, filesystem threads responsible for handling I/O are normally created with the maximum priority. Non-I/O filesystem processes run with the default priority. ZFS should adopt the same priority scheme under Linux to maintain good performance and so that it will compete fairly when other Linux filesystems are active.

  The priorities have been updated to the following:

  $ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
  - TS 10743 19 -20 [spl_kmem_cache]
  - TS 10744 19 -20 [spl_system_task]
  - TS 10745 19 -20 [spl_dynamic_tas]
  - TS 10764 19   0 [dbu_evict]
  - TS 10765 19   0 [arc_prune]
  - TS 10766 19   0 [arc_reclaim]
  - TS 10767 19   0 [arc_user_evicts]
  - TS 10768 19   0 [l2arc_feed]
  - TS 10769 39   0 [z_unmount]
  - TS 10770 39 -20 [zvol]
  - TS 11011 39 -20 [z_null_iss]
  - TS 11012 39 -20 [z_null_int]
  - TS 11013 39 -20 [z_rd_iss]
  - TS 11014 39 -20 [z_rd_int_0]
  - TS 11022 38 -19 [z_wr_iss]
  - TS 11023 39 -20 [z_wr_iss_h]
  - TS 11024 39 -20 [z_wr_int_0]
  - TS 11032 39 -20 [z_wr_int_h]
  - TS 11033 39 -20 [z_fr_iss_0]
  - TS 11041 39 -20 [z_fr_int]
  - TS 11042 39 -20 [z_cl_iss]
  - TS 11043 39 -20 [z_cl_int]
  - TS 11044 39 -20 [z_ioctl_iss]
  - TS 11045 39 -20 [z_ioctl_int]
  - TS 11046 39 -20 [metaslab_group_]
  - TS 11050 19   0 [z_iput]
  - TS 11121 38 -19 [z_wr_iss]

  Note that under Linux the meaning of a process's priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high values on illumos indicate a _high_ priority.

  In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way when changes are merged from upstream illumos we won't need to remember to invert the macro. It could also lead to confusion.

  This patch depends on https://github.com/zfsonlinux/spl/pull/466.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ned Bass <[email protected]>
  Closes #3607
* Check for NULL in dmu_free_long_range_impl() (Brian Behlendorf, 2015-07-28, 1 file, -1/+5)
| | | | | | | | | | A NULL should never be passed as the dnode_t pointer to the function dmu_free_long_range_impl(). Regardless, because we have a reported occurrence of this let's add some error handling to catch this. Better to report a reasonable error to caller than panic the system. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3445
* Make sure that POOL_IMPORTED is set, unset and checked where appropriate. (Turbo Fredriksson, 2015-07-28, 1 file, -10/+17)
  * If it's unset in find_rootfs(), no pool is imported, so there is no point in looking for a rootfs.
  * If find_rootfs() couldn't find a rootfs, the pool is exported. Remember to unset POOL_IMPORTED after doing so.
  * Set POOL_IMPORTED if/when a pool has been imported in import_pool().
  * Improve backup import (the one using cache file).

  Signed-off-by: Turbo Fredriksson <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3636
* Fix some minor issues with the SYSV init and initramfs scripts. (Turbo Fredriksson, 2015-07-24, 8 files, -20/+27)
  These are some minor fixes to commits 2cac7f5f11756663525a5d4604d9f0a3202d4024 and 2a34db1bdbcecf5019c4a59f2a44c92fe82010f2.

  * Make sure to alien'ate the new initramfs rpm package as well! The rpm package is built correctly, but alien isn't run on it to create the deb.
  * Before copying files from COPY_FILE_LIST, make sure the DESTDIR/dir exists.
  * Include the /lib/udev/vdev_id file in the initrd.
  * Because the initrd needs to use '/sbin/modprobe' instead of 'modprobe', we need to use this in load_module() as well.
  * Make sure that load_module() can be used more globally, instead of calling '/sbin/modprobe' all over the place.
  * Make sure that check_module_loaded() has a parameter: the module to check.

  Signed-off-by: Turbo Fredriksson <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #3626
* Minor style cleanup (Brian Behlendorf, 2015-07-23, 1 file, -10/+7)
| | | | | | | | | Address minor differences in style between upstream and ZoL. This patch contains no functional differences and is solely designed to minimize the delta from upstream. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3533
* Remove double counting HDR_L2ONLY_SIZE (Brian Behlendorf, 2015-07-23, 1 file, -3/+0)
  Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE accounting. Accounting is now performed only in the constructor and destructor, which is a nice simplification. It should have been removed from the create and destroy functions. This brings us back in sync with upstream.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #3533
* Add hdr_recl() reclaim callback (Brian Behlendorf, 2015-07-23, 1 file, -2/+18)
| | | | | | | | | | | Originally removed because it wasn't required under Linux. However, there may still be some utility in signaling the arc reclaim thread under Linux via reclaim. This should already have happened by other means but it's not harmless and reduces another point of divergence with upstream. Signed-off-by: Brian Behlendorf <[email protected]> Issue #3533
* Reinstate zfs_arc_p_min_shift (Brian Behlendorf, 2015-07-23, 2 files, -2/+26)
  Commit f521ce1 removed the minimum value for "arc_p", allowing it to drop to zero or grow to "arc_c". This was done to improve a specific workload which constantly dirties new "metadata" but also frequently touches a "small" amount of mfu data (e.g. mkdir's).

  This change may still be desirable but it needs to be re-investigated in the context of the recent ARC changes from upstream. Therefore this code is being restored to facilitate benchmarking. By setting "zfs_arc_p_min_shift=64" we can easily compare the performance.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Issue #3533
* Illumos 5817 - change type of arcs_size from uint64_t to refcount_t (Prakash Surya, 2015-07-23, 2 files, -28/+113)
| | | | | | | | | | | | | | | | | 5817 change type of arcs_size from uint64_t to refcount_t Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Adam Leventhal <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Richard Elling <[email protected]> Approved by: Garrett D'Amore <[email protected]> References: https://www.illumos.org/issues/5817 https://github.com/illumos/illumos-gate/commit/2fd872a Ported-by: Brian Behlendorf <[email protected]> Issue #3533
* Illumos 5445 - Add more visibility via arcstats (Prakash Surya, 2015-07-23, 1 file, -40/+151)
| | | | | | | | | | | | | | | | | | | | | | | 5445 Add more visibility via arcstats; specifically arc_state_t stats and differentiate between "data" and "metadata" Reviewed by: Basil Crow <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Bayard Bell <[email protected]> Approved by: Robert Mustacchi <[email protected]> References: https://www.illumos.org/issues/5445 https://github.com/illumos/illumos-gate/commit/4076b1b Porting Notes: This patch is an improved version of cc7f677 which was previously merged in ZoL. This patch incorporates the additional improvements which were made upstream. Ported-by: Brian Behlendorf <[email protected]> Issue #3533
* Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_grow (Matthew Ahrens, 2015-07-23, 3 files, -157/+374)
  5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
  Reviewed by: Christopher Siden <[email protected]>
  Reviewed by: George Wilson <[email protected]>
  Reviewed by: Steven Hartland <[email protected]>
  Reviewed by: Richard Elling <[email protected]>
  Approved by: Dan McDonald <[email protected]>

  References:
    https://www.illumos.org/issues/5376
    https://github.com/illumos/illumos-gate/commit/2ec99e3

  Porting Notes:

  The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solutions weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible.

  * The arc_available_memory() function was removed previously in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to determine a low memory state. This provides the same basic functionality as the illumos version, allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged so this change isn't as significant as it might first appear.

  * This patch includes the long standing fix for arc_shrink() which was originally proposed in #2167. Since there were related changes to this function it made sense to include that work.

  * The arc_init() function has been refactored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation.

  Ported-by: Brian Behlendorf <[email protected]>
  Closes #3616
  Closes #2167
* Set default _initconfdir directory (Brian Behlendorf, 2015-07-21, 1 file, -0/+5)
| | | | | | | | | | | The _initconfdir macro is normally provided by global rpm macros file for use in the spec file. However, older distributions such as CentOS 6 do not define it. To prevent a build failure in this case the spec file has been updated to use a reasonable default when the value is undefined. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3617
* Add logic to try and recover an inode with an invalid mode (Brian Behlendorf, 2015-07-17, 1 file, -4/+11)
| | | | | | | | | | | | | | | | | | | | | | | When an inode is detected with invalid mode bits the safe thing to do is panic the system. This indicates a problem with the contents of a dnode and it should never be possible. This is the default behavior. Unfortunately, due to flaws in the system attribute (SA) implementation (on all platforms) it was possible that ZFS could create a damaged dnode. This was a rare issue which only impacted dnodes which used a spill block. Normally only symlinks and files with ACLs would require a spill block. However, if the dataset had the xattr=sa property set and extended attributes were used this problem could occur. As of the 0.6.4 tag the root cause of this issue has been fixed. For pools which are exhibiting this damage the 'zfs_recover=1' module option may be set. This will cause ZFS to interpret the dnode with invalid mode bits as a normal file. This may allow the files to be accessed for recovery purposes. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3548
* Support parallel build trees (VPATH builds) (Turbo Fredriksson, 2015-07-17, 42 files, -388/+476)
  Build products from an out of tree build should be written relative to the build directory. Sources should be referred to by their locations in the source directory.

  This is accomplished by adding the 'src' and 'obj' variables for the module Makefile.am, using relative paths to reference source files, and by setting VPATH when source files are not co-located with the Makefile. This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure \
      --with-spl=$HOME/src/git/spl/ \
      --with-spl-obj=$HOME/src/git/spl/build
  $ make -s

  This change also has the advantage of resolving the following warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

  Signed-off-by: Turbo Fredriksson <[email protected]>
  Signed-off-by: Brian Behlendorf <[email protected]>
  Closes #1082
* Update inode under range lock (Brian Behlendorf, 2015-07-17, 1 file, -1/+1)
  After a successful write the inode must be updated under the range lock. If it is updated after dropping the lock there exists a race where the znode and inode will disagree about the file size. This could result in a narrow window of time where read(2) is able to access data beyond what fstat(2) reports as the file size.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Ned Bass <[email protected]>
  Closes #3601
* Linux 4.2 compat: follow_link() / put_link() (Brian Behlendorf, 2015-07-17, 6 files, -8/+80)
  As of Linux 4.2 the kernel has completely retired the nameidata structure. One of the few remaining consumers of this interface were the follow_link() and put_link() callbacks.

  This patch adds the required checks to configure to detect the interface change and updates the functions accordingly. Migrating to the simple_follow_link() interface was considered but, ironically, was decided against due to the increased complexity.

  It should also be noted that the kernel follow_link() and put_link() interfaces changed several times after 4.1 but before 4.2. This means there is a narrow range of kernel commits, which never appear in an official tag of the Linux kernel, for which ZoL will not build.

  Signed-off-by: Brian Behlendorf <[email protected]>
  Signed-off-by: Richard Yao <[email protected]>
  Issue #3596