6171 dsl_prop_unregister() slows down dataset eviction.
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/6171
https://github.com/illumos/illumos-gate/commit/03bad06
Porting notes:
- Conflicts
- 3558fd7 Prototype/structure update for Linux
- 2cf7f52 Linux compat 2.6.39: mount_nodev()
- 13fe019 Illumos #3464
- 241b541 Illumos 5959 - clean up per-dataset feature count code
- dsl_prop_unregister() preserved until out of tree consumers
like Lustre can transition to dsl_prop_unregister_all().
- Fixing 'space or tab at end of line' in include/sys/dsl_dataset.h
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Approved by: Gordon Ross <[email protected]>
References:
https://www.illumos.org/issues/5987 zfs prefetch code needs work
illumos/illumos-gate@cf6106c 5987 zfs prefetch code needs work
Porting notes:
- [module/zfs/dbuf.c]
- 5f6d0b6 Handle block pointers with a corrupt logical size
- [module/zfs/dmu_zfetch.c]
- c65aa5b Fix gcc missing parenthesis warnings
- 428870f Update core ZFS code from build 121 to build 141.
- 79c76d5 Change KM_PUSHPAGE -> KM_SLEEP
- b8d06fc Switch KM_SLEEP to KM_PUSHPAGE
- Account for ISO C90 - mixed declarations and code - warnings
- Module parameters (new/changed):
- Replaced zfetch_block_cap with zfetch_max_distance
(Max bytes to prefetch per stream (default 8MB; 8 * 1024 * 1024))
- Preserved zfs_prefetch_disable as 'int' for consistency with
existing Linux module options.
- [include/sys/trace_arc.h]
- Added new tracepoints
- DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__sync__wait__for__async);
- DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__demand__hit__predictive__prefetch);
- [man/man5/zfs-module-parameters.5]
- Updated man page
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
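The porting notes replace a block-count cap with zfetch_max_distance, a per-stream limit expressed in bytes. The following is a minimal sketch of what that byte limit means in blocks. It assumes the limit is converted using the dnode's data block shift, and it treats the one-block floor as an assumption. max_prefetch_blocks() and clamp_prefetch() are hypothetical helpers, not functions from dmu_zfetch.c:

```c
#include <stdint.h>

/* Default from the porting notes: 8MB per prefetch stream. */
static const uint64_t zfetch_max_distance = 8 * 1024 * 1024;

/*
 * Hypothetical helper: convert the byte cap into a block cap for a
 * dnode whose data block size is (1 << datablkshift) bytes.
 */
static uint64_t
max_prefetch_blocks(unsigned datablkshift)
{
	uint64_t blks = zfetch_max_distance >> datablkshift;

	/* Assumption of this sketch: always allow at least one block. */
	return (blks > 0 ? blks : 1);
}

/* Hypothetical helper: clamp a requested prefetch length to the cap. */
static uint64_t
clamp_prefetch(uint64_t requested_blks, unsigned datablkshift)
{
	uint64_t cap = max_prefetch_blocks(datablkshift);

	return (requested_blks < cap ? requested_blks : cap);
}
```

With the 8MB default, a stream over 128K blocks can run at most 64 blocks ahead. A 16M-block dataset falls back to the sketch's one-block floor.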
This reverts commit b47637ecdc7b647ec5bd9dfca888179eecfaa72d which
introduced a regression in ztest.
$ ./cmd/ztest/ztest -V
5 vdevs, 7 datasets, 23 threads, 300 seconds...
*** Error in `/rpool/home/behlendo/src/git/zfs/cmd/ztest/.libs/lt-ztest':
double free or corruption (fasttop): 0x0000000000d339f0 ***
4929 want prevsnap property
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Matt Amdur <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Reviewed by: Boris Protopopov <[email protected]>
Reviewed by: Richard Lowe <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/4929
https://github.com/illumos/illumos-gate/commit/b461c74
Porting notes:
- [include/sys/fs/zfs.h]
- f67d70 Create an 'overlay' property
- 11b9ec Add full SELinux support
- [fs/zfs/dsl_dataset.c]
- This increases the stack size of dsl_dataset_stats() but
nothing has been changed until this is shown to be an issue.
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
4891 want zdb option to dump all metadata
Reviewed by: Sonu Pillai <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Reviewed by: Richard Lowe <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
We'd like a way for zdb to dump metadata in a machine-readable
format, so that we can bring that back from a customer site for
in-house diagnosis. Think of it as a crash dump for zpools,
which can be used for post-mortem analysis of a malfunctioning
pool.
References:
https://www.illumos.org/issues/4891
https://github.com/illumos/illumos-gate/commit/df15e41
Porting notes:
- [cmd/zdb/zdb.c]
- a5778ea zdb: Introduce -V for verbatim import
- In main() getopt 'opt' variable removed and the code was
brought back in line with illumos.
- [lib/libzpool/kernel.c]
- 1e33ac1 Fix Solaris thread dependency by using pthreads
- f0e324f Update utsname support
- 4d58b69 Fix vn_open/vn_rdwr error handling
- In vn_open() allocate 'dumppath' on heap instead of stack
- Properly handle 'dump_fd == -1' error path
- Free 'realpath' after added vn_dumpdir_code block
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
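The vn_open() notes describe a common userspace pattern with three parts:

- build the dump file path in a heap buffer instead of on the stack;
- treat a -1 descriptor as an error;
- free the buffer on every path.

The following is a minimal sketch of that pattern. It assumes the dump file is named after the last component of the device path. build_dump_path() and open_dump_file() are hypothetical, not the libzpool code:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap-allocated "<dumpdir>/<basename>"
 * string, or NULL if allocation fails.  The caller frees the result.
 */
static char *
build_dump_path(const char *dumpdir, const char *devpath)
{
	const char *base = strrchr(devpath, '/');
	size_t len;
	char *dumppath;

	base = (base != NULL) ? base + 1 : devpath;
	len = strlen(dumpdir) + 1 + strlen(base) + 1;	/* '/' and NUL */
	dumppath = malloc(len);
	if (dumppath == NULL)
		return (NULL);
	(void) snprintf(dumppath, len, "%s/%s", dumpdir, base);
	return (dumppath);
}

/*
 * Hypothetical helper: open the dump file for writing.  Returns 0 and
 * stores the descriptor in *fdp, or returns an errno value.  The path
 * buffer is freed on both the success and the failure path.
 */
static int
open_dump_file(const char *dumpdir, const char *devpath, int *fdp)
{
	char *dumppath = build_dump_path(dumpdir, devpath);
	int fd, err;

	if (dumppath == NULL)
		return (ENOMEM);
	fd = open(dumppath, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	err = (fd == -1) ? errno : 0;	/* capture errno before free() */
	free(dumppath);
	if (fd == -1)
		return (err);
	*fdp = fd;
	return (0);
}
```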
3749 zfs event processing should work on R/O root filesystems
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Eric Schrock <[email protected]>
Approved by: Christopher Siden <[email protected]>
References:
https://www.illumos.org/issues/3749
https://github.com/illumos/illumos-gate/commit/3cb69f7
Porting notes:
- [include/sys/spa_impl.h]
- ffe9d38 Add generic errata infrastructure
- 1421c89 Add visibility in to arc_read
- [include/sys/fm/fs/zfs.h]
- 2668527 Add linux events
- 6283f55 Support custom build directories and move includes
- [module/zfs/spa_config.c]
- Updated spa_config_sync() to match illumos with the exception
of a Linux specific block.
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
References:
https://www.illumos.org/issues/5960
https://www.illumos.org/issues/5925
https://github.com/illumos/illumos-gate/commit/a2cdcdd
Porting notes:
- [lib/libzfs/libzfs_sendrecv.c]
- b8864a2 Fix gcc cast warnings
- 325f023 Add linux kernel device support
- 5c3f61e Increase Linux pipe buffer size on 'zfs receive'
- [module/zfs/zfs_vnops.c]
- 3558fd7 Prototype/structure update for Linux
- c12e3a5 Restructure zfs_readdir() to fix regressions
- [module/zfs/zvol.c]
- Function @zvol_map_block() isn't needed in ZoL
- 9965059 Prefetch start and end of volumes
- [module/zfs/dmu.c]
- Fixed ISO C90 - mixed declarations and code
- Function dmu_prefetch() 'int i' is initialized before
the following code block (c90 vs. c99)
- [module/zfs/dbuf.c]
- fc5bb51 Fix stack dbuf_hold_impl()
- 9b67f60 Illumos 4757, 4913
- 34229a2 Reduce stack usage for recursive traverse_visitbp()
- [module/zfs/dmu_send.c]
- Fixed ISO C90 - mixed declarations and code
- b58986e Use large stacks when available
- 241b541 Illumos 5959 - clean up per-dataset feature count code
- 77aef6f Use vmem_alloc() for nvlists
- 00b4602 Add linux kernel memory support
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
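Several of the porting notes fix ISO C90 "mixed declarations and code" warnings (-Werror=declaration-after-statement). The following is a minimal illustration of the required shape: declarations are hoisted to the top of the block and assigned later. It uses a hypothetical sum_first_n() helper:

```c
/*
 * ISO C90 requires all declarations in a block to precede statements,
 * so code such as
 *
 *	total = 0;
 *	int i = 0;	(declaration after statement: rejected)
 *
 * is rewritten with the declaration hoisted to the top of the block.
 */
static int
sum_first_n(const int *v, int n)
{
	int i;		/* declared up front, as C90 requires ... */
	int total = 0;

	for (i = 0; i < n; i++)	/* ... and only assigned here */
		total += v[i];
	return (total);
}
```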
5746 more checksumming in zfs send
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Approved by: Albert Lee <[email protected]>
References:
https://www.illumos.org/issues/5746
https://github.com/illumos/illumos-gate/commit/98110f0
https://github.com/zfsonlinux/zfs/issues/905
Porting notes:
- Minor conflicts due to:
- https://github.com/zfsonlinux/zfs/commit/2024041
- https://github.com/zfsonlinux/zfs/commit/044baf0
- https://github.com/zfsonlinux/zfs/commit/88904bb
- Fix ISO C90 warnings (-Werror=declaration-after-statement)
- arc_buf_t *abuf;
- dmu_buf_t *bonus;
- zio_cksum_t cksum_orig;
- zio_cksum_t *cksump;
- Fix '%llx' format specifier warning
- Align message in zstreamdump safe_malloc() with upstream
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3611
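Per-record checksumming relies on the send stream's running checksum being Fletcher-4, which can be computed incrementally as each record is written. The following is a minimal sketch of that incremental property. The type and function names are hypothetical stand-ins, not ZFS's own fletcher_4 routines, and byte-order handling is omitted:

```c
#include <stddef.h>
#include <stdint.h>

/* Four running 64-bit sums, one per checksum word. */
typedef struct {
	uint64_t a, b, c, d;
} fletcher4_t;

static void
fletcher4_init(fletcher4_t *zc)
{
	zc->a = zc->b = zc->c = zc->d = 0;
}

/*
 * Fold 'nwords' 32-bit words into the running checksum.  Calling this
 * over consecutive chunks gives the same result as one call over the
 * whole buffer, so a checksum can be carried from record to record.
 */
static void
fletcher4_incremental(fletcher4_t *zc, const uint32_t *words, size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++) {
		zc->a += words[i];
		zc->b += zc->a;
		zc->c += zc->b;
		zc->d += zc->c;
	}
}
```

Feeding the words {1, 2, 3, 4} in one call or in two halves yields the same four sums (10, 20, 35, 56).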
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.
The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.
Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent with the practice of other callers
of sa_update().
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #4150
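The overflow is easy to reproduce outside the kernel. In the following minimal sketch, bulk_attr_t is a hypothetical stand-in for sa_bulk_attr_t. set_length_checked() returns an error where the real VERIFY would panic, which keeps the behavior testable:

```c
#include <stdint.h>

/* Hypothetical stand-in for the 16-bit length field in sa_bulk_attr_t. */
typedef struct {
	uint16_t sa_length;
} bulk_attr_t;

/* Silent truncation: a 32-bit length stored in a 16-bit field. */
static void
set_length_unchecked(bulk_attr_t *attr, uint32_t buflen)
{
	attr->sa_length = (uint16_t)buflen;
}

/*
 * Checked variant: refuse lengths the on-disk format cannot represent.
 * Returns 0 on success, -1 if the length would overflow.
 */
static int
set_length_checked(bulk_attr_t *attr, uint32_t buflen)
{
	if (buflen > UINT16_MAX)
		return (-1);
	attr->sa_length = (uint16_t)buflen;
	return (0);
}
```

Storing 70000 without the check leaves a length of 4464 (70000 - 65536). That is exactly the kind of silently corrupt SA record the VERIFY now catches.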
5745 zfs set allows only one dataset property to be set at a time
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Bayard Bell <[email protected]>
Reviewed by: Richard PALO <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Approved by: Rich Lowe <[email protected]>
References:
https://www.illumos.org/issues/5745
https://github.com/illumos/illumos-gate/commit/3092556
Porting notes:
- Fix the missing braces around initializer, zfs_cmd_t zc = {"\0"};
- Remove extra format argument in zfs_do_set()
- Declare at the top:
- zfs_prop_t prop;
- nvpair_t *elem;
- nvpair_t *next;
- int i;
- Additionally initialize:
- int added_resv = 0;
- zfs_prop_t prop = 0;
- Assign 0 instead of NULL for uint64_t types.
- zc->zc_nvlist_conf = '\0';
- zc->zc_nvlist_src = '\0';
- zc->zc_nvlist_dst = '\0';
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3574
Both lock types were introduced in SPL to allow some locks to be
taken/released with Linux lockdep turned off. See the SPL commit for
details.
Add the new lock types to zfs_context.h to allow user space compilation.
Depends on SPL commit 692ae8d
SPL pull request refs/pull/480/head
Signed-off-by: Olaf Faaland <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3895
This deadlock may manifest itself in slightly different ways but
at the core it is caused by a memory allocation blocking on file-
system reclaim in the zio pipeline. This is normally impossible
because zio_execute() disables filesystem reclaim by setting
PF_FSTRANS on the thread. However, kmem cache allocations may
still indirectly block on file system reclaim while holding the
critical vq->vq_lock as shown below.
To resolve this issue, zio_buf_alloc_flags() is introduced, which
allows allocation flags to be passed. This can then be used in
vdev_queue_aggregate() with KM_NOSLEEP when allocating the
aggregate IO buffer. Since aggregating the IO is purely a
performance optimization we want this to either succeed or fail
quickly. Trying too hard to allocate this memory under the
vq->vq_lock can negatively impact performance and result in
this deadlock.
* z_wr_iss
    zio_vdev_io_start
      vdev_queue_io -> Takes vq->vq_lock
        vdev_queue_io_to_issue
          vdev_queue_aggregate
            zio_buf_alloc -> Waiting on spl_kmem_cache process
* z_wr_int
    zio_vdev_io_done
      vdev_queue_io_done
        mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss
* txg_sync
    spa_sync
      dsl_pool_sync
        zio_wait -> Waiting on zio being handled by z_wr_int
* spl_kmem_cache
    spl_cache_grow_work
      kv_alloc
        spl_vmalloc
          ...
            evict
              zpl_evict_inode
                zfs_inactive
                  dmu_tx_wait
                    txg_wait_open -> Waiting on txg_sync
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #3808
Closes #3867
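The fix follows a "succeed or fail quickly" pattern for optional work done under a lock. The following is a minimal userspace sketch. It assumes a hypothetical allocator hook that returns NULL instead of sleeping, standing in for KM_NOSLEEP. try_aggregate() and issue_ios() are hypothetical, and locking is not modeled:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook; returns NULL instead of blocking. */
typedef void *(*nosleep_alloc_fn)(size_t size);

/*
 * Try to build one aggregate buffer for 'nios' I/Os of 'iosize' bytes.
 * Aggregation is only an optimization, so a failed allocation simply
 * returns NULL rather than waiting for memory while a lock is held.
 */
static void *
try_aggregate(nosleep_alloc_fn alloc, size_t nios, size_t iosize)
{
	if (nios < 2)
		return (NULL);		/* nothing to aggregate */
	return (alloc(nios * iosize));
}

/* Returns how many I/Os end up being dispatched. */
static size_t
issue_ios(nosleep_alloc_fn alloc, size_t nios, size_t iosize)
{
	void *agg = try_aggregate(alloc, nios, iosize);

	if (agg == NULL)
		return (nios);	/* fall back: one dispatch per I/O */
	free(agg);		/* the sketch does not copy any data */
	return (1);
}

/* Simulates memory pressure: the non-blocking allocation always fails. */
static void *
failing_alloc(size_t size)
{
	(void) size;
	return (NULL);
}
```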
Since uio now supports bvec, we can convert bio into uio and reuse
dmu_{read,write}_uio. This way, we can remove some duplicate code.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4078
objsetid is not unique across pools, so using it alone as the key would
cause a panic when automounting two snapshots on different pools with the
same objsetid. We fix this by adding the spa pointer as an additional key.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Issue #3948
Issue #3786
Issue #3887
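Keying on the pair (spa, objsetid) instead of objsetid alone can be sketched as an AVL-style comparator. The types here are hypothetical stand-ins for spa_t and the snapshot entry, not the zfs_ctldir.c definitions:

```c
#include <stdint.h>

/* Hypothetical stand-in for a pool (spa_t). */
typedef struct spa {
	const char *name;
} spa_t;

/* Hypothetical key for an auto-mounted snapshot entry. */
typedef struct {
	const spa_t *se_spa;
	uint64_t se_objsetid;
} snapentry_key_t;

/*
 * AVL-style comparator returning -1, 0 or 1.  Comparing the pool
 * pointer first means equal objset ids from different pools no longer
 * collide.
 */
static int
snapentry_compare(const snapentry_key_t *a, const snapentry_key_t *b)
{
	if ((uintptr_t)a->se_spa < (uintptr_t)b->se_spa)
		return (-1);
	if ((uintptr_t)a->se_spa > (uintptr_t)b->se_spa)
		return (1);
	if (a->se_objsetid < b->se_objsetid)
		return (-1);
	if (a->se_objsetid > b->se_objsetid)
		return (1);
	return (0);
}
```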
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Approved by: Richard Lowe <[email protected]>
References:
https://www.illumos.org/issues/5959
https://github.com/illumos/illumos-gate/commit/ca0cc39
Porting notes:
illumos code doesn't check for feature_get_refcount() returning
ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added
a check in https://github.com/zfsonlinux/zfs/commit/784652c
due to #3468. The check was reintroduced here.
Ported-by: Witaut Bajaryn <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3965
Provide a generic interface to prefetch ZAP entries by name. This
functionality is being added for external consumers such as Lustre.
It is based on the existing zap_prefetch_uint64() version which is
used by the deduplication code.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #4061
Adding additional entries to the efi conversion array will help prevent
the overwriting of the GPTs of disks with in-use file systems in more
cases. Most notably, this adds partition type 8300 "Linux filesystem"
(0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and
btrfs, among others.
This commit itself does nothing to address the underlying problematic
behavior that check_slice() isn't called on partitions of an
unrecognized type, even when they contain a currently mounted file
system.
The additional entries were derived from these two resources:
https://en.wikipedia.org/wiki/GUID_Partition_Table
http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc
Signed-off-by: ilovezfs <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4016
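The conversion array is essentially a lookup table from GPT partition type GUID to a known type. The following is a minimal sketch with two illustrative entries. The table layout and lookup_part_type() are hypothetical, not the libefi code. GUIDs are compared case-insensitively because tools print them in either case:

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical conversion-table entry: GPT type GUID -> label. */
typedef struct {
	const char *guid;
	const char *name;
} part_type_t;

/*
 * Two illustrative entries: the "Linux filesystem" type named in the
 * commit message and the EFI System Partition type.
 */
static const part_type_t part_types[] = {
	{ "0FC63DAF-8483-4772-8E79-3D69D8477DE4", "Linux filesystem" },
	{ "C12A7328-F81F-11D2-BA4B-00A0C93EC93B", "EFI System" },
};

/* Case-insensitive lookup; NULL means the type is unrecognized. */
static const char *
lookup_part_type(const char *guid)
{
	size_t i;

	for (i = 0; i < sizeof (part_types) / sizeof (part_types[0]); i++) {
		if (strcasecmp(part_types[i].guid, guid) == 0)
			return (part_types[i].name);
	}
	return (NULL);
}
```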
Reviewed by: Alexander Eremin <[email protected]>
Reviewed by: Garrett D'Amore <[email protected]>
Reviewed by: Andrew Stormont <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Gordon Ross <[email protected]>
References:
https://www.illumos.org/issues/934
https://github.com/illumos/illumos-gate/commit/e21ea67
Ported-by: ilovezfs <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4016
Currently, the SET_ERROR tracepoint triggers regardless of whether there
is an error or not. On Illumos, SET_ERROR only triggers on an actual
error, which avoids irrelevant noise. Linux 2.6.38 added support for
conditional tracepoints, so we modify SET_ERROR to use them when they
are available, for functionality equivalent to that on Illumos.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4043
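The intent is to trace only non-zero errors while still returning the value unchanged. The following is a minimal userspace sketch of that behavior. A counter and a stderr message stand in for the tracepoint, and set_error_impl() is hypothetical. In the kernel, the condition is attached to the tracepoint itself:

```c
#include <stdio.h>

/* Hypothetical counter standing in for a kernel tracepoint. */
static unsigned long set_error_events;

/*
 * Fire the "tracepoint" only for real errors.  A function rather than
 * a bare macro body guarantees 'err' is evaluated exactly once.
 */
static int
set_error_impl(int err, const char *file, int line)
{
	if (err != 0) {
		set_error_events++;
		(void) fprintf(stderr, "SET_ERROR %d at %s:%d\n",
		    err, file, line);
	}
	return (err);
}

#define	SET_ERROR(err)	set_error_impl((err), __FILE__, __LINE__)
```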
The xattr_handler->{list,get,set} callbacks were changed to take an
xattr_handler argument, and the handler_flags argument was removed;
the flags should now be accessed through handler->flags.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4021
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <[email protected]>
Reviewed by: Xin LI <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Richard Lowe <[email protected]>
References:
https://www.illumos.org/issues/6267
https://github.com/illumos/illumos-gate/commit/d205810
Signed-off-by: Brian Behlendorf <[email protected]>
Ported-by: Ned Bass [email protected]
Issue #3865
Issue #3443
Commit torvalds/linux@4246a0b63bd8f56a1469b12eafeb875b1041a451
("block: add a bi_error field to struct bio") dropped the error
argument from bio_endio in favor of newly introduced bio->bi_error.
The new bi_error field also replaces the BIO_UPTODATE flag in bio->bi_flags.
bio_endio was a 3 argument function until Linux 2.6.24, which made it
a 2 argument function, and now the prototype has changed yet again to
a 1 argument function. Support for pre 2.6.24 kernels was already
dropped with 37f9dac592bf ("zvol processing should use struct bio")
which assumed the 2 argument version in zvol_request(). Remaining code
to support the 3 argument version is hereby removed.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Issue #3799
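The bi_error change is what lets bio_endio() drop its error argument: the error now travels in the bio itself. The following is a minimal userspace sketch of a compatibility wrapper, assuming a configure-style HAVE_1ARG_BIO_END_IO_T macro selects the one-argument form. struct bio, bio_endio() and BIO_END_IO() here are hypothetical mocks, not the kernel or ZoL definitions:

```c
/* Assumption of this sketch: configure detected the 1-argument form. */
#define	HAVE_1ARG_BIO_END_IO_T	1

/* Hypothetical mock of the parts of struct bio used below. */
struct bio {
	int bi_error;	/* error now travels in the bio itself */
	int completed;	/* set by the mock completion routine */
};

#ifdef HAVE_1ARG_BIO_END_IO_T
/* Mock of the newer 1-argument completion routine. */
static void
bio_endio(struct bio *bio)
{
	bio->completed = 1;
}

/* Store the error in the bio, then complete it. */
#define	BIO_END_IO(bio, error)	do {		\
	(bio)->bi_error = (error);		\
	bio_endio(bio);				\
} while (0)
#else
/* Mock of the older 2-argument completion routine. */
static void
bio_endio(struct bio *bio, int error)
{
	bio->bi_error = error;
	bio->completed = 1;
}

#define	BIO_END_IO(bio, error)	bio_endio((bio), (error))
#endif
```

Callers write BIO_END_IO(bio, error) once, and the configure result picks the matching prototype.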
As part of the large block support effort, it makes sense to add
support for large blocks to **zpios(1)**. Specifying a ZFS block size
for zpios is optional; it defaults to 128K when no block size is
given.
`zpios ... -S size | --blocksize size ...`
This will use *size* ZFS blocks for each test, specified as a comma
delimited list with an optional unit suffix. The supported range is
powers of two from 128K through 16M. A range of block sizes can be
tested as follows: `-S 128K,256K,512K,1M`
Example run below
(non realistic results from a VM and output abbreviated for space)
```
--regioncount=750 --regionsize=8M --chunksize=1M --offset=4K
--threaddelay=0 --cleanup --human-readable --verbose --cleanup
--blocksize=128K,256K,512K,1M
th-cnt rg-cnt rg-sz ch-sz blksz wr-data wr-bw   rd-data rd-bw
---------------------------------------------------------------------
4      750    8m    1m    128k  5g      90.06m  5g      93.37m
4      750    8m    1m    256k  5g      79.71m  5g      99.81m
4      750    8m    1m    512k  5g      42.20m  5g      93.14m
4      750    8m    1m    1m    5g      35.51m  5g      89.36m
8      750    8m    1m    128k  5g      85.49m  5g      90.81m
8      750    8m    1m    256k  5g      61.42m  5g      99.24m
8      750    8m    1m    512k  5g      49.09m  5g      108.78m
16     750    8m    1m    128k  5g      86.28m  5g      88.73m
16     750    8m    1m    256k  5g      64.34m  5g      93.47m
16     750    8m    1m    512k  5g      68.84m  5g      124.47m
16     750    8m    1m    1m    5g      53.97m  5g      97.20m
---------------------------------------------------------------------
```
Signed-off-by: Don Brady <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3795
Closes #2071
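The -S argument format described above can be validated with a small parser. The following is a minimal sketch, assuming only K and M suffixes are needed for the 128K to 16M range. parse_blksz() and parse_blksz_list() are hypothetical, not the zpios source:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	MIN_BLKSZ	(128ULL << 10)	/* 128K */
#define	MAX_BLKSZ	(16ULL << 20)	/* 16M */

/*
 * Parse one size such as "128K", "1M" or "131072".  Returns 0 if the
 * value is malformed, not a power of two, or outside 128K..16M.
 */
static uint64_t
parse_blksz(const char *s)
{
	char *end;
	unsigned shift = 0;
	uint64_t v = strtoull(s, &end, 10);

	if (end == s)
		return (0);
	if (*end == 'K' || *end == 'k') {
		shift = 10;
		end++;
	} else if (*end == 'M' || *end == 'm') {
		shift = 20;
		end++;
	}
	/* Reject trailing junk and values that would overflow the shift. */
	if (*end != '\0' || v > (MAX_BLKSZ >> shift))
		return (0);
	v <<= shift;
	if (v < MIN_BLKSZ || (v & (v - 1)) != 0)
		return (0);
	return (v);
}

/*
 * Parse a comma-delimited list into out[0..max-1].  Returns the number
 * of sizes parsed, or -1 if any entry is invalid or there are too many.
 */
static int
parse_blksz_list(const char *list, uint64_t *out, int max)
{
	char tok[32];
	int n = 0;

	while (*list != '\0') {
		const char *comma = strchr(list, ',');
		size_t len = (comma != NULL) ? (size_t)(comma - list) :
		    strlen(list);
		uint64_t v;

		if (len == 0 || len >= sizeof (tok) || n >= max)
			return (-1);
		memcpy(tok, list, len);
		tok[len] = '\0';
		if ((v = parse_blksz(tok)) == 0)
			return (-1);
		out[n++] = v;
		list += len;
		if (*list == ',')
			list++;
	}
	return (n);
}
```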
ZFS incorrectly uses directory-based extended attributes even when
xattr=sa is specified as a dataset property or mount option. Support to
honor temporary mount options including "xattr" was added in commit
0282c4137e7409e6d85289f4955adf07fac834f5. There are two issues with the
mount option handling:
* Libzfs has historically included "xattr" in its list of default mount
options. This overrides the dataset property, so the dataset is always
configured to use directory-based xattrs even when the xattr dataset
property is set to off or sa. Address this by removing "xattr" from
the set of default mount options in libzfs.
* There was no way to enable system attribute-based extended attributes
using temporary mount options. Add the mount options "saxattr" and
"dirxattr" which enable the xattr behavior their names suggest. This
approach has the advantages of mirroring the valid xattr dataset
property values and following existing conventions for mount option
names.
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3787
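The key point of the first fix is that an absent option must leave the dataset property in charge. The following is a minimal sketch of scanning a mount option string for the new saxattr and dirxattr options. xattr_mode_t and parse_xattr_option() are hypothetical, not the ZoL option parser, and letting the last matching option win is an assumption of this sketch:

```c
#include <string.h>

/* Hypothetical modes; XATTR_DEFAULT means "defer to the property". */
typedef enum { XATTR_DEFAULT, XATTR_DIR, XATTR_SA } xattr_mode_t;

/*
 * Scan a comma-separated mount option string.  Only options actually
 * present override the dataset property, which is why libzfs must not
 * inject an xattr option of its own by default.
 */
static xattr_mode_t
parse_xattr_option(const char *opts)
{
	xattr_mode_t mode = XATTR_DEFAULT;
	const char *p = opts;

	while (*p != '\0') {
		size_t len = strcspn(p, ",");

		if (len == 7 && strncmp(p, "saxattr", 7) == 0)
			mode = XATTR_SA;
		else if (len == 8 && strncmp(p, "dirxattr", 8) == 0)
			mode = XATTR_DIR;
		p += len;
		if (*p == ',')
			p++;
	}
	return (mode);
}
```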
6214 zpools going south
Reviewed by: Igor Kozhukhov <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
References:
https://www.illumos.org/issues/6214
http://cr.illumos.org/~webrev/sensille/6214_zpools_going_south/
Porting Notes:
Reintroduce b_compress to the l2arc_buf_hdr_t. In commit b9541d6
the compression flags were moved to the generic b_flags in the
arc_buf_hdr_t. This is a problem because l2arc_compress_buf()
may manipulate the compression flags and this can only be done
safely under the hash lock which is not held. See Illumos 6214
for a detailed analysis of the race.
HDR_GET_COMPRESS() macro was removed from arc_buf_info().
Ported-by: Brian Behlendorf <[email protected]>
Closes #3757
zfsonlinux/zfs@e20cd6f7a8922709b1aa2ecefd783390102d79e0 caused us to
lose IO accounting on zvols. When I originally wrote that last year, the
symbols we needed to maintain IO accounting were GPL exported, but
torvalds/linux@394ffa503bc40e32d7f54a9b817264e81ce131b4 provided
suitable symbols for restoring this functionality 4 months later. We
can call them to restore the IO accounting on Linux 3.19 and later as
well as any older kernels where that patch is backported.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3741
Internally ZFS keeps a small log to facilitate debugging. By default
the log is disabled; to enable it, set zfs_dbgmsg_enable=1. The contents
of the log can be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file.
Writing 0 to this proc file clears the log.
$ echo 1 >/sys/module/zfs/parameters/zfs_dbgmsg_enable
$ echo 0 >/proc/spl/kstat/zfs/dbgmsg
$ zpool import tank
$ cat /proc/spl/kstat/zfs/dbgmsg
1 0 0x01 -1 0 2492357525542 2525836565501
timestamp message
1441141408 spa=tank async request task=1
1441141408 txg 70 open pool version 5000; software version 5000/5; ...
1441141409 spa=tank async request task=32
1441141409 txg 72 import pool version 5000; software version 5000/5; ...
1441141414 command: lt-zpool import tank
Note the zfs_dbgmsg() and dprintf() functions are both now mapped to
the same log. As mentioned above, the kernel debug log can be accessed
through the /proc/spl/kstat/zfs/dbgmsg kstat. For user space consumers,
log messages are immediately written to stdout after applying the
ZFS_DEBUG environment variable.
$ ZFS_DEBUG=on ./cmd/ztest/ztest -V
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #3728
Performance improvements for zvols.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3720
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Internally, zvols are files exposed through the block device API. This
is intended to reduce overhead when things require block devices.
However, the ZoL zvol code emulates a traditional block device in that
it has a top half and a bottom half. This is an unnecessary source of
overhead that does not exist on any other OpenZFS platform.
This patch removes it. Early users of this patch reported double digit
performance gains in IOPS on zvols in the range of 50% to 80%.
Comments in the code suggest that the current implementation was done to
obtain IO merging from Linux's IO elevator. However, the DMU already
does write merging while arc_read() should implicitly merge read IOs
because only 1 thread is permitted to fetch the buffer into ARC. In
addition, commercial ZFSOnLinux distributions report that regular files
are more performant than zvols under the current implementation, and the
main consumers of zvols are VMs and iSCSI targets, which have their own
elevators to merge IOs.
Some minor refactoring allows us to register zfs_request() as our
->make_request() handler in place of the generic_make_request()
function. This eliminates the layer of code that broke IO requests on
zvols into a top half and a bottom half. This has several benefits:
1. No per zvol spinlocks.
2. No redundant IO elevator processing.
3. Interrupts are disabled only when actually necessary.
4. No redispatching of IOs when all taskq threads are busy.
5. Linux's page out routines will properly block.
6. Many autotools checks become obsolete.
An unfortunate consequence of eliminating the layer that
generic_make_request() provided is that we no longer call the instrumentation
hooks for block IO accounting. Those hooks are GPL-exported, so we
cannot call them ourselves and consequently, we lose the ability to do
IO monitoring via iostat. Since zvols are internally files mapped as
block devices, this should be okay. Anyone who is willing to accept the
performance penalty for the block IO layer's accounting could use the
loop device in between the zvol and its consumer. Alternatively, perf
and ftrace likely could be used. Also, tools like latencytop will still
work. Tools such as latencytop sometimes provide a better view of
performance bottlenecks than the traditional block IO accounting tools
do.
Lastly, if direct reclaim occurs during spacemap loading and swap is on
a zvol, this code will deadlock. That deadlock could already occur with
sync=always on zvols. Given that swap on zvols is not yet production
ready, this is not a blocker.
Signed-off-by: Richard Yao <[email protected]>
Pre-2.6.37 kernels support REQ_FUA in request flags, but not in BIO
flags. zvols are the only consumer of VDEV_REQ_FUA and since they are
passed requests, they should obey the REQ_FUA flag like later
kernels. This optimization will only matter on 2.6.36 and 2.6.37 because
the zvol rework changes things to use bio, where we are no longer able
to distinguish on earlier kernels.
Signed-off-by: Richard Yao <[email protected]>
Add the required kernel side infrastructure to parse arbitrary
mount options. This enables us to support temporary mount
options in largely the same way it is handled on other platforms.
See the 'Temporary Mount Point Properties' section of zfs(8)
for complete details.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #985
Closes #3351
Commit 49ddb315066e372f31bda29a5c546a9eccc8b418 added the
zfs_arc_average_blocksize parameter to allow control over the size of
the arc hash table. The dbuf hash table's size should be determined
similarly.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3721
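The sizing rule can be sketched as a doubling loop: grow the bucket count until there is about one bucket per average-sized block of physical memory. This sketch assumes the dbuf table follows the same shape as the ARC's loop over zfs_arc_average_blocksize. The 4096-bucket starting point and hash_table_buckets() are assumptions of this sketch:

```c
#include <stdint.h>

/*
 * Hypothetical sizing helper: start from 4096 buckets and double until
 * hsize * avg_blocksize covers all of physical memory.
 */
static uint64_t
hash_table_buckets(uint64_t physmem_bytes, uint64_t avg_blocksize)
{
	uint64_t hsize = 1ULL << 12;

	while (hsize * avg_blocksize < physmem_bytes)
		hsize <<= 1;
	return (hsize);
}
```

For 8 GiB of memory and an 8K average block size, this yields 2^20 buckets. Raising the average block size to 64K shrinks the table to 2^17.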
The LBA weighting makes sense on rotational media where the outer tracks
have twice the bandwidth of the inner tracks. However, it is detrimental
on nonrotational media such as solid state disks, where the only effect
is to push metaslabs into the best-fit allocation behavior sooner,
which hurts performance. It also makes no sense on files, where the
underlying filesystem can arrange things however it wants.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3712
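The LBA bias can be sketched as a weight multiplier that falls linearly from about 2x for the first metaslab to about 1x for the last. Non-rotational vdevs, and files treated the same way, skip the bias entirely. lba_weight() is hypothetical, and the exact formula is an assumption of this sketch:

```c
#include <stdint.h>

/*
 * Hypothetical LBA bias: metaslab 0 (outer tracks) gets 2x its
 * free-space weight, tapering toward 1x for the last metaslab.
 * Non-rotational vdevs keep the unbiased weight.
 */
static uint64_t
lba_weight(uint64_t space, uint64_t ms_id, uint64_t ms_count, int nonrot)
{
	if (nonrot || ms_count == 0)
		return (space);
	return (2 * space - (ms_id * space) / ms_count);
}
```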
This was only needed to support kernels earlier than 2.6.30. Support for
those kernels was dropped, so we can safely remove this check.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
This was only needed to support kernels earlier than 2.6.30. Support for
those kernels was dropped, so we can safely remove this check.
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Re-factor the .zfs/snapshot auto-mounting code to take into account
changes made to the upstream kernels, and to lay the groundwork for
enabling access to .zfs snapshots via NFS clients. This patch makes
the following core improvements.
* All actively auto-mounted snapshots are now tracked in two global
trees which are indexed by snapshot name and objset id respectively.
This allows for fast lookups of any auto-mounted snapshot without
needing access to the parent dataset.
* Snapshot entries are added to the tree in zfsctl_snapshot_mount().
However, they are now removed from the tree in the context of the
unmount process. This eliminates the need for complicated error logic
in zfsctl_snapshot_unmount() to handle unmount failures.
* References are now taken on the snapshot entries in the tree to
ensure they always remain valid while a task is outstanding.
* The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right
after the auto-mount succeeds. This allows the kernel to unmount
idle auto-mounted snapshots if needed, removing the need for the
zfsctl_unmount_snapshots() function.
* Snapshots in active use will not be automatically unmounted. As
long as at least one dentry is revalidated every zfs_expire_snapshot/2
seconds, the auto-unmount expiration timer will be extended.
* Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS
to be immediately unmounted when the dentry was revalidated. This
was a consequence of ZFS invalidating all snapdir dentries to ensure that
negative dentries didn't mask new snapshots. This patch modifies the
behavior such that only negative dentries are invalidated. This solves
the issue and may result in a performance improvement.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3589
Closes #3344
Closes #3295
Closes #3257
Closes #3243
Closes #3030
Closes #2841
Since ZoL allows large blocks to be used by volumes, unlike upstream
illumos, the feature flag must be checked prior to volume creation.
This is critical because, unlike filesystems, volumes will create an
object which uses large blocks as part of the create. Therefore, it
cannot be safely checked in zfs_check_settable() after the dataset
has been created.
In addition this patch updates the relevant error messages to use
zfs_nicenum() to print the maximum blocksize.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3591
Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2.
We use zfs_{kmap,kunmap}_atomic as wrappers that always take 2 arguments, but
ignore the 2nd for newer kernels.
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
See also #3546, commit c1718e9
Signed-off-by: Chris Dunlop <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3673
Signed-off-by: Frédéric VANNIÈRE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3546
| |
Under Linux, filesystem threads responsible for handling I/O are
normally created with the maximum priority. Non-I/O filesystem
processes run with the default priority. ZFS should adopt the
same priority scheme under Linux to maintain good performance
and so that it will compete fairly when other Linux filesystems
are active. The priorities have been updated to the following:
$ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
- TS 10743 19 -20 [spl_kmem_cache]
- TS 10744 19 -20 [spl_system_task]
- TS 10745 19 -20 [spl_dynamic_tas]
- TS 10764 19 0 [dbu_evict]
- TS 10765 19 0 [arc_prune]
- TS 10766 19 0 [arc_reclaim]
- TS 10767 19 0 [arc_user_evicts]
- TS 10768 19 0 [l2arc_feed]
- TS 10769 39 0 [z_unmount]
- TS 10770 39 -20 [zvol]
- TS 11011 39 -20 [z_null_iss]
- TS 11012 39 -20 [z_null_int]
- TS 11013 39 -20 [z_rd_iss]
- TS 11014 39 -20 [z_rd_int_0]
- TS 11022 38 -19 [z_wr_iss]
- TS 11023 39 -20 [z_wr_iss_h]
- TS 11024 39 -20 [z_wr_int_0]
- TS 11032 39 -20 [z_wr_int_h]
- TS 11033 39 -20 [z_fr_iss_0]
- TS 11041 39 -20 [z_fr_int]
- TS 11042 39 -20 [z_cl_iss]
- TS 11043 39 -20 [z_cl_int]
- TS 11044 39 -20 [z_ioctl_iss]
- TS 11045 39 -20 [z_ioctl_int]
- TS 11046 39 -20 [metaslab_group_]
- TS 11050 19 0 [z_iput]
- TS 11121 38 -19 [z_wr_iss]
Note that under Linux the meaning of a process's priority is inverted
with respect to illumos. High values on Linux indicate a _low_ priority,
while high values on illumos indicate a _high_ priority.
In order to preserve the logical meaning of the minclsyspri and
maxclsyspri macros when they are used by the illumos wrapper functions,
their values have been inverted. This way, when changes are merged
from upstream illumos, we won't need to remember to invert the macros,
which could otherwise lead to confusion.
This patch depends on https://github.com/zfsonlinux/spl/pull/466.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #3607
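The inversion above can be checked numerically with a small model. `MAX_RT_PRIO` (100), `MAX_PRIO` (140) and `DEFAULT_PRIO` (120, nice 0) match the Linux kernel's constants. The exact SPL definitions of `maxclsyspri`, `defclsyspri` and `minclsyspri` are an assumption here, chosen to match the nice values in the ps output. `prio_to_nice()` and `linux_prio_is_higher()` are hypothetical helpers.

```c
#include <assert.h>

/* Linux kernel static priorities: a LOWER number means HIGHER priority. */
#define	MAX_RT_PRIO	100
#define	MAX_PRIO	(MAX_RT_PRIO + 40)	/* 140 */
#define	DEFAULT_PRIO	(MAX_RT_PRIO + 20)	/* 120, i.e. nice 0 */

/*
 * Assumption: macros defined "inverted" so that maxclsyspri is the
 * numerically smallest (most urgent) value on Linux.
 */
#define	maxclsyspri	(MAX_RT_PRIO)		/* nice -20 */
#define	defclsyspri	(DEFAULT_PRIO)		/* nice 0 */
#define	minclsyspri	(MAX_PRIO - 1)		/* nice 19 */

/* Convert a kernel static priority to the nice value ps(1) reports. */
static int
prio_to_nice(int prio)
{
	return (prio - DEFAULT_PRIO);
}

/* On Linux a smaller prio value wins; on illumos a larger one does. */
static int
linux_prio_is_higher(int a, int b)
{
	return (a < b);
}
```

Under this model, illumos code that passes maxclsyspri to a taskq still requests the most urgent level (nice -20), even though Linux orders the numbers the other way.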
| |
5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
References:
https://www.illumos.org/issues/5817
https://github.com/illumos/illumos-gate/commit/2fd872a
Ported-by: Brian Behlendorf <[email protected]>
Issue #3533
| |
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Steven Hartland <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
https://www.illumos.org/issues/5376
https://github.com/illumos/illumos-gate/commit/2ec99e3
Porting Notes:
The good news is that many of the recent changes made upstream to the
ARC tackled issues previously observed by ZoL with similar solutions.
The bad news is that those solutions weren't identical to the ones we applied.
This patch is designed to split the difference and apply as much of the
upstream work as possible.
* The arc_available_memory() function was previously removed in ZoL, but
due to the upstream changes it makes sense to add it back. This function
has been customized for Linux so that it can be used to detect a low
memory condition. This provides the same basic functionality as the illumos
version, allowing us to minimize changes through the rest of the code base.
The exact mechanism used to detect a low memory state remains unchanged, so
this change isn't as significant as it might first appear.
* This patch includes the long-standing fix for arc_shrink() which was
originally proposed in #2167. Since there were related changes to this
function, it made sense to include that work.
* The arc_init() function has been refactored. As before, it sets sane
default values for the ARC but then calls arc_tuning_update() to apply
user-specific tuning made via module options. The arc_tuning_update()
function is then called periodically by the arc_reclaim_thread() to
apply changes to the tunings made during normal operation.
Ported-by: Brian Behlendorf <[email protected]>
Closes #3616
Closes #2167
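The arc_init()/arc_tuning_update() split described in the porting notes can be sketched as "set defaults, then overlay validated module options, and re-apply the overlay periodically". `zfs_arc_max`/`zfs_arc_min` mirror ZoL module option names and `arc_c_max`/`arc_c_min` mirror ARC variable names. However, the default fractions, the validation rules and `arc_reclaim_iteration()` are hypothetical simplifications, not the actual ZoL code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified ARC limits (the real ARC tracks far more state). */
static uint64_t arc_c_max;
static uint64_t arc_c_min;

/* Module options; 0 means "keep the default". */
static uint64_t zfs_arc_max = 0;
static uint64_t zfs_arc_min = 0;

/* Apply user tunings, ignoring values that would be inconsistent. */
static void
arc_tuning_update(void)
{
	if (zfs_arc_max != 0 && zfs_arc_max != arc_c_max &&
	    zfs_arc_max > arc_c_min)
		arc_c_max = zfs_arc_max;
	if (zfs_arc_min != 0 && zfs_arc_min != arc_c_min &&
	    zfs_arc_min < arc_c_max)
		arc_c_min = zfs_arc_min;
}

/* Hypothetical defaults, then overlay whatever the user configured. */
static void
arc_init(uint64_t physmem_bytes)
{
	assert(physmem_bytes > 0);
	arc_c_min = physmem_bytes / 32;
	arc_c_max = physmem_bytes / 2;
	arc_tuning_update();
}

/* One pass of the reclaim thread loop picks up runtime tuning changes. */
static void
arc_reclaim_iteration(void)
{
	arc_tuning_update();
}
```

Keeping the validation in one function means the same rules apply whether a tunable is set at module load or changed while the pool is running.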