summaryrefslogtreecommitdiffstats
path: root/module/zfs/abd.c
Commit message (Collapse)AuthorAgeFilesLines
* single-chunk scatter ABDs can be treated as linearMatthew Ahrens2019-06-111-41/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scatter ABD's are allocated from a number of pages. In contrast to linear ABD's, these pages are disjoint in the kernel's virtual address space, so they can't be accessed as a contiguous buffer. Therefore routines that need a linear buffer (e.g. abd_borrow_buf() and friends) must allocate a separate linear buffer (with zio_buf_alloc()), and copy the contents of the pages to/from the linear buffer. This can have a measurable performance overhead on some workloads. https://github.com/zfsonlinux/zfs/commit/87c25d567fb7969b44c7d8af63990e ("abd_alloc should use scatter for >1K allocations") increased the use of scatter ABD's, specifically switching 1.5K through 4K (inclusive) buffers from linear to scatter. For workloads that access blocks whose compressed sizes are in this range, that commit introduced an additional copy into the read code path. For example, the sequential_reads_arc_cached tests in the test suite were reduced by around 5% (this is doing reads of 8K-logical blocks, compressed to 3K, which are cached in the ARC). This commit treats single-chunk scattered buffers as linear buffers, because they are contiguous in the kernel's virtual address space. All single-page (4K) ABD's can be represented this way. Some multi-page ABD's can also be represented this way, if we were able to allocate a single "chunk" (higher-order "page" which represents a power-of-2 series of physically-contiguous pages). This is often the case for 2-page (8K) ABD's. Representing a single-entry scatter ABD as a linear ABD has the performance advantage of avoiding the copy (and allocation) in abd_borrow_buf_copy / abd_return_buf_copy. A performance increase of around 5% has been observed for ARC-cached reads (of small blocks which can take advantage of this), fixing the regression introduced by 87c25d567. Note that this optimization is only possible because all physical memory is always mapped into the kernel's address space. This is not the case for HIGHMEM pages, so the optimization can not be made on 32-bit systems. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8580
* abd_alloc should use scatter for >1K allocationsMatthew Ahrens2019-02-281-2/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | abd_alloc() normally does scatter allocations, thus solving the problem that ABD originally set out to: the bulk of ZFS's allocations are single pages, which are faster to allocate and free, and don't suffer from internal fragmentation (and the inability to reclaim memory because some buffers in the slab are still allocated). However, the current code does linear allocations for 4KB and smaller allocations, defeating the purpose of ABD. Scatter ABD's use at least one page each, so sub-page allocations waste some space when allocated as scatter (e.g. 2KB scatter allocation wastes half of each page). Using linear ABD's for small allocations means that they will be put on slabs which contain many allocations. This can improve memory efficiency, but it also makes it much harder for ARC evictions to actually free pages, because all the buffers on one slab need to be freed in order for the slab (and underlying pages) to be freed. Typically, 512B and 1KB kmem caches have 16 buffers per slab, so it's possible for them to actually waste more memory than scatter (one page per buf = wasting 3/4 or 7/8th; one buf per slab = wasting 15/16th). Spill blocks are typically 512B and are heavily used on systems running selinux with the default dnode size and the `xattr=sa` property set. By default we will use linear allocations for 512B and 1KB, and scatter allocations for larger (1.5KB and up). Reviewed-by: George Melikov <[email protected]> Reviewed-by: DHE <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Don Brady <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #8455
* Prefix all refcount functions with zfs_Tim Schumacher2018-10-011-11/+11
| | | | | | | | | | | | Recent changes in the Linux kernel made it necessary to prefix the refcount_add() function with zfs_ due to a name collision. To bring the other functions in line with that and to avoid future collisions, prefix the other refcount functions as well. Reviewed by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Schumacher <[email protected]> Closes #7963
* Update build system and packagingBrian Behlendorf2018-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_*. * Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> TEST_ZIMPORT_SKIP="yes" Closes #7556
* Update for cppcheck v1.80Brian Behlendorf2017-11-181-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve new warnings and errors from cppcheck v1.80. * [lib/libshare/libshare.c:543]: (warning) Possible null pointer dereference: protocol * [lib/libzfs/libzfs_dataset.c:2323]: (warning) Possible null pointer dereference: srctype * [lib/libzfs/libzfs_import.c:318]: (error) Uninitialized variable: link * [module/zfs/abd.c:353]: (error) Uninitialized variable: sg * [module/zfs/abd.c:353]: (error) Uninitialized variable: i * [module/zfs/abd.c:385]: (error) Uninitialized variable: sg * [module/zfs/abd.c:385]: (error) Uninitialized variable: i * [module/zfs/abd.c:553]: (error) Uninitialized variable: i * [module/zfs/abd.c:553]: (error) Uninitialized variable: sg * [module/zfs/abd.c:763]: (error) Uninitialized variable: i * [module/zfs/abd.c:763]: (error) Uninitialized variable: sg * [module/zfs/abd.c:305]: (error) Uninitialized variable: tmp_page * [module/zfs/zpl_xattr.c:342]: (warning) Possible null pointer dereference: value * [module/zfs/zvol.c:208]: (error) Uninitialized variable: p Convert the following suppression to inline. * [module/zfs/zfs_vnops.c:840]: (error) Possible null pointer dereference: aiov Exclude HAVE_UIO_ZEROCOPY and HAVE_DNLC from analysis since these macro's will never be defined until this functionality is implemented. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6879
* Undo c89 workarounds to match with upstreamDon Brady2017-11-041-9/+5
| | | | | | | | | With PR 5756 the zfs module now supports c99 and the remaining past c89 workarounds can be undone. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #6816
* Fix abdstats kstat on 32-bit systemsBrian Behlendorf2017-10-061-2/+2
| | | | | | | | When decrementing the struct_size and scatter_chunk_waste kstats the value needs to be cast to an int on 32-bit systems. Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #6721
* minor improvement to abd_free_pages()jxiong2017-05-021-8/+6
| | | | | | | | | | | It doesn't need to have a loop to free page in a single scatterlist entry because it should be single or compound page. The pages can be freed in one invocation to __free_pages() for both cases. Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Yao <[email protected]> Signed-off-by: Jinshan Xiong <[email protected]> Closes #6057
* Remove dependency on linear ABDGvozden Neskovic2017-03-291-1/+1
| | | | | | | | | | | | | | | Wherever possible it's best to avoid depending on a linear ABD. Update the code accordingly in the following areas. - vdev_raidz - zio, zio_checksum - zfs_fm - change abd_alloc_for_io() to use abd_alloc() Reviewed-by: David Quigley <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5668
* codebase style improvements for OpenZFS 6459 portGeorge Melikov2017-01-221-8/+9
|
* Use cstyle -cpP in `make cstyle` checkBrian Behlendorf2016-12-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | Enable picky cstyle checks and resolve the new warnings. The vast majority of the changes needed were to handle minor issues with whitespace formatting. This patch contains no functional changes. Non-whitespace changes are as follows: * 8 times ; to { } in for/while loop * fix missing ; in cmd/zed/agents/zfs_diagnosis.c * comment (confim -> confirm) * change endline , to ; in cmd/zpool/zpool_main.c * a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks * /* CSTYLED */ markers * change == 0 to ! * ulong to unsigned long in module/zfs/dsl_scan.c * rearrangement of module_param lines in module/zfs/metaslab.c * add { } block around statement after for_each_online_node Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: HÃ¥kan Johansson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5465
* Fix coverity defects: CID 154617luozhengzheng2016-12-081-1/+1
| | | | | | | | | | | | | | CID 154617: Memory - illegal accesses (UNINIT) The value here just needs to be initialized to make Coverity happy. When dsize == 0, then value of daiter.iter_mapaddr is irrelevant. That address won't be accessed, it's only used for some arithmetic. dsize can be zero either if dabd is null, or if code column is longer than the current data column. Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5437
* Fix incorrect operator in abd_alloc_sametype()luozhengzheng2016-12-011-1/+1
| | | | | | | | This should be & and not | so is_metadata is set correctly. Reviewed-by: Dan Kimmel <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5438
* ABD optimized page allocation codeChunwei Chen2016-11-291-131/+412
| | | | | | | | | | | | | | | | | | | | | | | | | | * Convert ABD to use the Linux Kernel scatterlist implementation instead of the hand rolled one from illumos. * Scatter ABDs are preferentially populated with higher order compound pages from a single zone. Allocation size is progressively decreased until it can be satisfied without performing reclaim or compaction. * An alternate page allocator is provided for kernels older than 3.6 and for CONFIG_HIGHMEM systems. This allocator is designed as a fallback for maximum compatibility. * Extended abdstats to provide visibility in the the allocator. * Add cached value for PAGESIZE in userspace. Contributions-by: Chunwei Chen <[email protected]> Gvozden Neskovic <[email protected]> Jinshan Xiong <[email protected]> Isaac Huang <[email protected]> David Quigley <[email protected]> Brian Behlendorf <[email protected]>
* ABD kmap to kmap_atomicChunwei Chen2016-11-291-35/+49
| | | | | Convert usage of kmap to kmap_atomic while correctly saving off irq state.
* ABD changes for vectorized RAIDZGvozden Neskovic2016-11-291-8/+190
| | | | | | | | | | * userspace: aligned buffers. Minimum of 32B alignment is needed for AVX2. Kernel buffers are aligned 512B or more. * add abd_get_offset_size() interface * abd_iter_map(): fix calculation of iter_mapsize * add abd_raidz_gen_iterate() and abd_raidz_rec_iterate() Signed-off-by: Gvozden Neskovic <[email protected]>
* ABD page support to vdev_disk.cIsaac Huang2016-11-291-1/+59
| | | | Signed-off-by: Isaac Huang <[email protected]>
* DLPX-44812 integrate EP-220 large memory scalabilityDavid Quigley2016-11-291-0/+1008