Commit message [Author, Age, Files, Lines]
* OpenZFS 6328 - Fix cstyle errors in zfs codebase [George Melikov, 2017-01-12, 16 files, -45/+40]
| | | | | | | | | | | | | | Authored by: Paul Dagnelie <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Alex Reece <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Jorgen Lundman <[email protected]> Approved by: Robert Mustacchi <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: George Melikov <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/6328 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/9a686fb Closes #5579
* Further work on Github usability (issue templates) [George Melikov, 2017-01-03, 1 file, -3/+12]
| | | | | | | | Make issue template more obvious about importance of searching the issue tracker first, and wrap logs appropriately. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Closes #5542
* Fix TypeError: unorderable types: str() > int() in arc_summary.py [Johnny Stenback, 2017-01-03, 1 file, -2/+2]
| Running arc_summary.py with an l2arc cache device present produces the following error: Traceback (most recent call last): File "/usr/bin/arc_summary.py", line 1148, in <module> main() File "/usr/bin/arc_summary.py", line 1144, in main page(Kstat) File "/usr/bin/arc_summary.py", line 724, in _l2arc_summary arc["l2_arc_evicts"]["reading"] > 0: TypeError: unorderable types: str() > int() This is due to arc["l2_arc_evicts"]['lock_retries'] and arc["l2_arc_evicts"]["reading"] both being strings, returned from fHits() earlier. Rather than adding them up and checking if the result is > 0, this checks if either string is != '0'. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Closes #5538
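The failure and the fix described in this commit can be reproduced in a few lines. This is a minimal sketch rather than the arc_summary.py source: the helper names `l2_evicts_seen` and `old_check` are hypothetical, and the sketch assumes (as the commit message implies) that a zero counter is formatted as the string '0'.

```python
def l2_evicts_seen(evicts):
    """Return True if either preformatted counter is non-zero.

    `evicts` stands in for arc["l2_arc_evicts"]. Both values are strings,
    so each is compared against '0' instead of being summed and tested > 0.
    """
    return evicts["lock_retries"] != '0' or evicts["reading"] != '0'


def old_check(evicts):
    """Model of the pre-fix check: str + str is a str, and on Python 3
    comparing a str with an int raises TypeError."""
    return evicts["lock_retries"] + evicts["reading"] > 0
```

Comparing each string separately avoids both the string concatenation and the mixed-type comparison.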
* OpenZFS 7259 - DS_FIELD_LARGE_BLOCKS is unused [George Melikov, 2017-01-03, 1 file, -7/+0]
| | | | | | | | | | | | | | | | | | Authored by: Matthew Ahrens <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Dan McDonald <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: George Melikov <[email protected]> The DS_FIELD_LARGE_BLOCKS macro has been unused since the integration of this patch: 241b541 Illumos 5959 - clean up per-dataset feature count code. This patch simply removes this macro from dsl_dataset.h. OpenZFS-issue: https://www.illumos.org/issues/7259 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/faa8036 Closes #5544
* Fix spelling [ka7, 2017-01-03, 132 files, -170/+170]
| Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Haakan T Johansson <[email protected]> Closes #5547 Closes #5543
* 4.10 compat - BIO flag changes and others [Tim Chase, 2016-12-30, 6 files, -19/+96]
| [bio] The req_op enum was changed to req_opf. Update the "Linux 4.8 API" autotools checks to use an int to determine whether the various REQ_OP values are defined. This should work properly on kernels >= 4.8. [bio] bio_set_op_attrs() is now an inline function and can't be detected with #ifdef. Add a configure check to determine whether bio_set_op_attrs() is defined. Move the local definition of it from vdev_disk.c to blkdev_compat.h for consistency with other related compatibility shims. [bio] The read/write flags and their modifiers, including WRITE_FLUSH, WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h. Add the new bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA and set the flags appropriately for each supported kernel version. [vfs] The generic_readlink() function has been made static. If .readlink in inode_operations is NULL, generic_readlink() is used. [zol typo] Completely unrelated to 4.10 compat, fix a typo in the check for REQ_OP_SECURE_ERASE so that the proper macro is defined: s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #5499
* Don't persist temporary pool name on devices [LOLi, 2016-12-22, 4 files, -2/+94]
| | | | | | | | | | | Fix a regression accidentally introduced by e0ab3ab. Additionally, add a new script zpool_import_014_pos.ksh to the ZFS test suite to exercise 'zpool import -t' functionality. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #5466 Closes #5515
* Fix coverity defects: CID 147587 [GeLiXin, 2016-12-21, 1 file, -0/+7]
| | | | | | | | | | | | CID 147587: Out-of-bounds read Future changes may cause an array overrun of 4096 bytes at byte offset 4096 by dereferencing pointer dstp. Adding this additional check ensures correctness. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: GeLiXin <[email protected]> Closes #5297
* Remove extra + from zfs man page [bunder2015, 2016-12-21, 1 file, -1/+1]
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #5508
* Use a dedicated taskq for vdev_file [Chunwei Chen, 2016-12-21, 3 files, -2/+24]
| | | | | | | | | | | | | | | | | The introduction of parallel zvol prefetch causes deadlock when using vdev_file. spa_async->(spa_namespace_lock)->txg_wait_synced->(wait for txg_sync) txg_sync->zio_wait->(wait for vdev_file_io_fsync on system_taskq) zvol_prefetch_minors_impl (on system_taskq)->spa_open_common->(wait for spa_namespace_lock) We fix this by using dedicated taskq for vdev_file. This same change was originally made in commit bc25c93 but reverted in commit aa9af22 when dynamic taskqs were added. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5506 Closes #5495
* Fix dsl_props_set_sync_impl to work with nested nvlist [LOLi, 2016-12-20, 4 files, -7/+134]
| When iterating over the input nvlist in dsl_props_set_sync_impl() we don't preserve the nvpair name before looking up ZPROP_VALUE, so when we later go to process it nvpair_name() is always "value" and not the actual property name. This fixes a couple of bugs in zfs_ioc_recv(): * Received properties were not restored correctly when failing to receive an incremental send stream * Received properties were not completely replaced by the new ones when successfully receiving an incremental send stream Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #5497
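The naming bug described in this commit can be modeled with nested dictionaries standing in for nvlists. This is only a sketch of the idea, not the C code from the patch: `collect_props` is a hypothetical helper, and a nested entry is assumed to carry its value under the "value" key that the commit message mentions.

```python
ZPROP_VALUE = "value"  # key under which a nested entry stores the property value


def collect_props(props):
    """Flatten {name: v} or {name: {"value": v, ...}} into {name: v}."""
    result = {}
    for name, entry in props.items():
        # Keep the outer property name before descending into the nested
        # entry; once inside, the only name left to read would be "value".
        if isinstance(entry, dict):
            entry = entry[ZPROP_VALUE]
        result[name] = entry
    return result
```

For example, `collect_props({"compression": {"value": "lz4", "source": "received"}})` keeps the key `compression` rather than replacing it with `value`.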
* Fix file attributes [Brian Behlendorf, 2016-12-19, 13 files, -18/+309]
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This branch contains the following fixes/improvements. * Fix setting i_flags * Fix wrong operator in xvattr.h * Fix fchange macro in zpl_ioctl_setflags() * Added configure check to use inode_set_flags() * Added a test case for chattr for better test coverage Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5486 Closes #5470 Closes #5469
| * Add test for chattr [Chunwei Chen, 2016-12-16, 8 files, -0/+248]
| | | | | | | | Signed-off-by: Chunwei Chen <[email protected]>
| * Use inode_set_flags when available [Chunwei Chen, 2016-12-16, 3 files, -0/+28]
| | | | | | | | Signed-off-by: Chunwei Chen <[email protected]>
| * Fix fchange in zpl_ioctl_setflags [Chunwei Chen, 2016-12-16, 1 file, -2/+1]
| | | | | | | | | | | | | | | | The fchange in zpl_ioctl_setflags was for detecting flag change. However it was incorrect and would always fail to detect a flag change from set to unset, causing users without CAP_LINUX_IMMUTABLE to be able to unset flags. Signed-off-by: Chunwei Chen <[email protected]>
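A change check that works in both directions can be sketched with an XOR of the old and new flag words. This is not the macro from the patch: the flag values and the helper names `protected_flags_changed` and `may_set_flags` are illustrative assumptions.

```python
# Illustrative flag bits standing in for the immutable and append-only flags.
IMMUTABLE_FL = 0x10
APPEND_FL = 0x20
PROTECTED = IMMUTABLE_FL | APPEND_FL


def protected_flags_changed(old_flags, new_flags):
    """True if a protected bit differs, whether it was set or cleared."""
    # XOR leaves a 1 wherever the two words differ, in either direction.
    return bool((old_flags ^ new_flags) & PROTECTED)


def may_set_flags(old_flags, new_flags, has_capability):
    """Callers without the capability may not change protected bits."""
    return has_capability or not protected_flags_changed(old_flags, new_flags)
```

With this check, clearing the immutable bit counts as a change, so `may_set_flags(IMMUTABLE_FL, 0, False)` is refused.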
| * Fix wrong operator in xvattr.h [Chunwei Chen, 2016-12-14, 1 file, -5/+5]
| | | | | | | | Signed-off-by: Chunwei Chen <[email protected]>
| * Fix i_flags issue caused by 64c688d [Chunwei Chen, 2016-12-14, 1 file, -11/+27]
| | Fix zfs_xvattr_set to set S_IMMUTABLE and S_APPEND flags correctly. Reinstate zfs_set_inode_flags and use it in zfs_xvattr_set and also when setting up the inode in zfs_znode_alloc and zfs_rezget. Signed-off-by: Chunwei Chen <[email protected]>
* | Fix coverity defects: CID 155008 [cao, 2016-12-19, 1 file, -2/+5]
| | | | | | | | | | | | | | | | | | | | CID 155008: Resource leaks (RESOURCE_LEAK) Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Gvozden Neskovic <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5500
* | Fix zmo leak when zfs_sb_create fails [Chunwei Chen, 2016-12-19, 1 file, -10/+9]
| | zfs_sb_create normally takes ownership of zmo, which is then freed in zfs_sb_free. However, when zfs_sb_create fails we need to free it explicitly. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5490 Closes #5496
* | Don't run 'zpool iostat -c CMD' command on all vdevs, if vdevs specified [Tony Hutter, 2016-12-16, 5 files, -19/+81]
| | | | | | | | | | | | | | | | | | | | | | | | | | zpool iostat allows you to specify only certain vdevs to display. Currently, if you run 'zpool iostat -c CMD vdev1 vdev2 ...' on specific vdevs, it will actually run the command on *all* vdevs, and just display the results for the vdevs you specify. This patch corrects the behavior to only run the command on the specified vdevs, and also enables the zpool_iostat_005_pos.ksh tests. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #5443
* | Fix coverity defects: CID 147534 [cao, 2016-12-16, 1 file, -2/+2]
| | | | | | | | | | | | | | | | CID 147534: Negative array index read Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5467
* | ABD: Adapt avx512bw raidz assembly [Gvozden Neskovic, 2016-12-15, 2 files, -113/+79]
|/ | | | | | | | | Adapt avx512bw implementation for use with abd buffers. Mul2 implementation is rewritten to take advantage of the BW instruction set. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Romain Dolbeau <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5477
* Add ida_destroy in zvol_fini to fix memleak [Chunwei Chen, 2016-12-14, 1 file, -0/+2]
| Users of ida need to call ida_destroy after using it. Otherwise ida->free_bitmap and/or other stuff may leak. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5484
* Skip xfstests on Ubuntu 16.04 and CentOS 7 [Brian Behlendorf, 2016-12-14, 1 file, -1/+12]
| The ZFS-enabled version of xfstests fails to build cleanly on Ubuntu 16.04 and CentOS 7. This issue should be resolved by rebasing the ZFS patches against the latest xfstests and pushing those patches upstream. This would allow us to use an unmodified xfstests. Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5481 Closes #5482
* Skip slow tests when kmemleak is enabled [Brian Behlendorf, 2016-12-14, 6 files, -1/+38]
| | | | | | | | | | | | | | | | When running the ZFS Test Suite with a kmemleak enabled kernel the following test cases run far slower than usual and may hit their timeout threshold. Skip the following test cases. Test: cli_root/zfs_get/zfs_get_009_pos (run as root) [55:43] Test: cli_root/zpool_clear/zpool_clear_001_pos (run as root) [11:32] Test: cli_root/zpool_create/zpool_create_024_pos (run as root) [11:01] Test: features/async_destroy/async_destroy_001_pos (run as root) [41:15] Test: inheritance/inherit_001_pos (run as root) [09:08] Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #5479 Closes #5480
* Fix typos in dbuf.c [bunder2015, 2016-12-13, 1 file, -5/+5]
| | | | | | | | This removes two large whitespaces in "modinfo zfs" as well as correcting a couple typos. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #5475
* Use cstyle -cpP in `make cstyle` check [Brian Behlendorf, 2016-12-12, 96 files, -478/+501]
| | | | | | | | | | | | | | | | | | | | | | | Enable picky cstyle checks and resolve the new warnings. The vast majority of the changes needed were to handle minor issues with whitespace formatting. This patch contains no functional changes. Non-whitespace changes are as follows: * 8 times ; to { } in for/while loop * fix missing ; in cmd/zed/agents/zfs_diagnosis.c * comment (confim -> confirm) * change endline , to ; in cmd/zpool/zpool_main.c * a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks * /* CSTYLED */ markers * change == 0 to ! * ulong to unsigned long in module/zfs/dsl_scan.c * rearrangement of module_param lines in module/zfs/metaslab.c * add { } block around statement after for_each_online_node Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Håkan Johansson <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5465
* Add CONTRIBUTING information and templates [George Melikov, 2016-12-09, 4 files, -4/+234]
| Guidelines for developers and users describing how they can participate in the project. Reviewed-by: Manuel Mendez <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #672 Closes #4776 Closes #5361
* Fix coverity defects: CID 147475 [liaoyuxiangqin, 2016-12-09, 1 file, -7/+8]
| | | | | | | | CID 147475: Logically dead code (DEADCODE) Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: yuxiang <[email protected]> Closes #5421
* Don't count '@' for dataset namelen if not a snapshot [Chunwei Chen, 2016-12-09, 4 files, -3/+67]
| | | | | | | | | | | | | | Don't count '@' for dataset namelen if not a snapshot. This fixes making a pool unimportable when the dataset namelen is 255. Add test file for zfs create name length 255. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5432 Closes #5456
* Fix coverity defects: CID 154617 [luozhengzheng, 2016-12-08, 1 file, -1/+1]
| CID 154617: Memory - illegal accesses (UNINIT) The value here just needs to be initialized to make Coverity happy. When dsize == 0, the value of daiter.iter_mapaddr is irrelevant. That address won't be accessed; it's only used for some arithmetic. dsize can be zero either if dabd is null, or if the code column is longer than the current data column. Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5437
* Speed up zvol import and export speed [Brian Behlendorf, 2016-12-08, 8 files, -86/+201]
|\ | | | | | | | | | | | | | | | | | | | | | | | | Speed up import and export speed by: * Add system delay taskq * Parallel prefetch zvol dnodes during zvol_create_minors * Parallel zvol_free during zvol_remove_minors * Reduce list linear search using ida and hash Reviewed-by: Boris Protopopov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #5433
| * zvol_remove_minors do parallel zvol_free [Chunwei Chen, 2016-12-02, 1 file, -6/+29]
| | On some kernel versions, blk_cleanup_queue and put_disk will wait for more than 10ms. So a pool with a lot of zvols will easily wait for more than 1 min if we do zvol_free sequentially. Signed-off-by: Chunwei Chen <[email protected]> Requires-spl: refs/pull/588/head
| * zpool_create_minors parallel prefetch [Chunwei Chen, 2016-12-02, 1 file, -10/+89]
| | Prefetch all zvol dnodes in parallel before actually creating each individual minor. This will greatly reduce the import time when there are a lot of zvols and the disk is slow. Signed-off-by: Chunwei Chen <[email protected]>
| * zvol: reduce linear list search [Chunwei Chen, 2016-12-01, 1 file, -49/+65]
| | | | | | | | | | | | | | Use kernel ida to generate minor number, and use hash table to find zvol with name. Signed-off-by: Chunwei Chen <[email protected]>
| * Use system_delay_taskq for long delay tasks [Chunwei Chen, 2016-12-01, 7 files, -21/+18]
| | Use it for spa_deadman, zpl_posix_acl_free, snapentry_expire. This frees system_taskq from the above long delay tasks, and allows us to do taskq_wait_outstanding on system_taskq without being blocked forever, making system_taskq more generic and useful. Signed-off-by: Chunwei Chen <[email protected]>
* | Revert "Disable zio_dva_throttle_enabled by default" [Brian Behlendorf, 2016-12-08, 2 files, -2/+2]
| | Enable zio_dva_throttle_enabled=1 by default. Subsequent testing has been unable to reproduce the suspected regression. Tested-by: kernelOfTruth <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Reverts #5335 Closes #5289 Closes #5457
* | Cache ddt_get_dedup_dspace() value if there was no ddt changes [Gvozden Neskovic, 2016-12-02, 3 files, -2/+15]
| | | | | | | | | | | | | | | | Save and reuse ddt dspace calculation when there have been no ddt changes. This avoids unnecessary traversal of 168KiB of ddt histograms. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5425
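The save-and-reuse approach described in this commit amounts to a cached value with an invalidation flag. The sketch below shows that general pattern only; the class `DedupSpaceCache` and its method names are hypothetical and do not come from the patch.

```python
class DedupSpaceCache:
    """Cache an expensive computation until the underlying data changes."""

    def __init__(self, compute):
        self._compute = compute  # expensive function, e.g. a histogram walk
        self._cached = None
        self._valid = False

    def mark_changed(self):
        """Call whenever the underlying table is modified."""
        self._valid = False

    def get(self):
        """Return the cached value, recomputing only after a change."""
        if not self._valid:
            self._cached = self._compute()
            self._valid = True
        return self._cached
```

Repeated calls to `get()` with no intervening `mark_changed()` invoke the expensive function only once.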
* | Refactor txg history kstat [Brian Behlendorf, 2016-12-02, 3 files, -34/+65]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It was observed that even when the txg history is disabled by setting `zfs_txg_history=0` the txg_sync thread still fetches the vdev stats unnecessarily. This patch refactors the code such that vdev_get_stats() is no longer called when `zfs_txg_history=0`. And it further reduces the differences between upstream and the ZoL txg_sync_thread() function. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5412
* | Enable mountpoint_003_pos [ChaoyuZhang, 2016-12-02, 2 files, -21/+33]
| | | | | | | | | | | | | | | | Update the test case to correctly interpret how Linux reports the mount options. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: ChaoyuZhang <[email protected]> Closes #5410
* | Skip zpool_scrub_004_pos on 32-bit systems [Brian Behlendorf, 2016-12-02, 1 file, -0/+5]
| | | | | | | | | | | | | | | | | | The zpool_scrub_004_pos test case currently fails when testing on a 32-bit system. Conditionally skip this test case on 32-bit systems until the root cause is identified and resolved. Signed-off-by: Brian Behlendorf <[email protected]> Issue #5444 Closes #5445
* | OpenZFS 7143 - dbuf_read() creates unnecessary zio_root() for bonus buf [Brian Behlendorf, 2016-12-01, 1 file, -3/+4]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dbuf_read() creates a zio_root() to track and wait for all the zio's that may happen as part of this call. However, if the blkptr_t for this buffer is NULL or a hole, we will not create any more zio's, so this zio_root() is unnecessary. This is always the case when calling dbuf_read() on a bonus buffer, because it has no blkptr (it's part of the containing dnode). For workloads that read a lot of bonus buffers (e.g. file creation and removal), creating and destroying these unnecessary zio's can decrease performance by around 3%. The fix is to only create/destroy the zio_root() in dbuf_read() if the blkptr is not NULL and not a hole. Changes sponsored by Intel Corp. Authored by: Matthew Ahrens <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Ported-by: Brian Behlendorf <[email protected]> Issue openzfs/openzfs#137 Closes #4803 Closes #5382
* | Fix incorrect operator in abd_alloc_sametype() [luozhengzheng, 2016-12-01, 1 file, -1/+1]
| | | | | | | | | | | | | | | | This should be & and not | so is_metadata is set correctly. Reviewed-by: Dan Kimmel <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5438
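The difference between the two operators is easy to see on a small example. This is a sketch under assumptions: the constant name `ABD_FLAG_META` and its bit position are chosen for illustration, and both helper functions are hypothetical.

```python
ABD_FLAG_META = 1 << 2  # illustrative bit position for a metadata flag


def is_metadata(abd_flags):
    """Correct test: AND isolates the flag bit."""
    return (abd_flags & ABD_FLAG_META) != 0


def is_metadata_buggy(abd_flags):
    """Buggy test: OR always yields a non-zero result."""
    return (abd_flags | ABD_FLAG_META) != 0
```

With `|`, every buffer is reported as metadata, including one whose flags are 0.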
* | Remove unused sa_update_from_cb() [cao, 2016-12-01, 2 files, -23/+0]
| | | | | | | | | | | | | | | | | | | | It looks like this was functionality which was added in the original SA implementation and then never needed. It can be safely removed now and easily added back if we find a use for it. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5440
* | Compile zio.h and zio_impl.h mutual include [cao, 2016-12-01, 2 files, -4/+1]
| | zio.h includes zio_impl.h but zio_impl.h also includes zio.h, so the header files include each other. Get rid of the zio_impl.h include in zio.h and update zio_inject.c to include zio.h instead of zio_impl.h. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5439
* | Do not force VDEV_NAME_TYPE_ID in max_width() [Håkan Johansson, 2016-11-30, 1 file, -3/+3]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not force VDEV_NAME_TYPE_ID in max_width(), instead add it in the relevant calls to max_width(). The first location of max_width() where VDEV_NAME_TYPE_ID is now added in show_import() is followed by print_import_config() and print_logs(). Both these print children vdev names that have been retrieved using an explicit VDEV_NAME_TYPE_ID added. The second location is in status_callback(). This is followed by print_status_config(), print_logs(), print_l2cache(), and print_spares(). For l2cache and spares it should not matter as there are no mirror-X or raidz-X involved. print_status_config() as above retrieves the name using explicit VDEV_NAME_TYPE_ID before calling itself to print children. The call of max_width() in get_namewidth() is not changed, as this is used by zpool_do_iostat(), followed by print_iostat(), which does not add VDEV_NAME_TYPE_ID. Overall, we should consider adding VDEV_NAME_TYPE_ID to the relevant name_flags / cb_name_flags fields, and remove the explicit adding in called routines. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Haakan T Johansson <[email protected]> Closes #5401
* | Convert zio_buf_alloc() consumers [Brian Behlendorf, 2016-11-30, 5 files, -18/+17]
| | In multiple cases zio_buf_alloc() was used instead of kmem_alloc() or vmem_alloc(). This was often done because the allocations could be large and it was easy to use zio_buf_alloc() for them. But this isn't ideal for allocations which are small or short lived. In these cases it is better to use kmem_alloc() or vmem_alloc(). If possible we want to avoid the case where we have slabs allocated for kmem caches which are rarely used. Note for small allocations vmem_alloc() will be internally converted to kmem_alloc(). Therefore as long as large allocations are infrequent and short lived the penalty for using vmem_alloc() is small. Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5409
* | Introduce ARC Buffer Data (ABD) [Brian Behlendorf, 2016-11-30, 59 files, -2245/+5011]
|\ \ ZFS currently uses ARC buffers which are backed by virtual memory. While functional, there are some major problems with this approach which can be observed on all OpenZFS platforms. ABD was designed to address these issues and includes contributions from OpenZFS developers from multiple platforms. While all OpenZFS platforms will benefit from ABD this functionality is critical for Linux. Unlike the other OpenZFS platforms the Linux kernel discourages extensive use of virtual memory. The provided interfaces are not optimized for frequent allocations from the virtual address space. To maintain good performance a kmem cache is used which contains relatively long lived slabs backed by virtual memory. The downside to the approach is that those slabs can become highly fragmented resulting in an inefficient use of memory. Another issue is that on 32-bit systems the available virtual address space in the kernel is only a small fraction of total system memory. This means the ARC size is highly constrained which hurts performance and makes allocating memory difficult and OOMs more likely. ABD is designed to address these issues by using scatter lists of pages for data buffers. This removes the need for slabs which resolves the fragmentation issue. It also allows high memory pages to be allocated which alleviates the virtual address space pressure on 32-bit systems. For metadata buffers, which are small, linear ABDs are allocated from the slab. This is preferable because there are many places in the code which expect to be able to read from a given offset in the buffer. Using linear ABDs means none of that code needs to be modified.
The majority of these buffers are allocated with kmalloc so there's minimal impact on the virtual address space. Tested-by: Kash Pande <[email protected]> Tested-by: kernelOfTruth <[email protected]> Tested-by: RageLtMan <rageltman@sempervictus> Tested-by: DHE <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Dan Kimmel <[email protected]> Reviewed-by: David Quigley <[email protected]> Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed-by: Isaac Huang <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3441 Closes #5135
| * | ABD optimized page allocation code [Chunwei Chen, 2016-11-29, 5 files, -136/+457]
| | | * Convert ABD to use the Linux Kernel scatterlist implementation instead of the hand rolled one from illumos. * Scatter ABDs are preferentially populated with higher order compound pages from a single zone. Allocation size is progressively decreased until it can be satisfied without performing reclaim or compaction. * An alternate page allocator is provided for kernels older than 3.6 and for CONFIG_HIGHMEM systems. This allocator is designed as a fallback for maximum compatibility. * Extended abdstats to provide visibility into the allocator. * Add cached value for PAGESIZE in userspace. Contributions-by: Chunwei Chen <[email protected]> Gvozden Neskovic <[email protected]> Jinshan Xiong <[email protected]> Isaac Huang <[email protected]> David Quigley <[email protected]> Brian Behlendorf <[email protected]>
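The "progressively decreased" allocation strategy from the second bullet can be illustrated with a small planning function. This is a simplified model, not the kernel code: `plan_chunks` and the `can_alloc` callback are hypothetical, an order-n chunk is taken to mean 2**n pages, and a single page (order 0) is assumed to always be available.

```python
def plan_chunks(total_pages, max_order, can_alloc):
    """Return the list of chunk orders used to cover total_pages.

    can_alloc(order) models whether a chunk of 2**order pages could be
    obtained cheaply (without reclaim or compaction).
    """
    chunks = []
    remaining = total_pages
    order = max_order
    while remaining > 0:
        # Never request more pages than are still needed.
        while order > 0 and (1 << order) > remaining:
            order -= 1
        # Fall back to smaller chunks when the larger size is refused.
        while order > 0 and not can_alloc(order):
            order -= 1
        chunks.append(order)
        remaining -= 1 << order
    return chunks
```

For 10 pages with every order available, the plan is one order-3 chunk (8 pages) followed by one order-1 chunk (2 pages). In this model the order only ever decreases, so a refused size is not retried later.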
| * | ABD kmap to kmap_atomic [Chunwei Chen, 2016-11-29, 1 file, -35/+49]
| | | | | | | | | | | | | | | Convert usage of kmap to kmap_atomic while correctly saving off irq state.