author | Brian Behlendorf <[email protected]> | 2016-11-30 14:48:16 -0700 |
---|---|---|
committer | GitHub <[email protected]> | 2016-11-30 14:48:16 -0700 |
commit | 7657defc48b7c47a8bf0c8f21c78783d293dc5ed (patch) | |
tree | ec6ebdcc7289bc707076205314737cf04fc3bfc0 /module/zfs/dbuf.c | |
parent | ce43e88dd65509a4cf62c4acc76619e571d8518a (diff) | |
parent | 982957483450d53683681f456d1c84cfeb56afad (diff) |
Introduce ARC Buffer Data (ABD)

ZFS currently uses ARC buffers which are backed by virtual memory.
While functional, there are some major problems with this approach
which can be observed on all OpenZFS platforms. ABD was designed
to address these issues and includes contributions from OpenZFS
developers from multiple platforms.

While all OpenZFS platforms will benefit from ABD, this functionality
is critical for Linux. Unlike the other OpenZFS platforms the Linux
kernel discourages extensive use of virtual memory. The provided
interfaces are not optimized for frequent allocations from the virtual
address space. To maintain good performance a kmem cache is
used which contains relatively long lived slabs backed by virtual
memory. The downside to the approach is that those slabs can
become highly fragmented resulting in an inefficient use of memory.

Another issue is that on 32-bit systems the available virtual
address space in the kernel is only a small fraction of total
system memory. This means the ARC size is highly constrained
which hurts performance, makes allocating memory difficult,
and makes OOMs more likely.

ABD is designed to address these issues by using scatter lists
of pages for data buffers. This removes the need for slabs
which resolves the fragmentation issue. It also allows high
memory pages to be allocated which alleviates the virtual
address space pressure on 32-bit systems.
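The scatter-list idea described above can be illustrated with a small userspace sketch. This is not the ABD implementation: the `sketch_sg_*` names are hypothetical, `malloc`-family calls stand in for kernel page allocation, and `SKETCH_CHUNK_SIZE` is an assumed stand-in for the page size. The point is only that a buffer built from independently allocated fixed-size chunks never needs one large contiguous region, at the cost of walking chunks when copying by offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define	SKETCH_CHUNK_SIZE	4096	/* assumed stand-in for the page size */

/* Hypothetical scattered buffer: a list of separately allocated chunks. */
typedef struct sketch_sg {
	size_t	sg_size;	/* logical size in bytes */
	size_t	sg_nchunks;	/* number of chunks backing the buffer */
	char	**sg_chunks;	/* each chunk is its own allocation */
} sketch_sg_t;

static void
sketch_sg_free(sketch_sg_t *sg)
{
	if (sg == NULL)
		return;
	if (sg->sg_chunks != NULL) {
		for (size_t i = 0; i < sg->sg_nchunks; i++)
			free(sg->sg_chunks[i]);
		free(sg->sg_chunks);
	}
	free(sg);
}

static sketch_sg_t *
sketch_sg_alloc(size_t size)
{
	sketch_sg_t *sg = calloc(1, sizeof (*sg));

	if (sg == NULL)
		return (NULL);
	sg->sg_size = size;
	sg->sg_nchunks = (size + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
	/* +1 avoids a zero-sized request when size == 0 */
	sg->sg_chunks = calloc(sg->sg_nchunks + 1, sizeof (char *));
	if (sg->sg_chunks == NULL) {
		sketch_sg_free(sg);
		return (NULL);
	}
	for (size_t i = 0; i < sg->sg_nchunks; i++) {
		sg->sg_chunks[i] = calloc(1, SKETCH_CHUNK_SIZE);
		if (sg->sg_chunks[i] == NULL) {
			sketch_sg_free(sg);
			return (NULL);
		}
	}
	return (sg);
}

/* Walk the chunks covering [off, off + len) and copy in either direction. */
static void
sketch_sg_copy(sketch_sg_t *sg, size_t off, char *buf, size_t len, int to_sg)
{
	assert(off <= sg->sg_size && len <= sg->sg_size - off);
	while (len > 0) {
		size_t idx = off / SKETCH_CHUNK_SIZE;
		size_t coff = off % SKETCH_CHUNK_SIZE;
		size_t n = SKETCH_CHUNK_SIZE - coff;

		if (n > len)
			n = len;
		if (to_sg)
			memcpy(sg->sg_chunks[idx] + coff, buf, n);
		else
			memcpy(buf, sg->sg_chunks[idx] + coff, n);
		buf += n;
		off += n;
		len -= n;
	}
}

static void
sketch_sg_copy_from_buf(sketch_sg_t *sg, size_t off, const void *buf,
    size_t len)
{
	sketch_sg_copy(sg, off, (char *)buf, len, 1);
}

static void
sketch_sg_copy_to_buf(sketch_sg_t *sg, size_t off, void *buf, size_t len)
{
	sketch_sg_copy(sg, off, buf, len, 0);
}
```

Because no caller ever receives a single pointer to the whole buffer, every access goes through a copy or iteration helper. That trade-off is what the next paragraph of the commit message addresses for metadata.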
For metadata buffers, which are small, linear ABDs are allocated
from the slab. This is preferable because there are many places
in the code which expect to be able to read from a given offset
in the buffer. Using linear ABDs means none of that code needs
to be modified. The majority of these buffers are allocated with
kmalloc so there's minimal impact on the virtual address space.
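To make the "read from a given offset" point concrete, here is a minimal userspace sketch of a linear buffer. All `sketch_*` names and the `sketch_rec_t` record layout are hypothetical and exist only for illustration; `calloc` stands in for a slab or kmalloc allocation. With one contiguous allocation, code can take a pointer at any offset and treat it as a structure in place, which is the access pattern existing metadata code relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical linear buffer: one contiguous allocation. */
typedef struct sketch_linear {
	size_t	sl_size;
	void	*sl_buf;
} sketch_linear_t;

/* Hypothetical on-buffer metadata record, for illustration only. */
typedef struct sketch_rec {
	uint64_t	sr_id;
	uint64_t	sr_len;
} sketch_rec_t;

static sketch_linear_t *
sketch_linear_alloc(size_t size)
{
	sketch_linear_t *sl = malloc(sizeof (*sl));

	if (sl == NULL)
		return (NULL);
	sl->sl_size = size;
	sl->sl_buf = calloc(1, size ? size : 1);
	if (sl->sl_buf == NULL) {
		free(sl);
		return (NULL);
	}
	return (sl);
}

static void
sketch_linear_free(sketch_linear_t *sl)
{
	if (sl == NULL)
		return;
	free(sl->sl_buf);
	free(sl);
}

/*
 * Return a direct pointer into the buffer. This is only possible because
 * the memory is contiguous; a scattered buffer would need a copy instead.
 */
static void *
sketch_linear_at(sketch_linear_t *sl, size_t off, size_t len)
{
	assert(off <= sl->sl_size && len <= sl->sl_size - off);
	return ((char *)sl->sl_buf + off);
}
```

A caller could then write `sketch_rec_t *rec = sketch_linear_at(sl, 64, sizeof (*rec));` and update `rec->sr_id` in place, with no intermediate copy.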
Tested-by: Kash Pande <[email protected]>
Tested-by: kernelOfTruth <[email protected]>
Tested-by: RageLtMan <rageltman@sempervictus>
Tested-by: DHE <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Dan Kimmel <[email protected]>
Reviewed-by: David Quigley <[email protected]>
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Isaac Huang <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3441
Closes #5135
Diffstat (limited to 'module/zfs/dbuf.c')
-rw-r--r-- | module/zfs/dbuf.c | 7 |
1 files changed, 6 insertions, 1 deletions
diff --git a/module/zfs/dbuf.c b/module/zfs/dbuf.c
index 1d8c0518a..6e7a5a0fb 100644
--- a/module/zfs/dbuf.c
+++ b/module/zfs/dbuf.c
@@ -46,6 +46,7 @@
 #include <sys/range_tree.h>
 #include <sys/trace_dbuf.h>
 #include <sys/callb.h>
+#include <sys/abd.h>
 
 struct dbuf_hold_impl_data {
 	/* Function arguments */
@@ -3709,6 +3710,9 @@ dbuf_write_override_done(zio_t *zio)
 	mutex_exit(&db->db_mtx);
 
 	dbuf_write_done(zio, NULL, db);
+
+	if (zio->io_abd != NULL)
+		abd_put(zio->io_abd);
 }
 
 /* Issue I/O to commit a dirty buffer to disk. */
@@ -3801,7 +3805,8 @@ dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx)
 		 * The BP for this block has been provided by open context
 		 * (by dmu_sync() or dmu_buf_write_embedded()).
 		 */
-		void *contents = (data != NULL) ? data->b_data : NULL;
+		abd_t *contents = (data != NULL) ?
+		    abd_get_from_buf(data->b_data, arc_buf_size(data)) : NULL;
 
 		dr->dr_zio = zio_write(zio, os->os_spa, txg,
 		    &dr->dr_bp_copy, contents, db->db.db_size, db->db.db_size,
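The pairing in this diff, `abd_get_from_buf()` in `dbuf_write()` and `abd_put()` in `dbuf_write_override_done()`, suggests a borrow-and-release pattern: the ABD wraps a buffer it does not own, and releasing it frees only the wrapper. The userspace sketch below illustrates that pattern under this assumption. The `sketch_abd_*` names and fields are hypothetical and are not the real `sys/abd.h` definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper around a caller-owned buffer. */
typedef struct sketch_abd {
	void	*abd_buf;	/* borrowed, never freed by the wrapper */
	size_t	abd_size;
	int	abd_owner;	/* 0 when the buffer is borrowed */
} sketch_abd_t;

/* Wrap an existing buffer without copying or taking ownership. */
static sketch_abd_t *
sketch_abd_get_from_buf(void *buf, size_t size)
{
	sketch_abd_t *abd = malloc(sizeof (*abd));

	if (abd == NULL)
		return (NULL);
	abd->abd_buf = buf;
	abd->abd_size = size;
	abd->abd_owner = 0;
	return (abd);
}

/* Release the wrapper only; the underlying buffer stays valid. */
static void
sketch_abd_put(sketch_abd_t *abd)
{
	assert(abd != NULL && !abd->abd_owner);
	free(abd);
}
```

Under this reading, the `NULL` check before `abd_put()` in the done callback mirrors the `data != NULL` check at creation: a wrapper exists only when there was a buffer to wrap.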