diff options
author | Brian Behlendorf <[email protected]> | 2019-03-29 09:13:20 -0700 |
---|---|---|
committer | GitHub <[email protected]> | 2019-03-29 09:13:20 -0700 |
commit | 1b939560be5c51deecf875af9dada9d094633bf7 (patch) | |
tree | 2a780b838134636ddbc65f89d227e37c74abe17b /config | |
parent | f94b3cbf43d62f4962e71cfe7ba8c6f0602e2a45 (diff) |
Add TRIM support
UNMAP/TRIM support is a frequently-requested feature to help
prevent performance from degrading on SSDs and on various other
SAN-like storage back-ends. By issuing UNMAP/TRIM commands for
sectors which are no longer allocated the underlying device can
often more efficiently manage itself.
This TRIM implementation is modeled on the `zpool initialize`
feature which writes a pattern to all unallocated space in the
pool. The new `zpool trim` command uses the same vdev_xlate()
code to calculate what sectors are unallocated, the same per-
vdev TRIM thread model and locking, and the same basic CLI for
a consistent user experience. The core difference is that
instead of writing a pattern it will issue UNMAP/TRIM commands
for those extents.
The zio pipeline was updated to accommodate this by adding a new
ZIO_TYPE_TRIM type and associated spa taskq. This new type makes
is straight forward to add the platform specific TRIM/UNMAP calls
to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are
handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs.
This makes it possible to largely avoid changing the pipieline,
one exception is that TRIM zio's may exceed the 16M block size
limit since they contain no data.
In addition to the manual `zpool trim` command, a background
automatic TRIM was added and is controlled by the 'autotrim'
property. It relies on the exact same infrastructure as the
manual TRIM. However, instead of relying on the extents in a
metaslab's ms_allocatable range tree, a ms_trim tree is kept
per metaslab. When 'autotrim=on', ranges added back to the
ms_allocatable tree are also added to the ms_free tree. The
ms_free tree is then periodically consumed by an autotrim
thread which systematically walks a top level vdev's metaslabs.
Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`. This
may occur when the freed blocks are small and not enough time
was allowed to aggregate them. An automatic TRIM and a manual
`zpool trim` may be run concurrently, in which case the automatic
TRIM will yield to the manual TRIM.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Contributions-by: Saso Kiselkov <[email protected]>
Contributions-by: Tim Chase <[email protected]>
Contributions-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8419
Closes #598
Diffstat (limited to 'config')
-rw-r--r-- | config/kernel-blk-queue-discard.m4 | 65 | ||||
-rw-r--r-- | config/kernel.m4 | 2 |
2 files changed, 67 insertions, 0 deletions
diff --git a/config/kernel-blk-queue-discard.m4 b/config/kernel-blk-queue-discard.m4 new file mode 100644 index 000000000..addbba814 --- /dev/null +++ b/config/kernel-blk-queue-discard.m4 @@ -0,0 +1,65 @@ +dnl # +dnl # 2.6.32 - 4.x API, +dnl # blk_queue_discard() +dnl # +AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_DISCARD], [ + AC_MSG_CHECKING([whether blk_queue_discard() is available]) + ZFS_LINUX_TRY_COMPILE([ + #include <linux/blkdev.h> + ],[ + struct request_queue *q __attribute__ ((unused)) = NULL; + int value __attribute__ ((unused)); + + value = blk_queue_discard(q); + ],[ + AC_MSG_RESULT(yes) + AC_DEFINE(HAVE_BLK_QUEUE_DISCARD, 1, + [blk_queue_discard() is available]) + ],[ + AC_MSG_RESULT(no) + ]) +]) + +dnl # +dnl # 4.8 - 4.x API, +dnl # blk_queue_secure_erase() +dnl # +dnl # 2.6.36 - 4.7 API, +dnl # blk_queue_secdiscard() +dnl # +dnl # 2.6.x - 2.6.35 API, +dnl # Unsupported by kernel +dnl # +AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_SECURE_ERASE], [ + AC_MSG_CHECKING([whether blk_queue_secure_erase() is available]) + ZFS_LINUX_TRY_COMPILE([ + #include <linux/blkdev.h> + ],[ + struct request_queue *q __attribute__ ((unused)) = NULL; + int value __attribute__ ((unused)); + + value = blk_queue_secure_erase(q); + ],[ + AC_MSG_RESULT(yes) + AC_DEFINE(HAVE_BLK_QUEUE_SECURE_ERASE, 1, + [blk_queue_secure_erase() is available]) + ],[ + AC_MSG_RESULT(no) + + AC_MSG_CHECKING([whether blk_queue_secdiscard() is available]) + ZFS_LINUX_TRY_COMPILE([ + #include <linux/blkdev.h> + ],[ + struct request_queue *q __attribute__ ((unused)) = NULL; + int value __attribute__ ((unused)); + + value = blk_queue_secdiscard(q); + ],[ + AC_MSG_RESULT(yes) + AC_DEFINE(HAVE_BLK_QUEUE_SECDISCARD, 1, + [blk_queue_secdiscard() is available]) + ],[ + AC_MSG_RESULT(no) + ]) + ]) +]) diff --git a/config/kernel.m4 b/config/kernel.m4 index e4d0e3393..f7d657e0c 100644 --- a/config/kernel.m4 +++ b/config/kernel.m4 @@ -164,6 +164,8 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [ ZFS_AC_KERNEL_IN_COMPAT_SYSCALL ZFS_AC_KERNEL_KTIME_GET_COARSE_REAL_TS64 ZFS_AC_KERNEL_TOTALRAM_PAGES_FUNC + ZFS_AC_KERNEL_BLK_QUEUE_DISCARD + ZFS_AC_KERNEL_BLK_QUEUE_SECURE_ERASE AS_IF([test "$LINUX_OBJ" != "$LINUX"], [ KERNEL_MAKE="$KERNEL_MAKE O=$LINUX_OBJ" |