author     smh <[email protected]>                  2016-02-12 20:47:22 -0500
committer  Brian Behlendorf <[email protected]>  2016-02-26 11:24:35 -0800
commit     9f500936c82137ef3a57c53013894f622dcec14e (patch)
tree       e7594e8f711c255d4bfe699e3fe74bab3acb12aa /module/zfs/vdev_queue.c
parent     a77f29f93c8d016f17d9b77f39662e311774aaae (diff)
FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information.
The existing algorithm selects a preferred leaf vdev based on the offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly across them is
sufficient to saturate them. In practice this results in the leaf vdevs
being underutilized.
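For reference, a minimal sketch of the replaced selection scheme (the helper name
and shift width are illustrative assumptions, not the committed code): the
preferred child is a pure function of the I/O offset, so per-device load never
influences the choice.

    #include <stdint.h>

    /*
     * Illustrative only: preferred-child selection by offset alone.
     * The shift width is an assumption for the example.
     */
    static int
    old_preferred_child(uint64_t io_offset, int mirror_children)
    {
        return ((int)((io_offset >> 21) % mirror_children));
    }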
The new algorithm takes into account the following additional factors:
* Load of the vdevs (number of outstanding I/O requests)
* The locality of the last queued I/O vs. the new I/O request.
Within the locality calculation, additional knowledge about the underlying vdev
is considered, such as whether the device backing the vdev is rotating media;
a sketch of the resulting per-child weighting follows below.
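A minimal sketch of the idea, with plain integers standing in for the real
per-child vdev state (illustrative only, not the committed vdev_mirror.c code):
every valid candidate gets a weight built from its outstanding I/O count plus a
locality increment, and the child with the lowest weight is selected.

    #include <limits.h>
    #include <stdint.h>

    typedef struct child_state {
        int      cs_pending;     /* outstanding I/Os on this child */
        uint64_t cs_lastoffset;  /* where the last queued I/O ended */
        int      cs_rotating;    /* backing device is spinning media */
    } child_state_t;

    static int
    pick_child(const child_state_t *cs, int children, uint64_t new_offset)
    {
        int c, best = -1, best_load = INT_MAX;

        for (c = 0; c < children; c++) {
            int load = cs[c].cs_pending;

            /* Rotating media pay extra when the request forces a seek. */
            if (cs[c].cs_rotating && cs[c].cs_lastoffset != new_offset)
                load += 5;  /* placeholder seek increment */

            if (load < best_load) {
                best_load = load;
                best = c;
            }
        }
        return (best);
    }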
This results in performance increases across the board, with significant gains
for predominantly streaming loads and for configurations whose devices do not
perform evenly.
The following are results from a setup with a 3-way mirror of 2 x HDDs and
1 x SSD, from a basic test running multiple parallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s
With pre-fetch enabled (vfs.zfs.prefetch_disable=0):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s
In addition to the performance changes, the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow, which also ensures
that vdev loads are only calculated from the set of valid candidates.
The following additional sysctls were added to allow the administrator
to tune the behaviour of the load algorithm (a sketch of their ZoL-style
module-parameter form follows the list):
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc
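As referenced above, a sketch of how these tunables might look with ZoL-style
names as Linux module parameters (cf. the porting notes below). The variable
names follow the commit's naming scheme, but the default values and description
strings here are placeholders, not the committed declarations in vdev_mirror.c:

    static int zfs_vdev_mirror_rotating_inc = 0;
    static int zfs_vdev_mirror_rotating_seek_inc = 5;
    static int zfs_vdev_mirror_rotating_seek_offset = 1 * 1024 * 1024;
    static int zfs_vdev_mirror_non_rotating_inc = 0;
    static int zfs_vdev_mirror_non_rotating_seek_inc = 1;

    #if defined(_KERNEL) && defined(HAVE_SPL)
    module_param(zfs_vdev_mirror_rotating_inc, int, 0644);
    MODULE_PARM_DESC(zfs_vdev_mirror_rotating_inc,
        "Rotating media load increment for non-seeking I/O's");

    module_param(zfs_vdev_mirror_rotating_seek_inc, int, 0644);
    MODULE_PARM_DESC(zfs_vdev_mirror_rotating_seek_inc,
        "Rotating media load increment for seeking I/O's");

    module_param(zfs_vdev_mirror_rotating_seek_offset, int, 0644);
    MODULE_PARM_DESC(zfs_vdev_mirror_rotating_seek_offset,
        "Offset in bytes from the last I/O which triggers "
        "a reduced rotating media seek increment");

    module_param(zfs_vdev_mirror_non_rotating_inc, int, 0644);
    MODULE_PARM_DESC(zfs_vdev_mirror_non_rotating_inc,
        "Non-rotating media load increment for non-seeking I/O's");

    module_param(zfs_vdev_mirror_non_rotating_seek_inc, int, 0644);
    MODULE_PARM_DESC(zfs_vdev_mirror_non_rotating_seek_inc,
        "Non-rotating media load increment for seeking I/O's");
    #endif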
These changes were based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487
Reviewed by: gibbs, mav, will
MFC after: 2 weeks
Sponsored by: Multiplay
References:
https://github.com/freebsd/freebsd@5c7a6f5d
https://github.com/freebsd/freebsd@31b7f68d
https://github.com/freebsd/freebsd@e186f564
Performance Testing:
https://github.com/zfsonlinux/zfs/pull/4334#issuecomment-189057141
Porting notes:
- The tunables were adjusted to have ZoL-style names.
- The code was modified to use ZoL's vd_nonrot.
- Fixes were done to make cstyle.pl happy
- Merge conflicts were handled manually
- freebsd/freebsd@e186f564bc946f82c76e0b34c2f0370ed9aea022 by my
colleague Andriy Gapon has been included. It applied perfectly, but
added a cstyle regression.
- This replaces 556011dbec2d10579819078559a77630fc559112 entirely.
- A typo "IO'a" has been corrected to say "IO's"
- Descriptions of new tunables were added to man/man5/zfs-module-parameters.5.
Ported-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4334
Diffstat (limited to 'module/zfs/vdev_queue.c')
-rw-r--r--   module/zfs/vdev_queue.c   26
1 file changed, 26 insertions, 0 deletions
diff --git a/module/zfs/vdev_queue.c b/module/zfs/vdev_queue.c
index e828ce917..af8af67de 100644
--- a/module/zfs/vdev_queue.c
+++ b/module/zfs/vdev_queue.c
@@ -374,6 +374,8 @@ vdev_queue_init(vdev_t *vd)
 		avl_create(vdev_queue_class_tree(vq, p), compfn,
 		    sizeof (zio_t), offsetof(struct zio, io_queue_node));
 	}
+
+	vq->vq_lastoffset = 0;
 }
 
 void
@@ -776,6 +778,30 @@ vdev_queue_io_done(zio_t *zio)
 	mutex_exit(&vq->vq_lock);
 }
 
+/*
+ * As these three methods are only used for load calculations we're not
+ * concerned if we get an incorrect value on 32bit platforms due to lack of
+ * vq_lock mutex use here, instead we prefer to keep it lock free for
+ * performance.
+ */
+int
+vdev_queue_length(vdev_t *vd)
+{
+	return (avl_numnodes(&vd->vdev_queue.vq_active_tree));
+}
+
+uint64_t
+vdev_queue_lastoffset(vdev_t *vd)
+{
+	return (vd->vdev_queue.vq_lastoffset);
+}
+
+void
+vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio)
+{
+	vd->vdev_queue.vq_lastoffset = zio->io_offset + zio->io_size;
+}
+
 #if defined(_KERNEL) && defined(HAVE_SPL)
 module_param(zfs_vdev_aggregation_limit, int, 0644);
 MODULE_PARM_DESC(zfs_vdev_aggregation_limit, "Max vdev I/O aggregation size");
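The three accessors added above are consumed by the mirror child-selection
path. A hedged sketch of a caller, assuming the usual ZFS headers
(sys/vdev_impl.h, sys/zio.h); the function names are illustrative and the
fixed increment stands in for the tunables listed earlier, so this is not the
committed vdev_mirror.c logic:

    /* Compute an illustrative load score for one mirror child. */
    static int
    mirror_child_score(vdev_t *vd, zio_t *zio)
    {
        /* Base load: I/Os already outstanding on this child. */
        int load = vdev_queue_length(vd);

        /*
         * Locality: penalize rotating media when the new request does not
         * continue from where the last queued I/O ended.
         */
        if (!vd->vdev_nonrot && vdev_queue_lastoffset(vd) != zio->io_offset)
            load += 5;  /* placeholder seek increment */

        return (load);
    }

    /* Once the winning child has been picked and the zio issued to it. */
    static void
    mirror_child_issued(vdev_t *vd, zio_t *zio)
    {
        /* Remember where this I/O ends for the next locality check. */
        vdev_queue_register_lastoffset(vd, zio);
    }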