author    Alexander Motin <[email protected]>    2014-07-18 08:53:38 -0800
committer Brian Behlendorf <[email protected]>   2015-07-06 09:34:13 -0700
commit    e16b3fcc610fab2dcf3381486b2640dc2a2213cb (patch)
tree      8571cd7ae1db3137b7f36ae93ff07f447a16fecc /module/zfs/rrwlock.c
parent    4bda3bd0e72d582a785b6552ce16b99e04414fbe (diff)
Illumos 5008 - lock contention (rrw_exit) while running a read only load
5008 lock contention (rrw_exit) while running a read only load
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Alex Reece <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Reviewed by: Richard Yao <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

Porting notes:
This patch ported cleanly to ZoL. During testing of 100% cached small-block
reads, extreme contention was noticed on rrl->rr_lock from rrw_exit() due to
the frequent entering and leaving of the ZPL. Illumos picked up this patch
from FreeBSD, and it also helps under Linux. On a 1-minute 4K cached read
test with 10 fio processes pinned to a single socket of a 4-socket
(10 threads per socket) NUMA system, contentions on rrl->rr_lock were
reduced from 508799 to 43085.

Ported-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3555
Diffstat (limited to 'module/zfs/rrwlock.c')
-rw-r--r--  module/zfs/rrwlock.c | 88
1 file changed, 88 insertions(+), 0 deletions(-)
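
The diff below adds only the rrwlock.c half of the new "rrm" (reader-mostly)
interface; the rrmlock_t type and the RRM_NUM_LOCKS constant it uses are
declared in the companion rrwlock.h header, which is outside this diffstat.
The following is only a sketch of what those declarations look like -- the
exact lock count used here is an assumption (the hashing comment in the
patch merely requires a small prime):

	/*
	 * Sketch of the declarations the code below relies on; the real
	 * ones live in the companion rrwlock.h header.  The value 17 is
	 * an assumption for illustration only.
	 */
	#define	RRM_NUM_LOCKS		17
	typedef struct rrmlock {
		rrwlock_t	locks[RRM_NUM_LOCKS];	/* one rrwlock per bucket */
	} rrmlock_t;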
diff --git a/module/zfs/rrwlock.c b/module/zfs/rrwlock.c
index 29a22534e..51394c01c 100644
--- a/module/zfs/rrwlock.c
+++ b/module/zfs/rrwlock.c
@@ -305,3 +305,91 @@ rrw_tsd_destroy(void *arg)
(void *)curthread, (void *)rn->rn_rrl);
}
}
+
+/*
+ * A reader-mostly lock implementation, layered on top of the plain
+ * reader-writer locks above, optimized for highly parallel read
+ * acquisitions while pessimizing writes.
+ *
+ * The idea is to split a single busy lock into an array of locks, so
+ * that each reader locks only one of them for read, chosen by a simple
+ * hash function.  That proportionally reduces lock contention.  A writer,
+ * on the other hand, has to sequentially acquire write on all of the
+ * locks, which makes write acquisition proportionally slower; but in the
+ * places where it is used (filesystem unmount) performance is not critical.
+ *
+ * All the functions below are direct wrappers around the functions above.
+ */
+void
+rrm_init(rrmlock_t *rrl, boolean_t track_all)
+{
+ int i;
+
+ for (i = 0; i < RRM_NUM_LOCKS; i++)
+ rrw_init(&rrl->locks[i], track_all);
+}
+
+void
+rrm_destroy(rrmlock_t *rrl)
+{
+ int i;
+
+ for (i = 0; i < RRM_NUM_LOCKS; i++)
+ rrw_destroy(&rrl->locks[i]);
+}
+
+void
+rrm_enter(rrmlock_t *rrl, krw_t rw, void *tag)
+{
+ if (rw == RW_READER)
+ rrm_enter_read(rrl, tag);
+ else
+ rrm_enter_write(rrl);
+}
+
+/*
+ * This maps the current thread to a specific lock. Note that the lock
+ * must be released by the same thread that acquired it. We do this
+ * mapping by taking the thread pointer mod a prime number. We examine
+ * only the low 32 bits of the thread pointer, because 32-bit division
+ * is faster than 64-bit division, and the high 32 bits have little
+ * entropy anyway.
+ */
+#define RRM_TD_LOCK() (((uint32_t)(uintptr_t)(curthread)) % RRM_NUM_LOCKS)
+
+void
+rrm_enter_read(rrmlock_t *rrl, void *tag)
+{
+ rrw_enter_read(&rrl->locks[RRM_TD_LOCK()], tag);
+}
+
+void
+rrm_enter_write(rrmlock_t *rrl)
+{
+ int i;
+
+ for (i = 0; i < RRM_NUM_LOCKS; i++)
+ rrw_enter_write(&rrl->locks[i]);
+}
+
+void
+rrm_exit(rrmlock_t *rrl, void *tag)
+{
+ int i;
+
+ if (rrl->locks[0].rr_writer == curthread) {
+ for (i = 0; i < RRM_NUM_LOCKS; i++)
+ rrw_exit(&rrl->locks[i], tag);
+ } else {
+ rrw_exit(&rrl->locks[RRM_TD_LOCK()], tag);
+ }
+}
+
+boolean_t
+rrm_held(rrmlock_t *rrl, krw_t rw)
+{
+ if (rw == RW_WRITER) {
+ return (rrw_held(&rrl->locks[0], rw));
+ } else {
+ return (rrw_held(&rrl->locks[RRM_TD_LOCK()], rw));
+ }
+}
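
For context, here is a minimal usage sketch of the interface added above
(not part of the patch).  The field and function names are hypothetical;
in ZFS the rrm lock is used for the per-filesystem teardown lock.  A reader
enters and exits on the same thread, so it touches only its hashed bucket
lock, while the unmount path takes every bucket lock for write:

	/* Minimal usage sketch; names here are hypothetical, not from the patch. */
	static rrmlock_t fs_lock;

	static void
	fs_setup(void)
	{
		rrm_init(&fs_lock, B_FALSE);	/* no per-reader tracking */
	}

	static void
	fs_read_op(void *tag)
	{
		/* Read side: hashes curthread to a single bucket lock. */
		rrm_enter(&fs_lock, RW_READER, tag);
		/* ... perform the read-only operation ... */
		rrm_exit(&fs_lock, tag);	/* must run on the acquiring thread */
	}

	static void
	fs_unmount(void *tag)
	{
		/* Write side: sequentially acquires all RRM_NUM_LOCKS buckets. */
		rrm_enter(&fs_lock, RW_WRITER, tag);
		/* ... tear down the filesystem ... */
		rrm_exit(&fs_lock, tag);
	}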