Missed wakeup when growing kmem cache

When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow() always does taskq_dispatch(spl_cache_grow_work), and then waits for the KMC_BIT_GROWING to be cleared by the taskq thread. The taskq thread (spl_cache_grow_work()) does: 1. allocate new slab and add to list 2. wake_up_all(skc_waitq) 3. clear_bit(KMC_BIT_GROWING) Therefore, the waiting thread can wake up before GROWING has been cleared. It will see that the growing has not yet completed, and go back to sleep until it hits the 100ms timeout. This can have an extreme performance impact on workloads that alloc/free more than fits in the (statically-sized) magazines. These workloads allocate and free slabs with high frequency. The problem can be observed with `funclatency spl_cache_grow`, which on some workloads shows that 99.5% of the time it takes <64us to allocate slabs, but we spend ~70% of our time in outliers, waiting for the 100ms timeout. The fix is to do `clear_bit(KMC_BIT_GROWING)` before `wake_up_all(skc_waitq)`. A future investigation should evaluate if we still actually need to taskq_dispatch() at all, and if so on which kernel versions. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Wilson <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> Closes #9989
author: Matthew Ahrens <[email protected]> 2020-02-13 11:23:02 -0800
committer: GitHub <[email protected]> 2020-02-13 11:23:02 -0800
commit: 2adc6b35aebc722d83329f0e340833263c6415aa (patch)
tree: 2ea41f78abee2bf088434f7915192f6914d6b720 /module
parent: 465e4e795ee3cbdc5de862b26d81b2f1116733df (diff)
1 files changed, 6 insertions, 3 deletions
diff --git a/module/os/linux/spl/spl-kmem-cache.c b/module/os/linux/spl/spl-kmem-cache.c
index 7dd8e8543..2ebbb43d1 100644
--- a/module/os/linux/spl/spl-kmem-cache.c
+++ b/module/os/linux/spl/spl-kmem-cache.c
@@ -1185,7 +1185,6 @@ __spl_cache_grow(spl_kmem_cache_t *skc, int flags)
 		smp_mb__before_atomic();
 		clear_bit(KMC_BIT_DEADLOCKED, &skc->skc_flags);
 		smp_mb__after_atomic();
-		wake_up_all(&skc->skc_waitq);
 	}
 	spin_unlock(&skc->skc_lock);
 
@@ -1198,12 +1197,14 @@ spl_cache_grow_work(void *data)
 	spl_kmem_alloc_t *ska = (spl_kmem_alloc_t *)data;
 	spl_kmem_cache_t *skc = ska->ska_cache;
 
-	(void) __spl_cache_grow(skc, ska->ska_flags);
+	int error = __spl_cache_grow(skc, ska->ska_flags);
 
 	atomic_dec(&skc->skc_ref);
 	smp_mb__before_atomic();
 	clear_bit(KMC_BIT_GROWING, &skc->skc_flags);
 	smp_mb__after_atomic();
+	if (error == 0)
+		wake_up_all(&skc->skc_waitq);
 
 	kfree(ska);
 }
@@ -1254,8 +1255,10 @@ spl_cache_grow(spl_kmem_cache_t *skc, int flags, void **obj)
 	 */
 	if (!(skc->skc_flags & KMC_VMEM) && !(skc->skc_flags & KMC_KVMEM)) {
 		rc = __spl_cache_grow(skc, flags | KM_NOSLEEP);
-		if (rc == 0)
+		if (rc == 0) {
+			wake_up_all(&skc->skc_waitq);
 			return (0);
+		}
 	}
 
 	/*
author	Matthew Ahrens <[email protected]>	2020-02-13 11:23:02 -0800
committer	GitHub <[email protected]>	2020-02-13 11:23:02 -0800
commit	2adc6b35aebc722d83329f0e340833263c6415aa (patch)
tree	2ea41f78abee2bf088434f7915192f6914d6b720 /module
parent	465e4e795ee3cbdc5de862b26d81b2f1116733df (diff)