path: root/module/zfs/zfs_log.c
author: Matthew Ahrens <[email protected]> 2019-06-10 11:48:42 -0700
committer: Brian Behlendorf <[email protected]> 2019-06-10 11:48:42 -0700
commit: b8738257c2607c73c731ce8e0fd73282b266d6ef
tree: d73e90809b9f413b8894d8ee7fc6ad8a11bff7fe /module/zfs/zfs_log.c
parent: 5a902f5aaa1fbf6f7e459ec29f6d1d988ec78b0a
make zil max block size tunable

We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations. The large allocations are for ZIL blocks. If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more. In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #8865

Diffstat (limited to 'module/zfs/zfs_log.c')
-rw-r--r-- module/zfs/zfs_log.c | 11
1 files changed, 9 insertions, 2 deletions
diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index 15c396ce0..ad5b5cf30 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -20,7 +20,7 @@
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2015 by Delphix. All rights reserved.
+ * Copyright (c) 2015, 2018 by Delphix. All rights reserved.
*/
@@ -528,7 +528,14 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
 	itx_wr_state_t wr_state = write_state;
 	ssize_t len = resid;
 
-	if (wr_state == WR_COPIED && resid > ZIL_MAX_COPIED_DATA)
+	/*
+	 * A WR_COPIED record must fit entirely in one log block.
+	 * Large writes can use WR_NEED_COPY, which the ZIL will
+	 * split into multiple records across several log blocks
+	 * if necessary.
+	 */
+	if (wr_state == WR_COPIED &&
+	    resid > zil_max_copied_data(zilog))
 		wr_state = WR_NEED_COPY;
 	else if (wr_state == WR_INDIRECT)
 		len = MIN(blocksize - P2PHASE(off, blocksize), resid);
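
The write-state decision in the hunk above can be sketched as a standalone C fragment. This is only an illustration, not the kernel code: every `sketch_`-prefixed name is hypothetical, and `sketch_max_copied_data()` is an assumed stand-in for `zil_max_copied_data(zilog)`, whose real definition is not part of this diff. The sketch assumes the limit is the tunable maximum log block size minus some fixed per-record overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the three write states named in the diff. */
typedef enum { WR_INDIRECT, WR_COPIED, WR_NEED_COPY } sketch_wr_state_t;

/*
 * Hypothetical stand-in for zil_max_copied_data(zilog). Assumption:
 * the limit is the maximum log block size minus a fixed overhead for
 * block and record headers. The real computation is not shown here.
 */
static uint64_t
sketch_max_copied_data(uint64_t max_log_block, uint64_t overhead)
{
	assert(max_log_block > overhead);
	return (max_log_block - overhead);
}

/* Offset of 'off' within a power-of-two sized block (as P2PHASE does). */
static uint64_t
sketch_p2phase(uint64_t off, uint64_t blocksize)
{
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return (off & (blocksize - 1));
}

/*
 * Pick the write state and the length of the first record:
 *  - a WR_COPIED write larger than the limit falls back to
 *    WR_NEED_COPY, which may later be split across log blocks;
 *  - a WR_INDIRECT write is clipped at the next block boundary.
 */
static sketch_wr_state_t
sketch_choose_state(sketch_wr_state_t requested, uint64_t resid,
    uint64_t max_copied, uint64_t off, uint64_t blocksize, uint64_t *len)
{
	*len = resid;
	if (requested == WR_COPIED && resid > max_copied)
		return (WR_NEED_COPY);
	if (requested == WR_INDIRECT) {
		uint64_t to_end = blocksize - sketch_p2phase(off, blocksize);
		*len = (to_end < resid) ? to_end : resid;
	}
	return (requested);
}
```

With an assumed 8 KiB maximum log block and 192 bytes of overhead, a 4 KiB WR_COPIED write keeps its state, while a 16 KiB one is demoted to WR_NEED_COPY. Lowering the maximum block size therefore also lowers the threshold at which copied writes are demoted.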