diff options
author | Prakash Surya <[email protected]> | 2018-01-08 13:45:53 -0800 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2018-01-26 20:19:46 -0800 |
commit | 0735ecb33485e91a78357a274e47c2782858d8b9 (patch) | |
tree | eecf628f46b2e009dba2ec542be7a624c7a6b7e8 /module/zfs/zil.c | |
parent | 522db29275b81c18c2bf53a95efa1aedeb13b428 (diff) |
OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue
PROBLEM
=======
When `dmu_tx_assign` is called from `zil_lwb_write_issue`, it's possible
for either `ERESTART` or `EIO` to be returned.
If `ERESTART` is returned, this will cause an assertion to fail directly
in `zil_lwb_write_issue`, where the code assumes the return value is
`EIO` if `dmu_tx_assign` returns a non-zero value. This can occur if the
SPA is suspended when `dmu_tx_assign` is called, and most often occurs
when running `zloop`.
If `EIO` is returned, this can cause assertions to fail elsewhere in the
ZIL code. For example, `zil_commit_waiter_timeout` contains the
following logic:
lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb);
ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED);
In this case, if `dmu_tx_assign` returned `EIO` from within
`zil_lwb_write_issue`, the `lwb` variable passed in will not be issued
to disk. Thus, it's `lwb_state` field will remain `LWB_STATE_OPENED` and
this assertion will fail. `zil_commit_waiter_timeout` assumes that after
it calls `zil_lwb_write_issue`, the `lwb` will be issued to disk, and
doesn't handle the case where this is not true; i.e. it doesn't handle
the case where `dmu_tx_assign` returns `EIO`.
SOLUTION
========
This change modifies the `dmu_tx_assign` function such that `txg_how` is
a bitmask, rather than of the `txg_how_t` enum type. Now, the previous
`TXG_WAITED` semantics can be used via `TXG_NOTHROTTLE`, along with
specifying either `TXG_NOWAIT` or `TXG_WAIT` semantics.
Previously, when `TXG_WAITED` was specified, `TXG_NOWAIT` semantics was
automatically invoked. This was not ideal when using `TXG_WAITED` within
`zil_lwb_write_issued`, leading the problem described above. Rather, we
want to achieve the semantics of `TXG_WAIT`, while also preventing the
`tx` from being penalized via the dirty delay throttling.
With this change, `zil_lwb_write_issued` can acheive the semtantics that
it requires by passing in the value `TXG_WAIT | TXG_NOTHROTTLE` to
`dmu_tx_assign`.
Further, consumers of `dmu_tx_assign` wishing to achieve the old
`TXG_WAITED` semantics can pass in the value `TXG_NOWAIT | TXG_NOTHROTTLE`.
Authored by: Prakash Surya <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Andriy Gapon <[email protected]>
Ported-by: Brian Behlendorf <[email protected]>
Porting Notes:
- Additionally updated `zfs_tmpfile` to use `TXG_NOTHROTTLE`
OpenZFS-issue: https://www.illumos.org/issues/8997
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/19ea6cb0f9
Closes #7084
Diffstat (limited to 'module/zfs/zil.c')
-rw-r--r-- | module/zfs/zil.c | 21 |
1 files changed, 6 insertions, 15 deletions
diff --git a/module/zfs/zil.c b/module/zfs/zil.c index 81bc6de41..73744ffcf 100644 --- a/module/zfs/zil.c +++ b/module/zfs/zil.c @@ -1269,22 +1269,13 @@ zil_lwb_write_issue(zilog_t *zilog, lwb_t *lwb) tx = dmu_tx_create(zilog->zl_os); /* - * Since we are not going to create any new dirty data and we can even - * help with clearing the existing dirty data, we should not be subject - * to the dirty data based delays. - * We (ab)use TXG_WAITED to bypass the delay mechanism. - * One side effect from using TXG_WAITED is that dmu_tx_assign() can - * fail if the pool is suspended. Those are dramatic circumstances, - * so we return NULL to signal that the normal ZIL processing is not - * possible and txg_wait_synced() should be used to ensure that the data - * is on disk. + * Since we are not going to create any new dirty data, and we + * can even help with clearing the existing dirty data, we + * should not be subject to the dirty data based delays. We + * use TXG_NOTHROTTLE to bypass the delay mechanism. */ - error = dmu_tx_assign(tx, TXG_WAITED); - if (error != 0) { - ASSERT(error == EIO || error == ERESTART); - dmu_tx_abort(tx); - return (NULL); - } + VERIFY0(dmu_tx_assign(tx, TXG_WAIT | TXG_NOTHROTTLE)); + dsl_dataset_dirty(dmu_objset_ds(zilog->zl_os), tx); txg = dmu_tx_get_txg(tx); |