diff options
author | Matthew Macy <[email protected]> | 2020-04-08 10:30:27 -0700 |
---|---|---|
committer | GitHub <[email protected]> | 2020-04-08 10:30:27 -0700 |
commit | 01c4f2bf2933319e876e8f695e97d0719a9db0ed (patch) | |
tree | 176c587c66d2c36e08112109839c061e388d4b65 /module | |
parent | 7e3df9db128722143734a9459771365ea19c1c40 (diff) |
Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock
Added to prevent a possible deadlock, the following comments from
FreeBSD explain the issue. The comment describing vn_io_fault_uiomove:
/*
* Helper function to perform the requested uiomove operation using
* the held pages for io->uio_iov[0].iov_base buffer instead of
* copyin/copyout. Access to the pages with uiomove_fromphys()
* instead of iov_base prevents page faults that could occur due to
* pmap_collect() invalidating the mapping created by
* vm_fault_quick_hold_pages(), or pageout daemon, page laundry or
* object cleanup revoking the write access from page mappings.
*
* Filesystems specified MNTK_NO_IOPF shall use vn_io_fault_uiomove()
* instead of plain uiomove().
*/
This used for vn_io_fault which has the following motivation:
/*
* The vn_io_fault() is a wrapper around vn_read() and vn_write() to
* prevent the following deadlock:
*
* Assume that the thread A reads from the vnode vp1 into userspace
* buffer buf1 backed by the pages of vnode vp2. If a page in buf1 is
* currently not resident, then system ends up with the call chain
* vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
* vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
* which establishes lock order vp1->vn_lock, then vp2->vn_lock.
* If, at the same time, thread B reads from vnode vp2 into buffer buf2
* backed by the pages of vnode vp1, and some page in buf2 is not
* resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
*
* To prevent the lock order reversal and deadlock, vn_io_fault() does
* not allow page faults to happen during VOP_READ() or VOP_WRITE().
* Instead, it first tries to do the whole range i/o with pagefaults
* disabled. If all pages in the i/o buffer are resident and mapped,
* VOP will succeed (ignoring the genuine filesystem errors).
* Otherwise, we get back EFAULT, and vn_io_fault() falls back to do
* i/o in chunks, with all pages in the chunk prefaulted and held
* using vm_fault_quick_hold_pages().
*
* Filesystems using this deadlock avoidance scheme should use the
* array of the held pages from uio, saved in the curthread->td_ma,
* instead of doing uiomove(). A helper function
* vn_io_fault_uiomove() converts uiomove request into
* uiomove_fromphys() over td_ma array.
*
* Since vnode locks do not cover the whole i/o anymore, rangelocks
* make the current i/o request atomic with respect to other i/os and
* truncations.
*/
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #10177
Diffstat (limited to 'module')
-rw-r--r-- | module/zfs/dmu.c | 11 |
1 files changed, 10 insertions, 1 deletions
diff --git a/module/zfs/dmu.c b/module/zfs/dmu.c index aa392b177..6eb935720 100644 --- a/module/zfs/dmu.c +++ b/module/zfs/dmu.c @@ -1359,8 +1359,13 @@ dmu_read_uio_dnode(dnode_t *dn, uio_t *uio, uint64_t size) XUIOSTAT_BUMP(xuiostat_rbuf_copied); } else #endif +#ifdef __FreeBSD__ + err = vn_io_fault_uiomove((char *)db->db_data + bufoff, + tocpy, uio); +#else err = uiomove((char *)db->db_data + bufoff, tocpy, UIO_READ, uio); +#endif if (err) break; @@ -1459,9 +1464,13 @@ dmu_write_uio_dnode(dnode_t *dn, uio_t *uio, uint64_t size, dmu_tx_t *tx) * to lock the pages in memory, so that uiomove won't * block. */ +#ifdef __FreeBSD__ + err = vn_io_fault_uiomove((char *)db->db_data + bufoff, + tocpy, uio); +#else err = uiomove((char *)db->db_data + bufoff, tocpy, UIO_WRITE, uio); - +#endif if (tocpy == db->db_size) dmu_buf_fill_done(db, tx); |