author | George Wilson <[email protected]> | 2020-09-17 22:03:10 -0500 |
---|---|---|
committer | GitHub <[email protected]> | 2020-09-17 20:03:10 -0700 |
commit | 8e82ffba7be213373eb7805cadfb6fe3260f0cdd (patch) | |
tree | ed98ca1757820497eb23cd41cc00577357dda7b1 /module | |
parent | a57f954226ce1111a022d1fa615e555c5e735a35 (diff) |
pool may become suspended during device expansion
When expanding a device zfs needs to rescan the partition table to
get the correct size. This can only happen when we're in the kernel
and requires the device to be closed. As part of the rescan, udev is
notified and the device links are removed and recreated. This leaves a
window where the vdev code may try to reopen the device before udev
has recreated the link. If that happens, then the pool may end up in
a suspended state.
To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which
allows the partition information to be modified even while it's in use.
This ioctl also does not remove the device link associated with the zfs
data partition so it eliminates the race condition that can occur in
the kernel.
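
For illustration only, here is a minimal sketch of what a userland BLKPG_RESIZE_PARTITION request can look like. It is not the code from this commit: the helper names fill_resize_request and resize_partition_online are hypothetical, and the caller is assumed to hold an open file descriptor for the whole disk (not the partition). Only struct blkpg_ioctl_arg, struct blkpg_partition, BLKPG and BLKPG_RESIZE_PARTITION come from <linux/blkpg.h>.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/blkpg.h>

/*
 * Hypothetical helper: fill in a BLKPG_RESIZE_PARTITION request.
 * pno is the partition number; start and length are in bytes.
 */
static void
fill_resize_request(struct blkpg_ioctl_arg *arg, struct blkpg_partition *part,
    int pno, long long start, long long length)
{
	assert(arg != NULL && part != NULL);

	memset(part, 0, sizeof (*part));
	part->pno = pno;
	part->start = start;
	part->length = length;

	memset(arg, 0, sizeof (*arg));
	arg->op = BLKPG_RESIZE_PARTITION;
	arg->datalen = sizeof (*part);
	arg->data = part;
}

/*
 * Hypothetical helper: ask the kernel to resize partition pno in place.
 * disk_fd is assumed to refer to the whole disk. Returns 0 on success
 * and -1 (with errno set) on failure, as ioctl(2) does.
 */
static int
resize_partition_online(int disk_fd, int pno, long long start,
    long long length)
{
	struct blkpg_ioctl_arg arg;
	struct blkpg_partition part;

	fill_resize_request(&arg, &part, pno, start, length);
	return (ioctl(disk_fd, BLKPG, &arg));
}
```

A caller would typically treat a -1 return as "the online resize did not happen" and rely on the kernel-side rescan as the fallback, which matches the behavior described in this commit.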
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Wilson <[email protected]>
Closes #10897
Diffstat (limited to 'module')
-rw-r--r-- | module/os/linux/zfs/vdev_disk.c | 22 |
1 files changed, 20 insertions, 2 deletions
diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
index 5a2245436..85daef43b 100644
--- a/module/os/linux/zfs/vdev_disk.c
+++ b/module/os/linux/zfs/vdev_disk.c
@@ -34,6 +34,7 @@
 #include <sys/abd.h>
 #include <sys/fs/zfs.h>
 #include <sys/zio.h>
+#include <linux/blkpg.h>
 #include <linux/msdos_fs.h>
 #include <linux/vfs_compat.h>
 
@@ -175,7 +176,8 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
 
 	/*
 	 * Reopen the device if it is currently open. When expanding a
-	 * partition force re-scanning the partition table while closed
+	 * partition force re-scanning the partition table if userland
+	 * did not take care of this already. We need to do this while closed
 	 * in order to get an accurate updated block device size. Then
 	 * since udev may need to recreate the device links increase the
 	 * open retry timeout before reporting the device as unavailable.
@@ -192,7 +194,23 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
 		if (bdev) {
 			if (v->vdev_expanding && bdev != bdev->bd_contains) {
 				bdevname(bdev->bd_contains, disk_name + 5);
-				reread_part = B_TRUE;
+				/*
+				 * If userland has BLKPG_RESIZE_PARTITION,
+				 * then it should have updated the partition
+				 * table already. We can detect this by
+				 * comparing our current physical size
+				 * with that of the device. If they are
+				 * the same, then we must not have
+				 * BLKPG_RESIZE_PARTITION or it failed to
+				 * update the partition table online. We
+				 * fallback to rescanning the partition
+				 * table from the kernel below. However,
+				 * if the capacity already reflects the
+				 * updated partition, then we skip
+				 * rescanning the partition table here.
+				 */
+				if (v->vdev_psize == bdev_capacity(bdev))
+					reread_part = B_TRUE;
 			}
 
 			blkdev_put(bdev, mode | FMODE_EXCL);
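
The decision made by the new check in vdev_disk_open() can be sketched in isolation as a pure function. This is a simplified illustration under stated assumptions, not the kernel code: needs_kernel_rescan and its parameters are hypothetical names standing in for v->vdev_expanding, the bdev != bdev->bd_contains test, v->vdev_psize and bdev_capacity(bdev).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the check in the patch. A kernel-side
 * partition table rescan is requested only when the vdev is expanding,
 * the device is a partition rather than a whole disk, and the size
 * recorded for the vdev still equals the current device capacity
 * (meaning userland did not already resize the partition online).
 */
static bool
needs_kernel_rescan(bool expanding, bool is_partition,
    uint64_t recorded_psize, uint64_t current_capacity)
{
	if (!expanding || !is_partition)
		return (false);

	return (recorded_psize == current_capacity);
}
```

If the capacity has already grown past the recorded size, the function returns false and the device links are left alone, which is the case that avoids the race with udev.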