aboutsummaryrefslogtreecommitdiffstats
path: root/module
diff options
context:
space:
mode:
authorGeorge Wilson <[email protected]>2020-09-17 22:03:10 -0500
committerGitHub <[email protected]>2020-09-17 20:03:10 -0700
commit8e82ffba7be213373eb7805cadfb6fe3260f0cdd (patch)
treeed98ca1757820497eb23cd41cc00577357dda7b1 /module
parenta57f954226ce1111a022d1fa615e555c5e735a35 (diff)
pool may become suspended during device expansion
When expanding a device zfs needs to rescan the partition table to get the correct size. This can only happen when we're in the kernel and requires the device to be closed. As part of the rescan, udev is notified and the device links are removed and recreated. This leave a window where the vdev code may try to reopen the device before udev has recreated the link. If that happens, then the pool may end up in a suspended state. To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which allows the partition information to be modified even while it's in use. This ioctl also does not remove the device link associated with the zfs data partition so it eliminates the race condition that can occur in the kernel. Reviewed-by: Pavel Zakharov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Wilson <[email protected]> Closes #10897
Diffstat (limited to 'module')
-rw-r--r--module/os/linux/zfs/vdev_disk.c22
1 files changed, 20 insertions, 2 deletions
diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
index 5a2245436..85daef43b 100644
--- a/module/os/linux/zfs/vdev_disk.c
+++ b/module/os/linux/zfs/vdev_disk.c
@@ -34,6 +34,7 @@
#include <sys/abd.h>
#include <sys/fs/zfs.h>
#include <sys/zio.h>
+#include <linux/blkpg.h>
#include <linux/msdos_fs.h>
#include <linux/vfs_compat.h>
@@ -175,7 +176,8 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
/*
* Reopen the device if it is currently open. When expanding a
- * partition force re-scanning the partition table while closed
+ * partition force re-scanning the partition table if userland
+ * did not take care of this already. We need to do this while closed
* in order to get an accurate updated block device size. Then
* since udev may need to recreate the device links increase the
* open retry timeout before reporting the device as unavailable.
@@ -192,7 +194,23 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
if (bdev) {
if (v->vdev_expanding && bdev != bdev->bd_contains) {
bdevname(bdev->bd_contains, disk_name + 5);
- reread_part = B_TRUE;
+ /*
+ * If userland has BLKPG_RESIZE_PARTITION,
+ * then it should have updated the partition
+ * table already. We can detect this by
+ * comparing our current physical size
+ * with that of the device. If they are
+ * the same, then we must not have
+ * BLKPG_RESIZE_PARTITION or it failed to
+ * update the partition table online. We
+ * fallback to rescanning the partition
+ * table from the kernel below. However,
+ * if the capacity already reflects the
+ * updated partition, then we skip
+ * rescanning the partition table here.
+ */
+ if (v->vdev_psize == bdev_capacity(bdev))
+ reread_part = B_TRUE;
}
blkdev_put(bdev, mode | FMODE_EXCL);