author | Brian Behlendorf <[email protected]> | 2016-04-19 11:19:12 -0700 |
---|---|---|
committer | Brian Behlendorf <[email protected]> | 2016-04-25 11:13:20 -0700 |
commit | 2d82ea8b111103b28b8c9ad0f69dd88736248804 (patch) | |
tree | c132339412ce0ed9b8f04c6ccb6471317755f59c /module/zfs | |
parent | 5b4136bd499a892f65c86af8fd39fa21e05c9148 (diff) |
Use udev for partition detection
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks. This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc. Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.
Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable. At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.
In order to address this the zpool_label_disk_wait() function
has been updated to use libudev. Until the registered system
device acknowledges that it is fully initialized the function
will wait. Once fully initialized all device links are checked
and allowed to settle for 50ms. This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.
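
The wait-then-settle behavior described above can be sketched in plain C. This is an illustration only, not the code from this commit: `wait_until_ready()`, `ready_fn`, and `countdown_ready()` are hypothetical names, and the readiness check is abstracted behind a callback so the sketch builds without libudev. libudev does provide `udev_device_get_is_initialized()` for this kind of query, but whether the updated `zpool_label_disk_wait()` calls exactly that function is not visible in the diff shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true once the device reports ready. */
typedef bool (*ready_fn)(void *arg);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/*
 * Poll ready(arg) every poll_ms until it returns true or timeout_ms of
 * sleeping has accumulated. On success, sleep settle_ms more so that
 * late link updates can finish, then return 0. Returns ETIMEDOUT if
 * the device never reported ready.
 */
static int wait_until_ready(ready_fn ready, void *arg,
    long timeout_ms, long poll_ms, long settle_ms)
{
	long waited = 0;

	while (!ready(arg)) {
		if (waited >= timeout_ms)
			return ETIMEDOUT;
		sleep_ms(poll_ms);
		waited += poll_ms;
	}

	sleep_ms(settle_ms);
	return 0;
}

/* Stand-in readiness source: reports ready after *arg polls. */
static bool countdown_ready(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

With a 50ms settle time as described in the message, a caller would pass `settle_ms = 50`; the timeout and polling interval are free parameters of this sketch and are not taken from the commit.
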
For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time. In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked. However, in the rare event it is needed
it will prevent a failure.
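
The ENOENT retry idea can be illustrated with a userspace analogue. This is a sketch under stated assumptions, not the kernel code: `open_retry_enoent()` is a hypothetical helper that uses `open(2)` and `nanosleep(2)` in place of the kernel's block device open and `msleep()`. Only ENOENT is treated as transient; any other error ends the loop immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/*
 * Try to open path up to max_attempts times, sleeping delay_ms after
 * each ENOENT failure. Returns a file descriptor on success or a
 * negative errno value on failure. Errors other than ENOENT are
 * returned without further retries. With max_attempts <= 0 no open
 * is attempted and -ENOENT is returned.
 */
static int open_retry_enoent(const char *path, int flags,
    int max_attempts, long delay_ms)
{
	struct timespec delay = {
		delay_ms / 1000, (delay_ms % 1000) * 1000000L
	};
	int attempts = 0;
	int err = ENOENT;

	while (attempts < max_attempts) {
		int fd = open(path, flags);

		if (fd >= 0)
			return fd;

		err = errno;
		if (err != ENOENT)
			break;		/* not transient, give up now */

		nanosleep(&delay, NULL);
		attempts++;
	}

	return -err;
}
```

Calling `open_retry_enoent(path, O_RDONLY, 50, 10)` would bound the total wait at roughly 500ms, which leaves headroom over the delays on the order of 100ms mentioned in the code comment of this change.
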
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #4523
Closes #3708
Closes #4077
Closes #4144
Closes #4214
Closes #4517
Diffstat (limited to 'module/zfs')
-rw-r--r-- | module/zfs/vdev_disk.c | 29 |
1 files changed, 24 insertions, 5 deletions
diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c
index cdb8f78e2..9b51ecc1d 100644
--- a/module/zfs/vdev_disk.c
+++ b/module/zfs/vdev_disk.c
@@ -244,12 +244,12 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
 {
 	struct block_device *bdev = ERR_PTR(-ENXIO);
 	vdev_disk_t *vd;
-	int mode, block_size;
+	int count = 0, mode, block_size;
 
 	/* Must have a pathname and it must be absolute. */
 	if (v->vdev_path == NULL || v->vdev_path[0] != '/') {
 		v->vdev_stat.vs_aux = VDEV_AUX_BAD_LABEL;
-		return (EINVAL);
+		return (SET_ERROR(EINVAL));
 	}
 
 	/*
@@ -264,7 +264,7 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
 
 	vd = kmem_zalloc(sizeof (vdev_disk_t), KM_SLEEP);
 	if (vd == NULL)
-		return (ENOMEM);
+		return (SET_ERROR(ENOMEM));
 
 	/*
 	 * Devices are always opened by the path provided at configuration
@@ -279,16 +279,35 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
 	 * /dev/[hd]d devices which may be reordered due to probing order.
 	 * Devices in the wrong locations will be detected by the higher
 	 * level vdev validation.
+	 *
+	 * The specified paths may be briefly removed and recreated in
+	 * response to udev events. This should be exceptionally unlikely
+	 * because the zpool command makes every effort to verify these paths
+	 * have already settled prior to reaching this point. Therefore,
+	 * a ENOENT failure at this point is highly likely to be transient
+	 * and it is reasonable to sleep and retry before giving up. In
+	 * practice delays have been observed to be on the order of 100ms.
 	 */
 	mode = spa_mode(v->vdev_spa);
 	if (v->vdev_wholedisk && v->vdev_expanding)
 		bdev = vdev_disk_rrpart(v->vdev_path, mode, vd);
-	if (IS_ERR(bdev))
+
+	while (IS_ERR(bdev) && count < 50) {
 		bdev = vdev_bdev_open(v->vdev_path,
 		    vdev_bdev_mode(mode), zfs_vdev_holder);
+		if (unlikely(PTR_ERR(bdev) == -ENOENT)) {
+			msleep(10);
+			count++;
+		} else if (IS_ERR(bdev)) {
+			break;
+		}
+	}
+
 	if (IS_ERR(bdev)) {
+		dprintf("failed open v->vdev_path=%s, error=%d count=%d\n",
+		    v->vdev_path, -PTR_ERR(bdev), count);
 		kmem_free(vd, sizeof (vdev_disk_t));
-		return (-PTR_ERR(bdev));
+		return (SET_ERROR(-PTR_ERR(bdev)));
 	}
 
 	v->vdev_tsd = vd;