From 13ada1be201eb14ff8295a17194de8db5cdccd7f Mon Sep 17 00:00:00 2001
From: Lars Ellenberg
Date: Wed, 2 Oct 2024 14:34:02 +0200
Subject: [PATCH 32/32] drbd: open: do not delay open() if already Primary

Since 48376549f (drbd: When a remote state change is active to not touch
the open_counts, 2017-10-30) if a remote state change is pending when
someone tries to open() a DRBD volume, we wait, interruptible, for
"auto-promote-timeout", until that state change is finalized (committed
or aborted), or give up with EAGAIN if the auto-promote timeout is
reached.

auto-promote-timeout by default is much smaller than twopc-timeout,
so we may get spurious open() failures.

This could be mitigated with auto-promote-timeout > twopc-timeout.

But we can just ignore the pending state change, if changing the
open_cnt won't make a difference: if we are already Primary,
or we already have openers anyways.

If
 - we have some remote state change pending,
 - and we are not Primary already,
 - and we do not have any openers, or this is an open with write intent,
we reject NDELAY openers immediately.
Normal openers wait for the state change to be finalized
(or give up after auto-promote-timeout).

We do not need to wait if:
 - there is no remote state change pending,
 - or we are already Primary anyways,
 - or we are Secondary, this is a read-only open,
   and we have openers already.

Note: we may still want to immediately reject NDELAY open if there is
a remote state change pending, even if we have an open count != 0.
These are typically short lived openers triggered via udev.
If they overlap (new open comes in before previous close), these may
still accumulate enough time to mess with state changes.
For now, I decide to allow them.
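
For illustration only, the rules above can be modelled as a small
standalone C function. This is a sketch, not the kernel code: the
function open_decision() and its boolean parameters are hypothetical
stand-ins for resource->remote_state_change, resource->role[NOW],
device->open_cnt and the BLK_OPEN_* mode bits, and the values of
IOC_SLEEP and IOC_OK are assumed (only IOC_ABORT = 2 is visible in
the diff context).

```c
#include <stdbool.h>

/* Assumed values for IOC_SLEEP and IOC_OK; only IOC_ABORT = 2 appears in the patch. */
enum ioc_rv {
	IOC_SLEEP = 0,
	IOC_OK = 1,
	IOC_ABORT = 2,
};

/*
 * Hypothetical model of the open() decision described in the commit message.
 * It ignores locking and the DOWN_IN_PROGRESS / UNREGISTERED checks.
 */
enum ioc_rv open_decision(bool remote_state_change, bool is_primary,
			  unsigned int open_cnt, bool write_intent,
			  bool ndelay)
{
	/* Only this combination can interfere with a pending state change. */
	if (remote_state_change && !is_primary &&
	    (open_cnt == 0 || write_intent))
		return ndelay ? IOC_ABORT : IOC_SLEEP;

	/* No pending change, already Primary, or read-only with openers. */
	return IOC_OK;
}
```

In this model, a read-only open on a Secondary that already has openers
returns IOC_OK even while a remote state change is pending, which is the
case the note above deliberately leaves allowed.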
---
 drbd/drbd_main.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index a216b725e66c..258be3b9c10d 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -2583,10 +2583,21 @@ enum ioc_rv {
 	IOC_ABORT = 2,
 };
 
+/* If we are in the middle of a cluster wide state change, we don't want
+ * to change (open_cnt == 0), as that then could cause a failure to commit
+ * some already promised peer auto-promote locally.
+ * So we wait until the pending remote_state_change is finalized,
+ * or give up when the timeout is reached.
+ *
+ * But we don't want to fail an open on a Primary just because it happens
+ * during some unrelated remote state change.
+ * If we are already Primary, or already have an open count != 0,
+ * we don't need to wait, it won't change anything.
+ */
 static enum ioc_rv inc_open_count(struct drbd_device *device, blk_mode_t mode)
 {
 	struct drbd_resource *resource = device->resource;
-	enum ioc_rv r = mode & BLK_OPEN_NDELAY ? IOC_ABORT : IOC_SLEEP;
+	enum ioc_rv r;
 
 	if (test_bit(DOWN_IN_PROGRESS, &resource->flags))
 		return IOC_ABORT;
@@ -2594,7 +2605,14 @@ static enum ioc_rv inc_open_count(struct drbd_device *device, blk_mode_t mode)
 	read_lock_irq(&resource->state_rwlock);
 	if (test_bit(UNREGISTERED, &device->flags))
 		r = IOC_ABORT;
-	else if (!resource->remote_state_change) {
+	else if (resource->remote_state_change &&
+		resource->role[NOW] != R_PRIMARY &&
+		(device->open_cnt == 0 || mode & BLK_OPEN_WRITE)) {
+		if (mode & BLK_OPEN_NDELAY)
+			r = IOC_ABORT;
+		else
+			r = IOC_SLEEP;
+	} else {
 		r = IOC_OK;
 		device->open_cnt++;
 		if (mode & BLK_OPEN_WRITE)
-- 
2.35.3