From 21d14dbe7389c2d0cc8778476ba5c71ad5ad4406 Mon Sep 17 00:00:00 2001
From: Vincent Untz
Date: Wed, 13 Dec 2017 12:34:31 +0100
Subject: [PATCH] OCF RA: Do not consider local failures as remote node problems

In is_clustered_with(), commands that we run to check if the node is
clustered with us, or partitioned with us may fail. When they fail, it
actually doesn't tell us anything about the remote node.

Until now, we were considering such failures as hints that the remote
node is not in a sane state with us. But doing so has pretty negative
impact, as it can cause rabbitmq to get restarted on the remote node,
causing quite some disruption.

So instead of doing this, ignore the error (it's still logged).

There was a comment in the code wondering what is the best behavior;
based on experience, I think preferring stability is the slightly more
acceptable poison between the two options.
---
 scripts/rabbitmq-server-ha.ocf | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/rabbitmq-server-ha.ocf b/scripts/rabbitmq-server-ha.ocf
index 87bb7d4..bc6a538 100755
--- a/scripts/rabbitmq-server-ha.ocf
+++ b/scripts/rabbitmq-server-ha.ocf
@@ -870,8 +870,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # not part of the cluster, so ignore this
     elif [ "$seen_as_running" != true ]; then
         ocf_log info "${LH} Node $node_name is not running, considering it not clustered with us"
         return 1
@@ -882,8 +882,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is partitioned with us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # not partitioned with us, so ignore this
     elif [ "$seen_as_partitioned" != false ]; then
         ocf_log info "${LH} Node $node_name is partitioned from us"
         return 1
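
The control flow that results from this change can be sketched in isolation. The following is a minimal illustration, not the resource agent's actual code: `log`, `probe_running`, `probe_partitioned`, and the `RUNNING_*` / `PARTITIONED_*` variables are hypothetical stand-ins for `ocf_log` and the real checks, introduced only so the function can be exercised without a RabbitMQ node. It assumes each probe prints `true` or `false` and uses a non-zero exit status to signal that the probe itself failed locally.

```shell
#!/bin/sh
# Hypothetical logger standing in for ocf_log; writes to stderr.
log() { echo "$*" >&2; }

# Hypothetical probes. Each prints its verdict and returns a status that
# says whether the local check itself worked (0) or failed (non-zero).
probe_running() {
    echo "${RUNNING_OUT:-true}"
    return "${RUNNING_RC:-0}"
}
probe_partitioned() {
    echo "${PARTITIONED_OUT:-false}"
    return "${PARTITIONED_RC:-0}"
}

# Returns 0 if the node is considered clustered with us, 1 otherwise.
# A failed local probe is logged and treated as inconclusive.
is_clustered_with_sketch() {
    node_name=$1

    seen_as_running=$(probe_running "$node_name")
    rc=$?
    if [ "$rc" -ne 0 ]; then
        # Local failure: says nothing about the remote node, so fall through.
        log "err: failed to check whether '$node_name' is running"
    elif [ "$seen_as_running" != true ]; then
        log "info: node $node_name is not running"
        return 1
    fi

    seen_as_partitioned=$(probe_partitioned "$node_name")
    rc=$?
    if [ "$rc" -ne 0 ]; then
        # Local failure again: inconclusive, so fall through.
        log "err: failed to check whether '$node_name' is partitioned"
    elif [ "$seen_as_partitioned" != false ]; then
        log "info: node $node_name is partitioned from us"
        return 1
    fi

    return 0
}
```

With `RUNNING_RC=1` set, the sketch logs the error and still goes on to the partition check, whereas `RUNNING_OUT=false` with a zero status makes it return 1. That mirrors the trade-off described in the commit message: a definite negative answer still counts, while a failed check no longer does.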