From 21d14dbe7389c2d0cc8778476ba5c71ad5ad4406 Mon Sep 17 00:00:00 2001
From: Vincent Untz <vuntz@suse.com>
Date: Wed, 13 Dec 2017 12:34:31 +0100
Subject: [PATCH] OCF RA: Do not consider local failures as remote node problems

In is_clustered_with(), the commands we run to check whether the node
is clustered with us, or partitioned with us, may fail. When they fail,
that actually tells us nothing about the remote node.

Until now, we treated such failures as hints that the remote node is
not in a sane state with us. But doing so has a pretty negative impact,
as it can cause rabbitmq to be restarted on the remote node, which is
quite disruptive.

So instead, ignore the error (it is still logged).

There was a comment in the code wondering what the best behavior is;
based on experience, I think preferring stability is the slightly more
acceptable poison of the two options.
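To make the behavior change concrete, here is a minimal sketch that reduces a single check to its decision rule. `decide_old` and `decide_new` are hypothetical helpers, not functions from rabbitmq-server-ha.ocf. Each takes the exit status of the local check command and the value it reported, and returns 0 for "clustered with us" and 1 otherwise. Logging is omitted.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the decision rule for one check.
# $1: exit status of the local check command
# $2: what the command reported ("true" means the node is seen as running)
# Both return 0 for "clustered with us" and 1 otherwise.

# Before the patch: a local failure was counted against the remote node.
decide_old() {
    if [ "$1" -ne 0 ]; then
        return 1
    elif [ "$2" != true ]; then
        return 1
    fi
    return 0
}

# After the patch: a local failure is logged (omitted here) and ignored,
# so only a successful check that reports a bad state returns 1.
decide_new() {
    if [ "$1" -ne 0 ]; then
        :  # local failure: nothing learned about the remote node
    elif [ "$2" != true ]; then
        return 1
    fi
    return 0
}
```

For example, `decide_old 1 ''` returns 1 while `decide_new 1 ''` returns 0. Both still return 1 for `0 false`, so a check that actually runs and reports a bad state is handled as before.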
---
 scripts/rabbitmq-server-ha.ocf | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/rabbitmq-server-ha.ocf b/scripts/rabbitmq-server-ha.ocf
index 87bb7d4..bc6a538 100755
--- a/scripts/rabbitmq-server-ha.ocf
+++ b/scripts/rabbitmq-server-ha.ocf
@@ -870,8 +870,8 @@ is_clustered_with()
rc=$?
if [ "$rc" -ne 0 ]; then
ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
- # XXX Or should we give remote node benefit of a doubt?
- return 1
+ # We had a transient local error; that doesn't mean the remote node is
+ # not part of the cluster, so ignore this
elif [ "$seen_as_running" != true ]; then
ocf_log info "${LH} Node $node_name is not running, considering it not clustered with us"
return 1
@@ -882,8 +882,8 @@ is_clustered_with()
rc=$?
if [ "$rc" -ne 0 ]; then
ocf_log err "${LH} Failed to check whether '$node_name' is partitioned with us"
- # XXX Or should we give remote node benefit of a doubt?
- return 1
+ # We had a transient local error; that doesn't mean the remote node is
+ # not partitioned with us, so ignore this
elif [ "$seen_as_partitioned" != false ]; then
ocf_log info "${LH} Node $node_name is partitioned from us"
return 1
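Here is a rough sketch of how the two patched checks combine. `probe_running`, `probe_partitioned` and `clustered_with_sketch` are hypothetical stand-ins for the script's real queries and for is_clustered_with() itself, and `echo >&2` replaces `ocf_log`. The sketch keeps an asymmetry from the diff: the running check rejects anything other than "true", while the partition check rejects anything other than "false". The hunks do not show how is_clustered_with() ends, so the final `return 0` is an assumption.

```shell
#!/bin/sh
# Hypothetical stubs standing in for the script's real queries. Each prints
# its answer and exits 0, or exits with LOCAL_RC when the local query fails.
probe_running()     { [ "${LOCAL_RC:-0}" -eq 0 ] || return "$LOCAL_RC"; echo "${RUNNING:-true}"; }
probe_partitioned() { [ "${LOCAL_RC:-0}" -eq 0 ] || return "$LOCAL_RC"; echo "${PARTITIONED:-false}"; }

# Sketch of the patched control flow: 0 = clustered with us, 1 = not.
clustered_with_sketch() {
    node_name=$1

    seen_as_running=$(probe_running "$node_name")
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Failed to check whether '$node_name' is considered running by us" >&2
    elif [ "$seen_as_running" != true ]; then
        return 1
    fi

    seen_as_partitioned=$(probe_partitioned "$node_name")
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Failed to check whether '$node_name' is partitioned with us" >&2
    elif [ "$seen_as_partitioned" != false ]; then
        # Anything other than an explicit "false" counts as partitioned.
        return 1
    fi

    # Assumption: the diff does not show the end of is_clustered_with().
    return 0
}
```

With this structure, a node is reported as not clustered with us only when a probe succeeds and returns a bad answer. A failing local probe is logged and skipped.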