rabbitmq-server/ocf-pull-request-66.patch
Dirk Mueller 6f2036b23f Accepting request 556717 from home:vuntz:branches:network:messaging:amqp
- Add ocf-pull-request-63.patch and ocf-pull-request-64.patch:
  fixes to avoid moving master unnecessarily, and to make start
  notification handler more reliable.
- Add ocf-pull-request-66.patch: do not consider transient local
  failures as failures of remote nodes.

OBS-URL: https://build.opensuse.org/request/show/556717
OBS-URL: https://build.opensuse.org/package/show/network:messaging:amqp/rabbitmq-server?expand=0&rev=84
2017-12-13 16:09:08 +00:00


From 21d14dbe7389c2d0cc8778476ba5c71ad5ad4406 Mon Sep 17 00:00:00 2001
From: Vincent Untz <vuntz@suse.com>
Date: Wed, 13 Dec 2017 12:34:31 +0100
Subject: [PATCH] OCF RA: Do not consider local failures as remote node
 problems

In is_clustered_with(), commands that we run to check if the node is
clustered with us, or partitioned with us may fail. When they fail, it
actually doesn't tell us anything about the remote node.

Until now, we were considering such failures as hints that the remote
node is not in a sane state with us. But doing so has pretty negative
impact, as it can cause rabbitmq to get restarted on the remote node,
causing quite some disruption.

So instead of doing this, ignore the error (it's still logged).

There was a comment in the code wondering what is the best behavior;
based on experience, I think preferring stability is the slightly more
acceptable poison between the two options.
---
scripts/rabbitmq-server-ha.ocf | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/rabbitmq-server-ha.ocf b/scripts/rabbitmq-server-ha.ocf
index 87bb7d4..bc6a538 100755
--- a/scripts/rabbitmq-server-ha.ocf
+++ b/scripts/rabbitmq-server-ha.ocf
@@ -870,8 +870,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # not part of the cluster, so ignore this
     elif [ "$seen_as_running" != true ]; then
         ocf_log info "${LH} Node $node_name is not running, considering it not clustered with us"
         return 1
@@ -882,8 +882,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is partitioned with us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # not partitioned with us, so ignore this
     elif [ "$seen_as_partitioned" != false ]; then
         ocf_log info "${LH} Node $node_name is partitioned from us"
         return 1
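
The decision logic after this patch can be illustrated with a small standalone sketch. It is not the resource agent code: `is_clustered_with_sketch` is a hypothetical function that takes the exit code and output of the two checks as arguments instead of running them, the `echo` calls stand in for `ocf_log`, and the final `return 0` is an assumption about how the real function ends. The point it shows is that a failed local check is only logged, while a successful check with a bad answer still returns 1.

```shell
#!/bin/sh
# Hypothetical sketch of the post-patch decision logic; not the real OCF RA.
# Usage: is_clustered_with_sketch RUNNING_RC RUNNING_OUT PARTITIONED_RC PARTITIONED_OUT
is_clustered_with_sketch() {
    running_rc=$1
    seen_as_running=$2
    partitioned_rc=$3
    seen_as_partitioned=$4

    if [ "$running_rc" -ne 0 ]; then
        # The local check itself failed: this says nothing about the
        # remote node, so log it and carry on.
        echo "err: failed to check whether node is considered running" >&2
    elif [ "$seen_as_running" != true ]; then
        # The check worked and the node is not running.
        return 1
    fi

    if [ "$partitioned_rc" -ne 0 ]; then
        # Same reasoning: a local failure is logged, not held against
        # the remote node.
        echo "err: failed to check whether node is partitioned" >&2
    elif [ "$seen_as_partitioned" != false ]; then
        # The check worked and the node is partitioned from us.
        return 1
    fi

    # Assumed ending: no check positively reported a problem.
    return 0
}
```

For example, `is_clustered_with_sketch 1 "" 0 false` returns 0 (the first check failed locally and is ignored), whereas `is_clustered_with_sketch 0 false 0 false` returns 1 (the first check succeeded and reported the node as not running). Before the patch, both calls would have returned 1.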