From patchwork Thu May 17 10:04:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Numan Siddique
X-Patchwork-Id: 915289
X-Patchwork-Delegate: rbryant@redhat.com
From: nusiddiq@redhat.com
To: dev@openvswitch.org
Date: Thu, 17 May 2018 15:34:09 +0530
Message-Id: <20180517100409.834-1-nusiddiq@redhat.com>
Subject: [ovs-dev] [PATCH] ovn pacemaker: Fix the promotion issue in other
 cluster nodes when the master node is reset

From: Numan Siddique

When a node 'A' in the pacemaker cluster running the OVN db servers as
master is brought down ungracefully ('echo b > /proc/sysrq-trigger' for
example), pacemaker is not able to promote any other node in the cluster
to master. When pacemaker selects a node, say 'B', to promote, it moves
the IPAddr2 resource (i.e. the master IP) to node 'B'. When the issue is
seen, as soon as the node is configured with the IP address, the OVN db
servers that were previously running as standby transition to active.
Ideally this should not happen: the ovsdb-servers are expected to remain
in standby until they are promoted. (This needs separate investigation.)

When pacemaker calls the OVN OCF script's promote action, the
ovsdb_server_promote function returns almost immediately without
recording the present master. Later, in the notify action, the script
demotes the OVN db servers again because the last known master doesn't
match node 'B's hostname. This results in pacemaker promoting and
demoting in a loop.

This patch fixes the issue by not returning immediately when the promote
action is called while the OVN db servers are already running as active.
The ovsdb_server_promote function now continues and records the new
master by setting the proper master score
($CRM_MASTER -N $host_name -v ${master_score}).

This issue is not seen when a node is brought down gracefully, because
pacemaker calls the stop, start and then promote actions before
promoting a node. It is not clear why pacemaker doesn't call the stop,
start and promote actions when a node is reset ungracefully.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1579025
Signed-off-by: Numan Siddique
---
 ovn/utilities/ovndb-servers.ocf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bce6..23dc70056 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -409,7 +409,7 @@ ovsdb_server_promote() {
     rc=$?
     case $rc in
         ${OCF_SUCCESS}) ;;
-        ${OCF_RUNNING_MASTER}) return ${OCF_SUCCESS};;
+        ${OCF_RUNNING_MASTER}) ;;
         *)
             ovsdb_server_master_update $OCF_RUNNING_MASTER
             return ${rc}
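
For readers who want to see the effect of the one-line change in isolation, here is a minimal, self-contained sketch. It is not the agent's code. check_status, record_master, promote_before and promote_after are hypothetical stand-ins, and the sketch assumes that the part of ovsdb_server_promote after the case statement is what records the master. Only the two OCF return code values (0 and 8) are taken from the OCF resource agent API.

```sh
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8

# Hypothetical stand-ins: the real agent queries ovsdb-server and calls
# $CRM_MASTER. Here we only record which host would have been stored.
recorded_master=""
check_status() { return "$OCF_RUNNING_MASTER"; }  # servers already active
record_master() { recorded_master="$1"; }

# Behaviour before the patch: an already-active server returns early,
# so the new master is never recorded.
promote_before() {
    check_status
    rc=$?
    case $rc in
        "$OCF_SUCCESS") ;;
        "$OCF_RUNNING_MASTER") return "$OCF_SUCCESS" ;;
        *) return "$rc" ;;
    esac
    record_master "$1"
    return "$OCF_SUCCESS"
}

# Behaviour after the patch: fall through and record the master.
promote_after() {
    check_status
    rc=$?
    case $rc in
        "$OCF_SUCCESS") ;;
        "$OCF_RUNNING_MASTER") ;;
        *) return "$rc" ;;
    esac
    record_master "$1"
    return "$OCF_SUCCESS"
}

recorded_master=""; promote_before node-b
echo "before patch: recorded master = '${recorded_master}'"
recorded_master=""; promote_after node-b
echo "after patch:  recorded master = '${recorded_master}'"
```

With the early return, the recorded master stays empty, which is the state that lets the later notify action see a mismatch and demote. With the fall-through, the promoted host is recorded.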