From patchwork Tue Jul 9 07:02:57 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michele Baldessari
X-Patchwork-Id: 1129538
From: Michele Baldessari
To: dev@openvswitch.org
Date: Tue, 9 Jul 2019 09:02:57 +0200
Message-Id: <20190709070257.11955-1-michele@acksyn.org>
X-Mailer: git-send-email 2.21.0
Cc: Michele Baldessari
Subject: [ovs-dev] [PATCH] OVN resource agent - make promotion synchronous

Currently inside the ovsdb_server_promote() function we call
'promote_ovnnb' and 'promote_ovnsb' and then just record the new master
state in the CIB. This creates a race because those two promote commands
are asynchronous so when we exit the ovsdb_server_promote() function the
underlying DBs are not guaranteed to be in master state. That means that
clients might connect to an instance that is in read-only mode.

We add a simple sleep loop where we wait for the underlying DB state to
confirm the master state.
We do not need to add a timeout to the loop: if promotion never
completes, the resource timeout configured in Pacemaker will expire and
Pacemaker will kill the resource agent script.

Tested this within an OpenStack environment using OVN with roughly 20
reboots and was unable to trigger the issue (before the patch we would
trigger the issue after a couple of reboots at most).

Signed-off-by: Michele Baldessari
Acked-by: Daniel Alvarez
Acked-by: Numan Siddique
Acked-by: Lucas Alvares Gomes
---
 ovn/utilities/ovndb-servers.ocf | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 10313304cb7c..cd47426689ef 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -516,6 +516,8 @@ ovsdb_server_stop() {
 }
 
 ovsdb_server_promote() {
+    local state
+
     ovsdb_server_check_status ignore_northd
     rc=$?
     case $rc in
@@ -540,7 +542,15 @@ ovsdb_server_promote() {
         ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
     fi
 
-    ocf_log debug "ovndb_servers: Promoting $host_name as the master"
+    ocf_log debug "ovndb_servers: Waiting for promotion $host_name as master to complete"
+    ovsdb_server_check_status
+    state=$?
+    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
+        sleep 1
+        ovsdb_server_check_status
+        state=$?
+    done
+    ocf_log debug "ovndb_servers: Promotion of $host_name as the master completed"
     # Record ourselves so that the agent has a better chance of doing
     # the right thing at startup
     ${CRM_ATTR_REPL_INFO} -v "$host_name"
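
The wait loop in the second hunk can be tried outside Pacemaker with the
standalone sketch below. It is not part of the patch. check_status is a
hypothetical stand-in for ovsdb_server_check_status that reports "running,
not master" twice before reporting master, and the numeric values are
assumed to match the usual OCF return codes (OCF_SUCCESS=0,
OCF_RUNNING_MASTER=8) rather than being sourced from the OCF shell
library.

```shell
#!/bin/sh
# Standalone sketch of the polling pattern; not the agent's real code.
# Assumption: these values mirror the standard OCF return codes.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8

attempts=0

# Hypothetical stub for ovsdb_server_check_status: the first two calls
# report a running, non-master DB; the third reports master.
check_status() {
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 3 ]; then
        return "$OCF_RUNNING_MASTER"
    fi
    return "$OCF_SUCCESS"
}

# Same shape as the loop in the patch: check once, then poll every
# second until the status check reports the master state. There is no
# timeout here; in the agent, Pacemaker's operation timeout bounds it.
wait_for_master() {
    check_status
    state=$?
    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
        sleep 1
        check_status
        state=$?
    done
}

wait_for_master
echo "promoted after $attempts status checks"
```

Run on its own, the script has no external bound on the loop, so a stub
that never returns 8 would spin forever; the agent relies on Pacemaker's
promote timeout for that case.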