From patchwork Tue Sep 11 17:29:58 2018
X-Patchwork-Submitter: Numan Siddique
X-Patchwork-Id: 968638
From: nusiddiq@redhat.com
To: dev@openvswitch.org
Date: Tue, 11 Sep 2018 22:59:58 +0530
Message-Id: <20180911172958.25307-1-nusiddiq@redhat.com>
Subject: [ovs-dev] [PATCH v2] ovsdb-server: Alleviate the possible data loss in an active/standby setup

From: Numan Siddique

The present code resets the database when it is in the 'RPL_S_SCHEMA_REQUESTED'
state and repopulates it when the monitor reply is received in the
'RPL_S_MONITOR_REQUESTED' state.  If, however, the server transitions to active
mode before it has processed the monitor reply, all of the data is lost.
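To make the race concrete, here is a simplified, self-contained sketch of the
pre-patch flow.  It is not the actual replication.c code; the fake_* helpers,
the two-value state enum and the promoted_to_active_early flag are
illustrative stand-ins only.

    #include <stdbool.h>
    #include <stdio.h>

    enum repl_state {
        RPL_S_SCHEMA_REQUESTED,    /* schema reply handled: database reset here */
        RPL_S_MONITOR_REQUESTED,   /* monitor reply handled: database repopulated here */
    };

    static bool db_has_data = true;   /* the standby's local copy of the data */

    static void fake_reset_database(void)       { db_has_data = false; }
    static void fake_process_notification(void) { db_has_data = true; }

    int
    main(void)
    {
        enum repl_state state = RPL_S_SCHEMA_REQUESTED;

        /* Pre-patch ordering: the database is reset while handling the
         * schema reply... */
        if (state == RPL_S_SCHEMA_REQUESTED) {
            fake_reset_database();            /* local data is gone from here on */
            state = RPL_S_MONITOR_REQUESTED;
        }

        /* ...window: if the standby is promoted to active here, before the
         * monitor reply has been processed, it serves an empty database. */
        bool promoted_to_active_early = true;
        if (promoted_to_active_early) {
            printf("promoted early: db_has_data=%d -> data loss\n", db_has_data);
            return 0;
        }

        /* Otherwise the monitor reply repopulates the database. */
        if (state == RPL_S_MONITOR_REQUESTED) {
            fake_process_notification();
        }
        printf("monitor reply processed: db_has_data=%d\n", db_has_data);
        return 0;
    }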
This patch alleviates the problem by resetting the database when the monitor
reply is received (just before processing it), so that the reset and the
repopulation of the db happen in the same state.  This approach still leaves a
window for data loss: the local database is lost if process_notification()
fails while processing the monitor reply, or if ovsdb-server crashes during
process_notification().

Reported-by: Han Zhou
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html
Tested-by: aginwala
Acked-by: Han Zhou
Signed-off-by: Numan Siddique
---
 ovsdb/replication.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

v1 -> v2
--------
 * Updated the commit message as per Han's suggestion.
 * Added a few comments in the code where it resets the db.

diff --git a/ovsdb/replication.c b/ovsdb/replication.c
index 2b9ae2f83..752b3c89c 100644
--- a/ovsdb/replication.c
+++ b/ovsdb/replication.c
@@ -299,19 +299,7 @@ replication_run(void)
             /* After receiving schemas, reset the local databases that
              * will be monitored and send out monitor requests for them. */
             if (hmap_is_empty(&request_ids)) {
-                struct shash_node *node, *next;
-
-                SHASH_FOR_EACH_SAFE (node, next, replication_dbs) {
-                    db = node->data;
-                    error = reset_database(db);
-                    if (error) {
-                        const char *db_name = db->schema->name;
-                        shash_find_and_delete(replication_dbs, db_name);
-                        ovsdb_error_assert(error);
-                        VLOG_WARN("Failed to reset database, "
-                                  "%s not replicated.", db_name);
-                    }
-                }
+                struct shash_node *node;
 
                 if (shash_is_empty(replication_dbs)) {
                     VLOG_WARN("Nothing to replicate.");
@@ -335,7 +323,16 @@
         case RPL_S_MONITOR_REQUESTED: {
             /* Reply to monitor requests. */
             struct ovsdb_error *error;
-            error = process_notification(msg->result, db);
+            VLOG_INFO("Monitor request received. Resetting the database");
+            /* Resetting the database here has a few risks. If the
+             * process_notification() fails, the database is completely
+             * lost locally. In case that node becomes active, then
+             * there is a chance of complete data loss in the active/standby
+             * cluster. */
+            error = reset_database(db);
+            if (!error) {
+                error = process_notification(msg->result, db);
+            }
             if (error) {
                 ovsdb_error_assert(error);
                 state = RPL_S_ERR;
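
For completeness, here is a similar simplified, self-contained sketch of the
post-patch ordering and of the residual window mentioned above.  Again, the
fake_* helpers are illustrative stand-ins, not ovsdb-server functions.

    #include <stdbool.h>
    #include <stdio.h>

    static bool db_has_data = true;   /* the standby's local copy of the data */

    static bool
    fake_reset_database(void)
    {
        db_has_data = false;
        return true;                  /* reset succeeded */
    }

    static bool
    fake_process_notification(bool simulate_failure)
    {
        if (simulate_failure) {
            return false;             /* e.g. a malformed monitor reply */
        }
        db_has_data = true;
        return true;
    }

    int
    main(void)
    {
        /* Post-patch ordering: reset and repopulation happen back to back
         * while handling the monitor reply, so a standby promoted to active
         * before the reply arrives still has its old data. */
        bool ok = fake_reset_database();
        if (ok) {
            ok = fake_process_notification(true /* simulate_failure */);
        }

        /* Residual window: if process_notification() fails (or the server
         * crashes between the two calls), the local database stays empty. */
        printf("after monitor reply handling: ok=%d db_has_data=%d\n",
               ok, db_has_data);
        return 0;
    }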