From patchwork Fri Jan 28 18:51:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Maximets <i.maximets@ovn.org>
X-Patchwork-Id: 1585850
From: Ilya Maximets <i.maximets@ovn.org>
To: ovs-dev@openvswitch.org
Date: Fri, 28 Jan 2022 19:51:21 +0100
Message-Id: <20220128185121.434055-1-i.maximets@ovn.org>
X-Mailer: git-send-email 2.34.1
Cc: Ilya Maximets <i.maximets@ovn.org>, Dumitru Ceara
Subject: [ovs-dev] [PATCH] ovsdb: raft: Fix inability to join the cluster
 after interrupted attempt.

If the joining server re-connects while catching up (e.g., if it
crashed or the connection was closed due to inactivity), the data we
sent might be lost, so the server will never reply to an append
request or a snapshot installation request.  At the same time, the
leader will decline all subsequent join requests from that server with
the 'in progress' resolution.
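To make the failure mode concrete, the sequence of events looks
roughly like this (an illustrative timeline reconstructed from the
description above, not a captured trace):

  1. The new server connects and sends a join request to the leader.
  2. The leader starts catching it up with append requests and/or an
     install_snapshot request.
  3. The joining server crashes, or the leader closes the connection
     due to inactivity; the data in flight is lost.
  4. The server re-connects and sends a new join request.
  5. The leader still considers the original join 'in progress',
     declines the new request, and never re-sends the log.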
At this point the new server will never be able to join the cluster,
because it will never receive the raft log, while the leader thinks
that it was already sent.

This happened in practice when one of the servers got preempted for a
few seconds, so the leader closed the connection due to inactivity.

Destroy the joining server if a disconnection is detected.  This will
allow the join to start from scratch when the server re-connects and
sends a new join request.  We can't track the re-connection in
raft_conn_run(), because it's an incoming connection and the jsonrpc
session will not keep it alive or try to reconnect.  The next time the
server re-connects, it will be an entirely new raft conn.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Reported-at: https://bugzilla.redhat.com/2033514
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara
---
 ovsdb/raft.c           | 38 +++++++++++++++++++++++------
 tests/ovsdb-cluster.at | 55 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 1a3447a8d..855404808 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -74,6 +74,7 @@ enum raft_failure_test {
     FT_CRASH_BEFORE_SEND_EXEC_REQ,
     FT_CRASH_AFTER_SEND_EXEC_REQ,
     FT_CRASH_AFTER_RECV_APPEND_REQ_UPDATE,
+    FT_CRASH_BEFORE_SEND_SNAPSHOT_REP,
     FT_DELAY_ELECTION,
     FT_DONT_SEND_VOTE_REQUEST,
     FT_STOP_RAFT_RPC,
@@ -379,12 +380,19 @@ static bool raft_handle_write_error(struct raft *,
                                     struct ovsdb_error *);
 static void raft_run_reconfigure(struct raft *);
 static void raft_set_leader(struct raft *, const struct uuid *sid);
+
 static struct raft_server *
 raft_find_server(const struct raft *raft, const struct uuid *sid)
 {
     return raft_server_find(&raft->servers, sid);
 }
 
+static struct raft_server *
+raft_find_new_server(struct raft *raft, const struct uuid *uuid)
+{
+    return raft_server_find(&raft->add_servers, uuid);
+}
+
 static char *
 raft_make_address_passive(const char *address_)
 {
@@ -1867,6 +1875,8 @@ raft_open_conn(struct raft *raft, const char *address, const struct uuid *sid)
 static void
 raft_conn_close(struct raft_conn *conn)
 {
+    VLOG_DBG("closing connection to server %s (%s)",
+             conn->nickname, jsonrpc_session_get_name(conn->js));
     jsonrpc_session_close(conn->js);
     ovs_list_remove(&conn->list_node);
     free(conn->nickname);
@@ -1957,16 +1967,30 @@ raft_run(struct raft *raft)
     }
 
     /* Close unneeded sessions. */
+    struct raft_server *server;
     struct raft_conn *next;
     LIST_FOR_EACH_SAFE (conn, next, list_node, &raft->conns) {
         if (!raft_conn_should_stay_open(raft, conn)) {
+            server = raft_find_new_server(raft, &conn->sid);
+            if (server) {
+                /* We only have one incoming connection from joining servers,
+                 * so if it's closed, we need to destroy the record about the
+                 * server.  This way the process can be started over on the
+                 * next join request. */
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
+                VLOG_INFO_RL(&rl, "cluster "CID_FMT": server %s (%s) "
+                             "disconnected while joining",
+                             CID_ARGS(&raft->cid),
+                             server->nickname, server->address);
+                hmap_remove(&raft->add_servers, &server->hmap_node);
+                raft_server_destroy(server);
+            }
             raft->n_disconnections++;
             raft_conn_close(conn);
         }
     }
 
     /* Open needed sessions. */
-    struct raft_server *server;
     HMAP_FOR_EACH (server, hmap_node, &raft->servers) {
         raft_open_conn(raft, server->address, &server->sid);
     }
@@ -3354,12 +3378,6 @@ raft_find_peer(struct raft *raft, const struct uuid *uuid)
     return s && !uuid_equals(&raft->sid, &s->sid) ? s : NULL;
 }
 
-static struct raft_server *
-raft_find_new_server(struct raft *raft, const struct uuid *uuid)
-{
-    return raft_server_find(&raft->add_servers, uuid);
-}
-
 /* Figure 3.1: "If there exists an N such that N > commitIndex, a
  * majority of matchIndex[i] >= N, and log[N].term == currentTerm, set
  * commitIndex = N (sections 3.5 and 3.6)." */
@@ -4142,6 +4160,10 @@ static void
 raft_handle_install_snapshot_request(
     struct raft *raft, const struct raft_install_snapshot_request *rq)
 {
+    if (failure_test == FT_CRASH_BEFORE_SEND_SNAPSHOT_REP) {
+        ovs_fatal(0, "Raft test: crash before sending install_snapshot_reply");
+    }
+
     if (raft_handle_install_snapshot_request__(raft, rq)) {
         union raft_rpc rpy = {
             .install_snapshot_reply = {
@@ -4940,6 +4962,8 @@ raft_unixctl_failure_test(struct unixctl_conn *conn OVS_UNUSED,
         failure_test = FT_CRASH_AFTER_SEND_EXEC_REQ;
     } else if (!strcmp(test, "crash-after-receiving-append-request-update")) {
         failure_test = FT_CRASH_AFTER_RECV_APPEND_REQ_UPDATE;
+    } else if (!strcmp(test, "crash-before-sending-install-snapshot-reply")) {
+        failure_test = FT_CRASH_BEFORE_SEND_SNAPSHOT_REP;
     } else if (!strcmp(test, "delay-election")) {
         failure_test = FT_DELAY_ELECTION;
         struct raft *raft;
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index fc6253cfe..07af1160f 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -400,6 +400,61 @@ done
 
 AT_CLEANUP
 
+AT_BANNER([OVSDB - cluster failure while joining])
+AT_SETUP([OVSDB cluster - follower crash while joining])
+AT_KEYWORDS([ovsdb server negative unix cluster join])
+
+n=3
+schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
+ordinal_schema > schema
+AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db dnl
+          $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
+cid=`ovsdb-tool db-cid s1.db`
+schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
+for i in `seq 2 $n`; do
+    AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
+done
+
+on_exit 'kill `cat *.pid`'
+
+dnl Starting followers first, so we can configure them to crash on join.
+for j in `seq $n`; do
+    i=$(($n + 1 - $j))
+    AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off dnl
+              --detach --no-chdir --log-file=s$i.log dnl
+              --pidfile=s$i.pid --unixctl=s$i dnl
+              --remote=punix:s$i.ovsdb s$i.db])
+    if test $i != 1; then
+        OVS_WAIT_UNTIL([ovs-appctl -t "`pwd`"/s$i dnl
+            cluster/failure-test crash-before-sending-install-snapshot-reply dnl
+            | grep -q "engaged"])
+    fi
+done
+
+dnl Make sure that followers really crashed.
+for i in `seq 2 $n`; do
+    OVS_WAIT_WHILE([test -s s$i.pid])
+done
+
+dnl Bring them back.
+for i in `seq 2 $n`; do
+    AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off dnl
+              --detach --no-chdir --log-file=s$i.log dnl
+              --pidfile=s$i.pid --unixctl=s$i dnl
+              --remote=punix:s$i.ovsdb s$i.db])
+done
+
+dnl Make sure that all servers joined the cluster.
+for i in `seq $n`; do
+    AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
+done
+
+for i in `seq $n`; do
+    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
+done
+
+AT_CLEANUP
+
 OVS_START_SHELL_HELPERS
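For reference: once a joining server is running, the failure test added
by this patch can also be engaged by hand via ovs-appctl, mirroring what
the test above does (the target path below is illustrative):

  # Arrange for the joining server behind ./s2 to crash right before
  # replying to the leader's install_snapshot request.
  ovs-appctl -t ./s2 cluster/failure-test \
      crash-before-sending-install-snapshot-reply

The new test itself should be runnable with the usual autotest keyword
selection, e.g. "make check TESTSUITEFLAGS='-k join'".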