From patchwork Fri Mar 15 20:14:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1912698 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TxFr63r1Rz1yWn for ; Sat, 16 Mar 2024 07:16:06 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id AA41C41899; Fri, 15 Mar 2024 20:16:04 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id G4_7cZ3FmscN; Fri, 15 Mar 2024 20:16:00 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org D74ED41825 Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id D74ED41825; Fri, 15 Mar 2024 20:15:58 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id A7958C0072; Fri, 15 Mar 2024 20:15:58 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 107FAC0037 for ; Fri, 15 Mar 2024 20:15:58 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id A3D9D82362 for ; Fri, 15 Mar 2024 20:15:57 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Qgq0zQSgsZkR for ; Fri, 15 Mar 2024 20:15:55 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=217.70.183.198; helo=relay6-d.mail.gandi.net; envelope-from=i.maximets@ovn.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp1.osuosl.org DEE9D82499 Authentication-Results: smtp1.osuosl.org; dmarc=none (p=none dis=none) header.from=ovn.org DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org DEE9D82499 Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [217.70.183.198]) by smtp1.osuosl.org (Postfix) with ESMTPS id DEE9D82499 for ; Fri, 15 Mar 2024 20:15:49 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 7EE10C0003; Fri, 15 Mar 2024 20:15:47 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 15 Mar 2024 21:14:53 +0100 Message-ID: <20240315201614.236523-6-i.maximets@ovn.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240315201614.236523-1-i.maximets@ovn.org> References: <20240315201614.236523-1-i.maximets@ovn.org> MIME-Version: 1.0 X-GND-Sasl: i.maximets@ovn.org Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 5/5] ovsdb: raft: Fix inability to join after leadership change round trip. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Consider the following sequence of events: 1. Cluster with 2 nodes - A and B. A is a leader. 2. C connects to A and sends a join request. 3. A sends an append request to C. C is in CATCHUP phase for A. 4. A looses leadership to B. Sends join failure notification to C. 5. C sends append reply to A. 6. A discards append reply (not leader). 7. B looses leadership back to A. 8. C sends a new join request to A. 9. A replies with failure (already in progress). 10. GoTo step 8. At this point A is waiting for an append reply that it already discarded at step 6 and fails all the new attempts of C to join with 'already in progress' verdict. C stays forever in a joining state and in a CATCHUP phase from A's perspective. This is a similar case to a sudden disconnect from a leader fixed in commit 999ba294fb4f ("ovsdb: raft: Fix inability to join the cluster after interrupted attempt."), but since we're not disconnecting, the servers are not getting destroyed. Fix that by destroying all the servers that are not yet part of the configuration after leadership is lost. This way, server C will be able to simply re-start the joining process from scratch. New failure test command is added in order to simulate leadership change before we receive the append reply, so it gets discarded. New cluster test is added to exercise this scenario. Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.") Reported-at: https://github.com/ovn-org/ovn/issues/235 Signed-off-by: Ilya Maximets Acked-by: Han Zhou --- ovsdb/raft.c | 16 ++++++++++++- tests/ovsdb-cluster.at | 53 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/ovsdb/raft.c b/ovsdb/raft.c index c41419052..f9e760a08 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -81,6 +81,7 @@ enum raft_failure_test { FT_STOP_RAFT_RPC, FT_TRANSFER_LEADERSHIP, FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ, + FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD, }; static enum raft_failure_test failure_test; @@ -2702,15 +2703,22 @@ raft_become_follower(struct raft *raft) * new configuration. Our AppendEntries processing will properly update * the server configuration later, if necessary. * + * However, since we're sending replies about a failure to add, those new + * servers has to be cleaned up. Otherwise, they will stuck in a 'CATCHUP' + * phase in case this server regains leadership before they join through + * the current new leader. They are not yet in 'raft->servers', so not + * part of the shared configuration. + * * Also we do not complete commands here, as they can still be completed * if their log entries have already been replicated to other servers. * If the entries were actually committed according to the new leader, our * AppendEntries processing will complete the corresponding commands. */ struct raft_server *s; - HMAP_FOR_EACH (s, hmap_node, &raft->add_servers) { + HMAP_FOR_EACH_POP (s, hmap_node, &raft->add_servers) { raft_send_add_server_reply__(raft, &s->sid, s->address, false, RAFT_SERVER_LOST_LEADERSHIP); + raft_server_destroy(s); } if (raft->remove_server) { raft_send_remove_server_reply__(raft, &raft->remove_server->sid, @@ -3985,6 +3993,10 @@ raft_handle_add_server_request(struct raft *raft, "to cluster "CID_FMT, s->nickname, SID_ARGS(&s->sid), rq->address, CID_ARGS(&raft->cid)); raft_send_append_request(raft, s, 0, "initialize new server"); + + if (failure_test == FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD) { + failure_test = FT_TRANSFER_LEADERSHIP; + } } static void @@ -5110,6 +5122,8 @@ raft_unixctl_failure_test(struct unixctl_conn *conn OVS_UNUSED, } else if (!strcmp(test, "transfer-leadership-after-sending-append-request")) { failure_test = FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ; + } else if (!strcmp(test, "transfer-leadership-after-starting-to-add")) { + failure_test = FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD; } else if (!strcmp(test, "transfer-leadership")) { failure_test = FT_TRANSFER_LEADERSHIP; } else if (!strcmp(test, "clear")) { diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at index 482e4e02d..9d8b4d06a 100644 --- a/tests/ovsdb-cluster.at +++ b/tests/ovsdb-cluster.at @@ -525,6 +525,59 @@ for i in $(seq $n); do OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid]) done +AT_CLEANUP + +AT_SETUP([OVSDB cluster - leadership change before replication while joining]) +AT_KEYWORDS([ovsdb server negative unix cluster join]) + +n=5 +AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db dnl + $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr]) +cid=$(ovsdb-tool db-cid s1.db) +schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema) +for i in $(seq 2 $n); do + AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft]) +done + +on_exit 'kill $(cat *.pid)' +on_exit " + for i in \$(ls $(pwd)/s[[0-$n]]); do + ovs-appctl --timeout 1 -t \$i cluster/status $schema_name; + done +" + +dnl Starting servers one by one asking all exisitng servers to transfer +dnl leadership right after starting to add a server. Joining server will +dnl need to find a new leader that will also transfer leadership. +dnl This will continue until the same server will not become a leader +dnl for the second time and will be able to add a new server. +for i in $(seq $n); do + dnl Make sure that all already started servers joined the cluster. + for j in $(seq $((i - 1)) ); do + AT_CHECK([ovsdb_client_wait unix:s$j.ovsdb $schema_name connected]) + done + for j in $(seq $((i - 1)) ); do + OVS_WAIT_UNTIL([ovs-appctl -t "$(pwd)"/s$j \ + cluster/failure-test \ + transfer-leadership-after-starting-to-add \ + | grep -q "engaged"]) + done + + AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off \ + --detach --no-chdir --log-file=s$i.log \ + --pidfile=s$i.pid --unixctl=s$i \ + --remote=punix:s$i.ovsdb s$i.db]) +done + +dnl Make sure that all servers joined the cluster. +for i in $(seq $n); do + AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected]) +done + +for i in $(seq $n); do + OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid]) +done + AT_CLEANUP