From patchwork Mon Aug 19 16:29:59 2019
X-Patchwork-Submitter: Han Zhou
X-Patchwork-Id: 1149395
From: Han Zhou
To: dev@openvswitch.org
Date: Mon, 19 Aug 2019 09:29:59 -0700
Message-Id: <1566232200-35099-4-git-send-email-hzhou8@ebay.com>
In-Reply-To: <1566232200-35099-1-git-send-email-hzhou8@ebay.com>
References: <1566232200-35099-1-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH v3 4/5] raft.c: Set candidate_retrying if no leader elected since last election.

From: Han Zhou

candidate_retrying is used to determine whether the current node is
disconnected from the cluster when the node is in the candidate role.
However, a node can flap between the candidate and follower roles before a
leader is elected when a majority of the cluster is down, so is_connected()
will flap, too, which confuses clients.  This patch avoids the flapping with
the help of a new member, had_leader: if no leader has been elected since the
last election started, we know we are still retrying and stay disconnected
from the cluster.

Signed-off-by: Han Zhou
---
v1 -> v2: Fixed the condition in raft_is_connected(), and added a test case.
          Moved this patch from 3/4 to 4/4 in the series.
v2 -> v3: Minor fix in test case.

 ovsdb/raft.c           | 31 +++++++++++++++++++-----
 tests/ovsdb-cluster.at | 64 ++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 71c5a7e..ca86830 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -285,8 +285,13 @@ struct raft {
     /* Candidates only.  Reinitialized at start of election. */
     int n_votes;                /* Number of votes for me. */
-    bool candidate_retrying;    /* The first round of election timed-out and it
-                                   is now retrying. */
+
+    /* Followers and candidates only. */
+    bool candidate_retrying;    /* The earlier election timed-out and we are
+                                   now retrying. */
+    bool had_leader;            /* There has been leader elected since last
+                                   election initiated. This is to help setting
+                                   candidate_retrying. */
 };

 /* All Raft structures. */
@@ -344,6 +349,7 @@ static bool raft_handle_write_error(struct raft *,
                                     struct ovsdb_error *);

 static void raft_run_reconfigure(struct raft *);
+static void raft_set_leader(struct raft *, const struct uuid *sid);

 static struct raft_server *
 raft_find_server(const struct raft *raft, const struct uuid *sid)
 {
@@ -996,11 +1002,13 @@ raft_get_sid(const struct raft *raft)
 bool
 raft_is_connected(const struct raft *raft)
 {
-    return (!(raft->role == RAFT_CANDIDATE && raft->candidate_retrying)
+    bool ret = (!raft->candidate_retrying
             && !raft->joining
             && !raft->leaving
             && !raft->left
             && !raft->failed);
+    VLOG_DBG("raft_is_connected: %s\n", ret? "true": "false");
+    return ret;
 }

 /* Returns true if 'raft' is the cluster leader. */
@@ -1615,8 +1623,11 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
     }

     ovs_assert(raft->role != RAFT_LEADER);
-    raft->candidate_retrying = (raft->role == RAFT_CANDIDATE);
     raft->role = RAFT_CANDIDATE;
+    /* If there was no leader elected since last election, we know we are
+     * retrying now. */
+    raft->candidate_retrying = !raft->had_leader;
+    raft->had_leader = false;

     raft->n_votes = 0;
@@ -2486,6 +2497,14 @@ raft_server_init_leader(struct raft *raft, struct raft_server *s)
 }

 static void
+raft_set_leader(struct raft *raft, const struct uuid *sid)
+{
+    raft->leader_sid = *sid;
+    raft->had_leader = true;
+    raft->candidate_retrying = false;
+}
+
+static void
 raft_become_leader(struct raft *raft)
 {
     log_all_commands(raft);
@@ -2497,7 +2516,7 @@ raft_become_leader(struct raft *raft)

     ovs_assert(raft->role != RAFT_LEADER);
     raft->role = RAFT_LEADER;
-    raft->leader_sid = raft->sid;
+    raft_set_leader(raft, &raft->sid);

     raft_reset_election_timer(raft);
     raft_reset_ping_timer(raft);
@@ -2891,7 +2910,7 @@ raft_update_leader(struct raft *raft, const struct uuid *sid)
                          raft_get_nickname(raft, sid, buf, sizeof buf),
                          raft->term);
         }
-        raft->leader_sid = *sid;
+        raft_set_leader(raft, sid);

         /* Record the leader to the log.  This is not used by the algorithm
          * (although it could be, for quick restart), but it is used for
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 0fbb268..c8532fa 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -66,23 +66,30 @@ EXECUTION_EXAMPLES
 AT_BANNER([OVSDB - disconnect from cluster])

 OVS_START_SHELL_HELPERS
-# ovsdb_test_cluster_disconnect LEADER_OR_FOLLOWER
+# ovsdb_test_cluster_disconnect N_SERVERS LEADER_OR_FOLLOWER [CHECK_FLAPPING]
+# Test server disconnected from the cluster.
+# N_SERVERS: Number of servers in the cluster.
+# LEADER_OR_FOLLOWER: The role of the server that is disconnected from the
+#                     cluster: "leader" or "follower".
+# CHECK_FLAPPING: Whether to check if is_disconnected flapped. "yes", "no".
 ovsdb_test_cluster_disconnect () {
-    leader_or_follower=$1
+    n=$1
+    leader_or_follower=$2
+    check_flapping=$3
     schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
     ordinal_schema > schema
     AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
     cid=`ovsdb-tool db-cid s1.db`
     schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
-    for i in `seq 2 3`; do
+    for i in `seq 2 $n`; do
         AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
     done

     on_exit 'kill `cat *.pid`'
-    for i in `seq 3`; do
+    for i in `seq $n`; do
         AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
     done
-    for i in `seq 3`; do
+    for i in `seq $n`; do
         AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
     done
@@ -96,14 +103,18 @@ ovsdb_test_cluster_disconnect () {
     # a VIP on a load-balance. So we use single remote to test here.
     if test $leader_or_follower == "leader"; then
         target=1
-        shutdown="2 3"
+        shutdown=`seq $(($n/2 + 1)) $n`
+        cleanup=`seq $(($n/2))`
     else
-        target=3
+        target=$n

-        # shutdown follower before the leader so that there is no chance for s3
-        # become leader during the process.
-        shutdown="2 1"
+        # shutdown followers before the leader (s1) so that there is no chance for
+        # s$n to become leader during the process.
+        shutdown="`seq 2 $(($n/2 + 1))` 1"
+        cleanup=`seq $(($n/2 + 2)) $n`
     fi
+    echo shutdown=$shutdown
+    echo cleanup=$cleanup

     # Connect to $target. Use "wait" to trigger a non-op transaction so
     # that test-ovsdb will not quit.
@@ -119,6 +130,11 @@ ovsdb_test_cluster_disconnect () {

     OVS_WAIT_UNTIL([grep "000: i=1" test-ovsdb.log])

+    # Start collecting raft_is_connected logs for $target before shutting down
+    # any servers.
+    tail -f s$target.log > raft_is_connected.log &
+    echo $! > tail.pid
+
     # Shutdown the other servers so that $target is disconnected from the cluster.
     for i in $shutdown; do
         OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
     done
@@ -127,18 +143,40 @@ ovsdb_test_cluster_disconnect () {
     # The test-ovsdb should detect the disconnect and retry.
     OVS_WAIT_UNTIL([grep disconnect test-ovsdb.log])

-    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$target], [s$target.pid])
+    # The $target debug log should show raft_is_connected: false.
+    OVS_WAIT_UNTIL([grep "raft_is_connected: false" raft_is_connected.log])
+
+    # Save the current count of "raft_is_connected: true"
+    count_old=`grep "raft_is_connected: true" raft_is_connected.log | wc -l`
+    echo count_old $count_old
+
+    if test X$check_flapping == X"yes"; then
+        sleep 10
+    fi
+    # Make sure raft_is_connected didn't flap from false to true.
+    count_new=`grep "raft_is_connected: true" raft_is_connected.log | wc -l`
+    echo count_new $count_new
+    AT_CHECK([test $count_new == $count_old])
+
+    for i in $cleanup; do
+        OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
+    done
 }
 OVS_END_SHELL_HELPERS

 AT_SETUP([OVSDB cluster - follower disconnect from cluster, single remote])
 AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
-ovsdb_test_cluster_disconnect follower
+ovsdb_test_cluster_disconnect 3 follower
 AT_CLEANUP

 AT_SETUP([OVSDB cluster - leader disconnect from cluster, single remote])
 AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
-ovsdb_test_cluster_disconnect leader
+ovsdb_test_cluster_disconnect 3 leader
+AT_CLEANUP
+
+AT_SETUP([OVSDB cluster - leader disconnect from cluster, check flapping])
+AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
+ovsdb_test_cluster_disconnect 5 leader yes
 AT_CLEANUP