From patchwork Wed Aug 14 22:37:03 2019
From: Han Zhou
To: dev@openvswitch.org
Date: Wed, 14 Aug 2019 15:37:03 -0700
Message-Id: <1565822223-90171-4-git-send-email-hzhou8@ebay.com>
In-Reply-To: <1565822223-90171-1-git-send-email-hzhou8@ebay.com>
References: <1565822223-90171-1-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH v2 4/4] raft.c: Set candidate_retrying if no leader elected since last election.

candidate_retrying is used to determine whether the current node is
disconnected from the cluster while the node is in the candidate role.
However, a node can flap between the candidate and follower roles before a
leader is elected when a majority of the cluster is down, so is_connected()
will flap, too, which confuses clients.  This patch avoids the flapping with
the help of a new member, had_leader: if no leader has been elected since
the last election started, we know we are still retrying, and the node
remains reported as disconnected from the cluster.

Signed-off-by: Han Zhou
---
v1 -> v2: Fixed the condition in raft_is_connected(), and added a test case.
          Moved this patch from 3/4 to 4/4 in the series.

 ovsdb/raft.c           | 29 ++++++++++++++++++-----
 tests/ovsdb-cluster.at | 64 ++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 64b8150..c2bd0ed 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -286,8 +286,11 @@ struct raft {
 
     /* Candidates only.  Reinitialized at start of election. */
     int n_votes;                /* Number of votes for me. */
-    bool candidate_retrying;    /* The first round of election timed-out and it
-                                   is now retrying. */
+    bool candidate_retrying;    /* The earlier election timed-out and we are
+                                   now retrying. */
+    bool had_leader;            /* There has been leader elected since last
+                                   election initiated.  This is to help setting
+                                   candidate_retrying. */
 };
 
 /* All Raft structures. */
@@ -345,6 +348,7 @@ static bool raft_handle_write_error(struct raft *,
                                     struct ovsdb_error *);
 
 static void raft_run_reconfigure(struct raft *);
+static void raft_set_leader(struct raft *, const struct uuid *sid);
 
 static struct raft_server *
 raft_find_server(const struct raft *raft, const struct uuid *sid)
{
@@ -997,11 +1001,13 @@ raft_get_sid(const struct raft *raft)
 bool
 raft_is_connected(const struct raft *raft)
 {
-    return (!(raft->role == RAFT_CANDIDATE && raft->candidate_retrying)
+    bool ret = (!raft->candidate_retrying
             && !raft->joining
             && !raft->leaving
             && !raft->left
             && !raft->failed);
+    VLOG_DBG("raft_is_connected: %s\n", ret? "true": "false");
+    return ret;
 }
 
 /* Returns true if 'raft' is the cluster leader. */
@@ -1616,8 +1622,11 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
     }
 
     ovs_assert(raft->role != RAFT_LEADER);
-    raft->candidate_retrying = (raft->role == RAFT_CANDIDATE);
     raft->role = RAFT_CANDIDATE;
+    /* If there was no leader elected since last election, we know we are
+     * retrying now. */
+    raft->candidate_retrying = !raft->had_leader;
+    raft->had_leader = false;
 
     raft->n_votes = 0;
 
@@ -2487,6 +2496,14 @@ raft_server_init_leader(struct raft *raft, struct raft_server *s)
 }
 
 static void
+raft_set_leader(struct raft *raft, const struct uuid *sid)
+{
+    raft->leader_sid = *sid;
+    raft->had_leader = true;
+    raft->candidate_retrying = false;
+}
+
+static void
 raft_become_leader(struct raft *raft)
 {
     log_all_commands(raft);
@@ -2498,7 +2515,7 @@ raft_become_leader(struct raft *raft)
 
     ovs_assert(raft->role != RAFT_LEADER);
     raft->role = RAFT_LEADER;
-    raft->leader_sid = raft->sid;
+    raft_set_leader(raft, &raft->sid);
 
     raft_reset_election_timer(raft);
     raft_reset_ping_timer(raft);
@@ -2892,7 +2909,7 @@ raft_update_leader(struct raft *raft, const struct uuid *sid)
                      raft_get_nickname(raft, sid, buf, sizeof buf),
                      raft->term);
     }
-    raft->leader_sid = *sid;
+    raft_set_leader(raft, sid);
 
     /* Record the leader to the log.  This is not used by the algorithm
      * (although it could be, for quick restart), but it is used for
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 7146fe6..f124244 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -66,23 +66,30 @@ EXECUTION_EXAMPLES
 AT_BANNER([OVSDB - disconnect from cluster])
 
 OVS_START_SHELL_HELPERS
-# ovsdb_test_cluster_disconnect LEADER_OR_FOLLOWER
+# ovsdb_test_cluster_disconnect N_SERVERS LEADER_OR_FOLLOWER [CHECK_FLAPPING]
+# Test server disconnected from the cluster.
+# N_SERVERS: Number of servers in the cluster.
+# LEADER_OR_FOLLOWER: The role of the server that is disconnected from the
+#                     cluster: "leader" or "follower".
+# CHECK_FLAPPING: Whether to check if is_disconnected flapped. "yes", "no".
 ovsdb_test_cluster_disconnect () {
-    leader_or_follower=$1
+    n=$1
+    leader_or_follower=$2
+    check_flapping=$3
     schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
     ordinal_schema > schema
     AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
     cid=`ovsdb-tool db-cid s1.db`
     schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
-    for i in `seq 2 3`; do
+    for i in `seq 2 $n`; do
         AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
     done
 
     on_exit 'kill `cat *.pid`'
-    for i in `seq 3`; do
+    for i in `seq $n`; do
         AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
     done
-    for i in `seq 3`; do
+    for i in `seq $n`; do
         AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
     done
 
@@ -96,14 +103,18 @@ ovsdb_test_cluster_disconnect () {
     # a VIP on a load-balance.  So we use single remote to test here.
     if test $leader_or_follower == "leader"; then
         target=1
-        shutdown="2 3"
+        shutdown=`seq $(($n/2 + 1)) $n`
+        cleanup=`seq $(($n/2))`
     else
-        target=3
+        target=$n
 
-        # shutdown follower before the leader so that there is no chance for s3
-        # become leader during the process.
-        shutdown="2 1"
+        # shutdown followers before the leader (s1) so that there is no chance for
+        # s$n to become leader during the process.
+        shutdown="`seq 2 $(($n/2 + 1))` 1"
+        cleanup=`seq $(($n/2 + 2)) $n`
     fi
+    echo shutdown=$shutdown
+    echo cleanup=$cleanup
 
     # Connect to $target.  Use "wait" to trigger a non-op transaction so
     # that test-ovsdb will not quit.
@@ -119,6 +130,11 @@ ovsdb_test_cluster_disconnect () {
     OVS_WAIT_UNTIL([grep "000: i=1" test-ovsdb.log])
 
+    # Start collecting raft_is_connected logs for $target before shutting down
+    # any servers.
+    tail -f s$target.log > raft_is_connected.log &
+    on_exit 'kill `echo $!`'
+
     # Shutdown the other servers so that $target is disconnected from the cluster.
     for i in $shutdown; do
         OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
     done
@@ -127,18 +143,40 @@ ovsdb_test_cluster_disconnect () {
 
     # The test-ovsdb should detect the disconnect and retry.
     OVS_WAIT_UNTIL([grep disconnect test-ovsdb.log])
 
-    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$target], [s$target.pid])
+    # The $target debug log should show raft_is_connected: false.
+    OVS_WAIT_UNTIL([grep "raft_is_connected: false" raft_is_connected.log])
+
+    # Save the current count of "raft_is_connected: true"
+    count_old=`grep "raft_is_connected: true" raft_is_connected.log | wc -l`
+    echo count_old $count_old
+
+    if test X$check_flapping == X"yes"; then
+        sleep 10
+    fi
+    # Make sure raft_is_connected didn't flap from false to true.
+    count_new=`grep "raft_is_connected: true" raft_is_connected.log | wc -l`
+    echo count_new $count_new
+    AT_CHECK([test $count_new == $count_old])
+
+    for i in $cleanup; do
+        OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
+    done
 }
 OVS_END_SHELL_HELPERS
 
 AT_SETUP([OVSDB cluster - follower disconnect from cluster, single remote])
 AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
-ovsdb_test_cluster_disconnect follower
+ovsdb_test_cluster_disconnect 3 follower
 AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - leader disconnect from cluster, single remote])
 AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
-ovsdb_test_cluster_disconnect leader
+ovsdb_test_cluster_disconnect 3 leader
+AT_CLEANUP
+
+AT_SETUP([OVSDB cluster - leader disconnect from cluster, check flapping])
+AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
+ovsdb_test_cluster_disconnect 5 leader yes
 AT_CLEANUP