From patchwork Tue Aug 13 16:23:19 2019
From: Han Zhou
To: dev@openvswitch.org
Date: Tue, 13 Aug 2019 09:23:19 -0700
Message-Id: <1565713402-5458-1-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH 1/4] raft.c: Move raft_reset_ping_timer() out of the loop.

Fixes: commit 5a9b53a5 ("ovsdb raft: Fix duplicated transaction execution when leader failover.")
Signed-off-by: Han Zhou
---
 ovsdb/raft.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index c60ef41..1c38b3b 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1816,8 +1816,8 @@ raft_run(struct raft *raft)
                 && now - cmd->timestamp > ELECTION_BASE_MSEC * 2) {
                 raft_command_complete(raft, cmd, RAFT_CMD_TIMEOUT);
             }
-            raft_reset_ping_timer(raft);
         }
+        raft_reset_ping_timer(raft);
     }
 
     /* Do this only at the end; if we did it as soon as we set raft->left or

From patchwork Tue Aug 13 16:23:20 2019
From: Han Zhou
To: dev@openvswitch.org
Date: Tue, 13 Aug 2019 09:23:20 -0700
Message-Id: <1565713402-5458-2-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH 2/4] ovsdb-idl.c: Allows retry even when using a single remote.

When clustered mode is used, the client needs to retry connecting to a new
server when certain failures happen. Currently, the IDL retries with a new
connection only if multiple remotes are configured, which prevents using a
load balancer (LB) VIP in front of the clustered nodes.

This patch makes the retry logic work with an LB VIP: although the same IP
is used for retrying, the LB can redirect the connection to a different node.
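A minimal sketch of the policy change in `ovsdb_idl_retry_at()`, using simplified stand-ins: the `sketch_*` names and the two-value state enum are hypothetical, not the real `enum ovsdb_idl_state`. Before the patch, the function reconnected only when `jsonrpc_session_get_n_remotes()` reported more than one remote. Afterwards, it always forces a reconnect and enters the retry state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two IDL states that ovsdb_idl_retry_at()
 * chooses between; the real enum lives in lib/ovsdb-idl.c. */
enum sketch_idl_state { SKETCH_IDL_S_RETRY, SKETCH_IDL_S_ERROR };

/* Before this patch: reconnect only if the session has several remotes,
 * otherwise enter the error state and stop trying. */
static enum sketch_idl_state
sketch_retry_old(size_t n_remotes)
{
    assert(n_remotes > 0);      /* A session always has at least one remote. */
    return n_remotes > 1 ? SKETCH_IDL_S_RETRY : SKETCH_IDL_S_ERROR;
}

/* After this patch: always force a reconnect.  With a single remote that
 * is a load-balancer VIP, the new connection may reach a different node. */
static enum sketch_idl_state
sketch_retry_new(size_t n_remotes)
{
    assert(n_remotes > 0);      /* The count no longer affects the choice. */
    return SKETCH_IDL_S_RETRY;
}
```

The two versions differ only when a single remote is configured. In that case the old code gave up with an error, while the new code retries and relies on the VIP to route the connection elsewhere.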
Signed-off-by: Han Zhou
---
 lib/ovsdb-idl.c        | 11 +++-------
 tests/ovsdb-cluster.at | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/test-ovsdb.c     |  1 +
 3 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 1a6a4af..190143f 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -657,12 +657,8 @@ ovsdb_idl_state_to_string(enum ovsdb_idl_state state)
 static void
 ovsdb_idl_retry_at(struct ovsdb_idl *idl, const char *where)
 {
-    if (idl->session && jsonrpc_session_get_n_remotes(idl->session) > 1) {
-        ovsdb_idl_force_reconnect(idl);
-        ovsdb_idl_transition_at(idl, IDL_S_RETRY, where);
-    } else {
-        ovsdb_idl_transition_at(idl, IDL_S_ERROR, where);
-    }
+    ovsdb_idl_force_reconnect(idl);
+    ovsdb_idl_transition_at(idl, IDL_S_RETRY, where);
 }
 
 static void
@@ -1895,8 +1891,7 @@ ovsdb_idl_check_server_db(struct ovsdb_idl *idl)
     if (!database) {
         VLOG_INFO_RL(&rl, "%s: server does not have %s database",
                      server_name, idl->data.class_->database);
-    } else if (!strcmp(database->model, "clustered")
-               && jsonrpc_session_get_n_remotes(idl->session) > 1) {
+    } else if (!strcmp(database->model, "clustered")) {
         uint64_t index = database->n_index ? *database->index : 0;
 
         if (!database->schema) {
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 4701272..6a13843 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -63,6 +63,63 @@ m4_define([OVSDB_CHECK_EXECUTION],
 
 EXECUTION_EXAMPLES
 
+AT_BANNER([OVSDB - disconnect from cluster])
+
+# When a node is disconnected from the cluster, the IDL should disconnect
+# and retry even if it uses a single remote, because the remote IP can be
+# a VIP on a load-balance.
+AT_SETUP([OVSDB cluster - disconnect from cluster, single remote])
+AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
+
+schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
+ordinal_schema > schema
+AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
+cid=`ovsdb-tool db-cid s1.db`
+schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
+for i in `seq 2 3`; do
+    AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
+done
+
+on_exit 'kill `cat *.pid`'
+for i in `seq 3`; do
+    AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
+done
+for i in `seq 3`; do
+    AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
+done
+
+AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
+
+# Connect to a single remote s3. Use "wait" to trigger a non-op transaction so
+# that test-ovsdb will not quit.
+
+on_exit 'pkill test-ovsdb'
+test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -v -t10 idl unix:s3.ovsdb '[["idltest",
+      {"op": "wait",
+       "table": "simple",
+       "where": [["i", "==", 1]],
+       "columns": ["i"],
+       "until": "==",
+       "rows": [{"i": 1}]}]]' > test-ovsdb.log 2>&1 &
+
+OVS_WAIT_UNTIL([grep "000: i=1" test-ovsdb.log])
+
+# Shutdown the other 2 servers so that s3 is disconnected from the cluster.
+for i in 2 1; do
+    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
+done
+
+# The test-ovsdb should detect the disconnect and retry.
+OVS_WAIT_UNTIL([grep disconnect test-ovsdb.log])
+
+OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s3], [s3.pid])
+
+AT_CLEANUP
+
+
 OVS_START_SHELL_HELPERS
 # ovsdb_cluster_failure_test SCHEMA_FUNC OUTPUT TRANSACTION...
 ovsdb_cluster_failure_test () {
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index 187eb28..b1a4be3 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -2412,6 +2412,7 @@ do_idl(struct ovs_cmdl_context *ctx)
     track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
 
     idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
+    ovsdb_idl_set_leader_only(idl, false);
     if (ctx->argc > 2) {
         struct stream *stream;
 

From patchwork Tue Aug 13 16:23:21 2019
From: Han Zhou
To: dev@openvswitch.org
Date: Tue, 13 Aug 2019 09:23:21 -0700
Message-Id: <1565713402-5458-3-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH 3/4] raft.c: Set candidate_retrying if no leader elected since last election.

candidate_retrying is used to determine whether the current node is
disconnected from the cluster while it is in the candidate role. However,
when a majority of the cluster is down, a node can flap between the
candidate and follower roles before any leader is elected, so
is_connected() flaps too, which confuses clients.

This patch avoids the flapping with the help of a new member, had_leader:
if no leader has been elected since the last election was initiated, we
know we are still retrying, and the node stays disconnected from the
cluster.

Signed-off-by: Han Zhou
---
 ovsdb/raft.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 1c38b3b..63c3081 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -286,8 +286,11 @@ struct raft {
 
     /* Candidates only.  Reinitialized at start of election. */
     int n_votes;                /* Number of votes for me. */
-    bool candidate_retrying;    /* The first round of election timed-out and it
-                                   is now retrying. */
+    bool candidate_retrying;    /* The earlier election timed-out and we are
+                                   now retrying. */
+    bool had_leader;            /* There has been leader elected since last
+                                   election initiated. This is to help setting
+                                   candidate_retrying. */
 };
 
 /* All Raft structures. */
@@ -345,6 +348,7 @@ static bool raft_handle_write_error(struct raft *, struct ovsdb_error *);
 
 static void raft_run_reconfigure(struct raft *);
 
+static void raft_set_leader(struct raft *, const struct uuid *sid);
 static struct raft_server *
 raft_find_server(const struct raft *raft, const struct uuid *sid)
 {
@@ -1616,8 +1620,11 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
     }
 
     ovs_assert(raft->role != RAFT_LEADER);
-    raft->candidate_retrying = (raft->role == RAFT_CANDIDATE);
     raft->role = RAFT_CANDIDATE;
+    /* If there was no leader elected since last election, we know we are
+     * retrying now. */
+    raft->candidate_retrying = !raft->had_leader;
+    raft->had_leader = false;
 
     raft->n_votes = 0;
 
@@ -2450,6 +2457,14 @@ raft_server_init_leader(struct raft *raft, struct raft_server *s)
 }
 
 static void
+raft_set_leader(struct raft *raft, const struct uuid *sid)
+{
+    raft->leader_sid = *sid;
+    raft->had_leader = true;
+    raft->candidate_retrying = false;
+}
+
+static void
 raft_become_leader(struct raft *raft)
 {
     log_all_commands(raft);
@@ -2461,7 +2476,7 @@ raft_become_leader(struct raft *raft)
 
     ovs_assert(raft->role != RAFT_LEADER);
     raft->role = RAFT_LEADER;
-    raft->leader_sid = raft->sid;
+    raft_set_leader(raft, &raft->sid);
     raft->election_timeout = LLONG_MAX;
     raft_reset_ping_timer(raft);
 
@@ -2855,7 +2870,7 @@ raft_update_leader(struct raft *raft, const struct uuid *sid)
                       raft_get_nickname(raft, sid, buf, sizeof buf),
                       raft->term);
         }
-        raft->leader_sid = *sid;
+        raft_set_leader(raft, sid);
 
         /* Record the leader to the log.  This is not used by the algorithm
          * (although it could be, for quick restart), but it is used for

From patchwork Tue Aug 13 16:23:22 2019
From: Han Zhou
To: dev@openvswitch.org
Date: Tue, 13 Aug 2019 09:23:22 -0700
Message-Id: <1565713402-5458-4-git-send-email-hzhou8@ebay.com>
Subject: [ovs-dev] [PATCH 4/4] raft.c: Stale leader should disconnect from cluster.

As described in section 6.2 of the Raft paper:

Leaders: A server might be in the leader state, but if it isn’t the current
leader, it could be needlessly delaying client requests. For example,
suppose a leader is partitioned from the rest of the cluster, but it can
still communicate with a particular client. Without additional mechanism,
it could delay a request from that client forever, being unable to
replicate a log entry to any other servers. Meanwhile, there might be
another leader of a newer term that is able to communicate with a majority
of the cluster and would be able to commit the client’s request. Thus, a
leader in Raft steps down if an election timeout elapses without a
successful round of heartbeats to a majority of its cluster; this allows
clients to retry their requests with another server.

This patch implements that rule. The leader records which servers replied
to its append requests during each election timeout interval. It steps
down unless those followers, together with the leader itself, form a
majority of the cluster.
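The step-down rule can be sketched as follows. This is a simplified model, not the raft.c code. The hypothetical `replied` array stands in for the per-server `replied` flags. The leader's own slot is assumed to stay false, because a leader never receives append replies from itself. Leadership is kept only if the followers that replied, plus the leader, form a majority. With integer division, that condition reduces to the `count >= n_servers / 2` test used in `raft_run()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: replied[i] is true if server i answered an append
 * request during the last election-timeout interval.  The leader's own
 * slot is assumed to stay false. */
static bool
sketch_leader_keeps_leadership(bool replied[], size_t n_servers)
{
    size_t count = 0;
    for (size_t i = 0; i < n_servers; i++) {
        if (replied[i]) {
            count++;
        }
    }

    /* Repliers plus the leader must be a majority: (count + 1) * 2 >
     * n_servers, which for integers is the same as count >= n_servers / 2. */
    bool keep = count >= n_servers / 2;
    if (keep) {
        /* Start a fresh interval: every follower must reply again. */
        for (size_t i = 0; i < n_servers; i++) {
            replied[i] = false;
        }
    }
    /* If !keep, the caller would become a follower and start an election. */
    return keep;
}
```

In a three-server cluster, one follower reply is enough. In a five-server cluster, two replies are needed. The flags are cleared only when the leader stays in place, which mirrors the reset loop in the patch.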
Signed-off-by: Han Zhou
Reported-by: Aliasgar Ginwala
Tested-by: Aliasgar Ginwala
---
 ovsdb/raft-private.h   |   3 ++
 ovsdb/raft.c           |  42 ++++++++++++++++-
 tests/ovsdb-cluster.at | 123 ++++++++++++++++++++++++++++++--------------
 3 files changed, 116 insertions(+), 52 deletions(-)

diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h
index 35a3dd7..fb7e6e3 100644
--- a/ovsdb/raft-private.h
+++ b/ovsdb/raft-private.h
@@ -80,6 +80,9 @@ struct raft_server {
     uint64_t next_index;     /* Index of next log entry to send this server. */
     uint64_t match_index;    /* Index of max log entry server known to have. */
     enum raft_server_phase phase;
+    bool replied;            /* Reply to append_request was received from this
+                                node during current election_timeout interval.
+                                */
     /* For use in adding and removing servers: */
     struct uuid requester_sid;  /* Nonzero if requested via RPC. */
     struct unixctl_conn *requester_conn; /* Only if requested via unixctl. */
diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 63c3081..9516a6f 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1792,7 +1792,43 @@ raft_run(struct raft *raft)
     }
 
     if (!raft->joining && time_msec() >= raft->election_timeout) {
-        raft_start_election(raft, false);
+        if (raft->role == RAFT_LEADER) {
+            /* Check if majority of followers replied, then reset
+             * election_timeout and reset s->replied. Otherwise, become
+             * follower.
+             *
+             * Raft paper section 6.2: Leaders: A server might be in the leader
+             * state, but if it isn’t the current leader, it could be
+             * needlessly delaying client requests. For example, suppose a
+             * leader is partitioned from the rest of the cluster, but it can
+             * still communicate with a particular client. Without additional
+             * mechanism, it could delay a request from that client forever,
+             * being unable to replicate a log entry to any other servers.
+             * Meanwhile, there might be another leader of a newer term that is
+             * able to communicate with a majority of the cluster and would be
+             * able to commit the client’s request. Thus, a leader in Raft
+             * steps down if an election timeout elapses without a successful
+             * round of heartbeats to a majority of its cluster; this allows
+             * clients to retry their requests with another server. */
+            int count = 0;
+            HMAP_FOR_EACH (server, hmap_node, &raft->servers) {
+                if (server->replied) {
+                    count ++;
+                }
+            }
+            if (count >= hmap_count(&raft->servers) / 2) {
+                HMAP_FOR_EACH (server, hmap_node, &raft->servers) {
+                    server->replied = false;
+                }
+                raft_reset_election_timer(raft);
+            } else {
+                raft_become_follower(raft);
+                raft_start_election(raft, false);
+            }
+        } else {
+            raft_start_election(raft, false);
+        }
+
     }
 
     if (raft->leaving && time_msec() >= raft->leave_timeout) {
@@ -2454,6 +2490,7 @@ raft_server_init_leader(struct raft *raft, struct raft_server *s)
     s->next_index = raft->log_end;
     s->match_index = 0;
     s->phase = RAFT_PHASE_STABLE;
+    s->replied = false;
 }
 
 static void
@@ -2477,7 +2514,7 @@ raft_become_leader(struct raft *raft)
     ovs_assert(raft->role != RAFT_LEADER);
     raft->role = RAFT_LEADER;
     raft_set_leader(raft, &raft->sid);
-    raft->election_timeout = LLONG_MAX;
+    raft_reset_election_timer(raft);
     raft_reset_ping_timer(raft);
 
     struct raft_server *s;
@@ -3207,6 +3244,7 @@ raft_handle_append_reply(struct raft *raft,
         }
     }
 
+    s->replied = true;
     if (rpy->result == RAFT_APPEND_OK) {
         /* Figure 3.1: "If successful, update nextIndex and matchIndex for
          * follower (section 3.5)." */
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 6a13843..7146fe6 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -65,59 +65,82 @@ EXECUTION_EXAMPLES
 
 AT_BANNER([OVSDB - disconnect from cluster])
 
-# When a node is disconnected from the cluster, the IDL should disconnect
-# and retry even if it uses a single remote, because the remote IP can be
-# a VIP on a load-balance.
-AT_SETUP([OVSDB cluster - disconnect from cluster, single remote])
-AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
+OVS_START_SHELL_HELPERS
+# ovsdb_test_cluster_disconnect LEADER_OR_FOLLOWER
+ovsdb_test_cluster_disconnect () {
+    leader_or_follower=$1
+    schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
+    ordinal_schema > schema
+    AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
+    cid=`ovsdb-tool db-cid s1.db`
+    schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
+    for i in `seq 2 3`; do
+        AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
+    done
+
+    on_exit 'kill `cat *.pid`'
+    for i in `seq 3`; do
+        AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
+    done
+    for i in `seq 3`; do
+        AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
+    done
+
+    AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
+          {"op": "insert",
+           "table": "simple",
+           "row": {"i": 1}}]]'], [0], [ignore], [ignore])
+
+    # When a node is disconnected from the cluster, the IDL should disconnect
+    # and retry even if it uses a single remote, because the remote IP can be
+    # a VIP on a load balancer. So we use a single remote to test here.
+    if test "$leader_or_follower" = "leader"; then
+        target=1
+        shutdown="2 3"
+    else
+        target=3
+
+        # Shut down the follower before the leader so that there is no
+        # chance for s3 to become leader during the process.
+        shutdown="2 1"
+    fi
+
+    # Connect to $target. Use "wait" to trigger a no-op transaction so
+    # that test-ovsdb will not quit.
+
+    on_exit 'pkill test-ovsdb'
+    test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -v -t10 idl unix:s$target.ovsdb '[["idltest",
+          {"op": "wait",
+           "table": "simple",
+           "where": [["i", "==", 1]],
+           "columns": ["i"],
+           "until": "==",
+           "rows": [{"i": 1}]}]]' > test-ovsdb.log 2>&1 &
 
-schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
-ordinal_schema > schema
-AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
-cid=`ovsdb-tool db-cid s1.db`
-schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
-for i in `seq 2 3`; do
-    AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
-done
-
-on_exit 'kill `cat *.pid`'
-for i in `seq 3`; do
-    AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
-done
-for i in `seq 3`; do
-    AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
-done
-
-AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
-      {"op": "insert",
-       "table": "simple",
-       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
-
-# Connect to a single remote s3. Use "wait" to trigger a non-op transaction so
-# that test-ovsdb will not quit.
-
-on_exit 'pkill test-ovsdb'
-test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -v -t10 idl unix:s3.ovsdb '[["idltest",
-      {"op": "wait",
-       "table": "simple",
-       "where": [["i", "==", 1]],
-       "columns": ["i"],
-       "until": "==",
-       "rows": [{"i": 1}]}]]' > test-ovsdb.log 2>&1 &
-
-OVS_WAIT_UNTIL([grep "000: i=1" test-ovsdb.log])
-
-# Shutdown the other 2 servers so that s3 is disconnected from the cluster.
-for i in 2 1; do
-    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
-done
-
-# The test-ovsdb should detect the disconnect and retry.
-OVS_WAIT_UNTIL([grep disconnect test-ovsdb.log])
-
-OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s3], [s3.pid])
+    OVS_WAIT_UNTIL([grep "000: i=1" test-ovsdb.log])
 
+    # Shut down the other servers so that $target is disconnected from the cluster.
+    for i in $shutdown; do
+        OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
+    done
+
+    # The test-ovsdb should detect the disconnect and retry.
+    OVS_WAIT_UNTIL([grep disconnect test-ovsdb.log])
+
+    OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$target], [s$target.pid])
+}
+OVS_END_SHELL_HELPERS
+
+AT_SETUP([OVSDB cluster - follower disconnect from cluster, single remote])
+AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
+ovsdb_test_cluster_disconnect follower
+AT_CLEANUP
+
+AT_SETUP([OVSDB cluster - leader disconnect from cluster, single remote])
+AT_KEYWORDS([ovsdb server negative unix cluster disconnect])
+ovsdb_test_cluster_disconnect leader
 AT_CLEANUP
+
 
 
 OVS_START_SHELL_HELPERS
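This model shows when the single-remote tests can observe the disconnect. It combines the `had_leader` bookkeeping from patch 3/4 with the step-down from patch 4/4. All names are hypothetical. The model rests on two assumptions, neither of which is shown in these diffs:

- `raft_is_connected()` reports a node as disconnected whenever `candidate_retrying` is set. The joining, leaving, and failed conditions are omitted.
- `raft_become_follower()` leaves both flags untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down model of the raft.c state touched by
 * patches 3/4 and 4/4; the real struct raft has many more members. */
enum sketch_role { SKETCH_FOLLOWER, SKETCH_CANDIDATE, SKETCH_LEADER };

struct sketch_raft {
    enum sketch_role role;
    bool candidate_retrying;
    bool had_leader;
};

/* Like raft_set_leader() in patch 3/4: once a leader is known, the node
 * is no longer retrying an election. */
static void
sketch_set_leader(struct sketch_raft *r)
{
    r->had_leader = true;
    r->candidate_retrying = false;
}

static void
sketch_become_leader(struct sketch_raft *r)
{
    assert(r->role != SKETCH_LEADER);
    r->role = SKETCH_LEADER;
    sketch_set_leader(r);
}

/* Assumption: becoming a follower changes only the role. */
static void
sketch_become_follower(struct sketch_raft *r)
{
    r->role = SKETCH_FOLLOWER;
}

/* Election start before patch 3/4: "retrying" only if the node was
 * already a candidate, so a detour through follower resets it. */
static void
sketch_start_election_old(struct sketch_raft *r)
{
    assert(r->role != SKETCH_LEADER);
    r->candidate_retrying = (r->role == SKETCH_CANDIDATE);
    r->role = SKETCH_CANDIDATE;
}

/* Election start after patch 3/4: "retrying" unless a leader was seen
 * since the previous election started. */
static void
sketch_start_election_new(struct sketch_raft *r)
{
    assert(r->role != SKETCH_LEADER);
    r->role = SKETCH_CANDIDATE;
    r->candidate_retrying = !r->had_leader;
    r->had_leader = false;
}

/* Assumption: connectivity reported to clients depends only on
 * candidate_retrying here. */
static bool
sketch_is_connected(const struct sketch_raft *r)
{
    return !r->candidate_retrying;
}
```

Under the old rule, a candidate that is pushed back to follower and then times out again reports itself connected, which is the flapping described in patch 3/4. Under the new rule, it stays disconnected until some leader is seen.

A stale leader that steps down behaves differently. In the first election it starts, it still reports connected, because it had a leader (itself) since the previous election. In this model, the node reports disconnected only after that election also times out.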