From patchwork Tue Mar 19 02:38:48 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Han Zhou
X-Patchwork-Id: 1058179
From: Han Zhou
To: dev@openvswitch.org
Date: Mon, 18 Mar 2019 19:38:48 -0700
Message-Id: <1552963128-46361-1-git-send-email-hzhou8@ebay.com>
X-Mailer: git-send-email 2.1.0
Subject: [ovs-dev]
 [PATCH] ovsdb raft: Configurable leader timeout.

From: Han Zhou

Add option --leader-timeout to ovsdb-server.

Signed-off-by: Han Zhou
---

Notes:
    This patch is on top of:
    https://mail.openvswitch.org/pipermail/ovs-dev/2019-March/357339.html

 ovsdb/ovsdb-server.1.in | 12 ++++++++++++
 ovsdb/ovsdb-server.c    | 23 +++++++++++++++++++----
 ovsdb/raft.c            | 31 ++++++++++++++++++-------------
 ovsdb/raft.h            |  3 ++-
 ovsdb/storage.c         | 18 ++++++++++++++----
 ovsdb/storage.h         |  2 ++
 6 files changed, 67 insertions(+), 22 deletions(-)

diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index 9f78e87..5bacb12 100644
--- a/ovsdb/ovsdb-server.1.in
+++ b/ovsdb/ovsdb-server.1.in
@@ -19,6 +19,8 @@ ovsdb\-server \- Open vSwitch database server
 [\fB\-\-sync\-from=\fIserver\fR]
 [\fB\-\-sync\-exclude-tables=\fIdb\fB:\fItable\fR[\fB,\fIdb\fB:\fItable\fR]...\fR]
 [\fB\-\-active\fR]
+.IP "Clustered mode options:"
+[\fB\-\-leader\-timeout=\fItimeout\fR]
 .so lib/ssl-syn.man
 .so lib/ssl-bootstrap-syn.man
 .so lib/ssl-peer-ca-cert-syn.man
@@ -171,6 +173,16 @@ allow the syncing options to be specified using command line options,
 yet start the server, as the default, active server. To switch the running
 server to backup mode, use \fBovs-appctl(1)\fR to execute the
 \fBovsdb\-server/connect\-active\-ovsdb\-server\fR command.
+.SS "Clustered Mode Options"
+These options apply only to databases working in clustered mode:
+.TP
+\fB\-\-leader\-timeout=\fItimeout\fR
+Set the cluster leader election timeout value in milliseconds. This option
+should be set to the same value across all nodes in the cluster. Leader
+re-election is triggered if a follower has not heard from the current leader
+within this interval. The default timeout is 1024 ms. Increasing this
+value reduces the chance of leader re-election during transient overload
+situations but also increases the delay in reacting to real failures.
 .SS "Public Key Infrastructure Options"
 The options described below for configuring the SSL public key
 infrastructure accept a special syntax for obtaining their
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 9dc1d57..80f6f96 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -97,6 +97,7 @@ struct server_config {
     char **sync_from;
     char **sync_exclude;
     bool *is_backup;
+    unsigned int *leader_timeout;
     struct ovsdb_jsonrpc_server *jsonrpc;
 };
 static unixctl_cb_func ovsdb_server_add_remote;
@@ -119,7 +120,7 @@ static void parse_options(int argc, char *argvp[],
                           struct sset *db_filenames, struct sset *remotes,
                           char **unixctl_pathp, char **run_command,
                           char **sync_from, char **sync_exclude,
-                          bool *is_backup);
+                          bool *is_backup, unsigned int *leader_timeout);
 OVS_NO_RETURN static void usage(void);
 
 static char *reconfigure_remotes(struct ovsdb_jsonrpc_server *,
@@ -312,8 +313,10 @@ main(int argc, char *argv[])
     process_init();
 
     bool active = false;
+    unsigned int leader_timeout = 0;
     parse_options(argc, argv, &db_filenames, &remotes, &unixctl_path,
-                  &run_command, &sync_from, &sync_exclude, &active);
+                  &run_command, &sync_from, &sync_exclude, &active,
+                  &leader_timeout);
     is_backup = sync_from && !active;
 
     daemon_become_new_user(false);
@@ -351,6 +354,7 @@ main(int argc, char *argv[])
     server_config.sync_from = &sync_from;
     server_config.sync_exclude = &sync_exclude;
     server_config.is_backup = &is_backup;
+    server_config.leader_timeout = &leader_timeout;
 
     perf_counters_init();
 
@@ -639,7 +643,8 @@ open_db(struct server_config *config, const char *filename)
     struct ovsdb_storage *storage;
     struct ovsdb_error *error;
 
-    error = ovsdb_storage_open(filename, true, &storage);
+    error = ovsdb_storage_open(filename, true, *config->leader_timeout,
+                               &storage);
     if (error) {
         return error;
     }
@@ -1664,7 +1669,8 @@ static void
 parse_options(int argc, char *argv[],
               struct sset *db_filenames, struct sset *remotes,
               char **unixctl_pathp, char **run_command,
-              char **sync_from, char **sync_exclude, bool *active)
+              char **sync_from, char **sync_exclude, bool *active,
+              unsigned int *leader_timeout)
 {
     enum {
         OPT_REMOTE = UCHAR_MAX + 1,
@@ -1675,6 +1681,7 @@ parse_options(int argc, char *argv[],
         OPT_SYNC_FROM,
         OPT_SYNC_EXCLUDE,
         OPT_ACTIVE,
+        OPT_LEADER_TIMEOUT,
         OPT_NO_DBS,
         VLOG_OPTION_ENUMS,
         DAEMON_OPTION_ENUMS,
@@ -1697,6 +1704,7 @@ parse_options(int argc, char *argv[],
         {"sync-from", required_argument, NULL, OPT_SYNC_FROM},
         {"sync-exclude-tables", required_argument, NULL, OPT_SYNC_EXCLUDE},
         {"active", no_argument, NULL, OPT_ACTIVE},
+        {"leader-timeout", required_argument, NULL, OPT_LEADER_TIMEOUT},
         {"no-dbs", no_argument, NULL, OPT_NO_DBS},
         {NULL, 0, NULL, 0},
     };
@@ -1784,6 +1792,12 @@ parse_options(int argc, char *argv[],
             *active = true;
             break;
 
+        case OPT_LEADER_TIMEOUT:
+            if (!str_to_uint(optarg, 10, leader_timeout)) {
+                ovs_fatal(0, "leader-timeout must be a non-negative integer.");
+            }
+            break;
+
         case OPT_NO_DBS:
             add_default_db = false;
             break;
@@ -1822,6 +1836,7 @@ usage(void)
     daemon_usage();
     vlog_usage();
     replication_usage();
+    storage_usage();
     printf("\nOther options:\n"
            "  --run COMMAND           run COMMAND as subprocess then exit\n"
            "  --unixctl=SOCKET        override default control socket name\n"
diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 31e9e72..a789626 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -238,8 +238,11 @@ struct raft {
     uint64_t last_applied;      /* Max log index applied to state machine. */
     struct uuid leader_sid;     /* Server ID of leader (zero, if unknown). */
 
+#define ELECTION_DEFAULT 1024   /* Default value of election_interval. */
+    unsigned int election_interval; /* election_interval +
+                                       random(ELECTION_RANGE_MSEC) is the time
+                                       to wait before starting an election. */
     /* Followers and candidates only. */
-#define ELECTION_BASE_MSEC 1024
 #define ELECTION_RANGE_MSEC 1024
     long long int election_base;    /* Time of last heartbeat from leader. */
     long long int election_timeout; /* Time at which we start an election. */
@@ -269,7 +272,6 @@ struct raft {
     struct hmap add_servers;    /* Contains "struct raft_server"s to add. */
     struct raft_server *remove_server; /* Server being removed. */
     struct hmap commands;       /* Contains "struct raft_command"s. */
-#define PING_TIME_MSEC (ELECTION_BASE_MSEC / 3)
     long long int ping_timeout; /* Time at which to send a heartbeat */
 
     /* Candidates only. Reinitialized at start of election. */
@@ -360,7 +362,7 @@ raft_make_address_passive(const char *address_)
 }
 
 static struct raft *
-raft_alloc(void)
+raft_alloc(unsigned int election_interval)
 {
     raft_init();
 
@@ -377,6 +379,8 @@ raft_alloc(void)
     hmap_init(&raft->add_servers);
     hmap_init(&raft->commands);
 
+    raft->election_interval = election_interval ? election_interval :
+                                                  ELECTION_DEFAULT;
     raft_reset_ping_timer(raft);
     raft_reset_election_timer(raft);
 
@@ -541,7 +545,7 @@ raft_join_cluster(const char *file_name,
 struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 raft_read_metadata(struct ovsdb_log *log, struct raft_metadata *md)
 {
-    struct raft *raft = raft_alloc();
+    struct raft *raft = raft_alloc(0);
     raft->log = log;
 
     struct ovsdb_error *error = raft_read_header(raft);
@@ -868,7 +872,7 @@ raft_read_log(struct raft *raft)
 static void
 raft_reset_election_timer(struct raft *raft)
 {
-    unsigned int duration = (ELECTION_BASE_MSEC
+    unsigned int duration = (raft->election_interval
                              + random_range(ELECTION_RANGE_MSEC));
     raft->election_base = time_msec();
     raft->election_timeout = raft->election_base + duration;
@@ -877,7 +881,7 @@ raft_reset_election_timer(struct raft *raft)
 static void
 raft_reset_ping_timer(struct raft *raft)
 {
-    raft->ping_timeout = time_msec() + PING_TIME_MSEC;
+    raft->ping_timeout = time_msec() + raft->election_interval / 3;
 }
 
 static void
@@ -900,9 +904,10 @@ raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
  * the cluster's log in 'file_name'. Takes ownership of 'log', whether
  * successful or not. */
 struct ovsdb_error * OVS_WARN_UNUSED_RESULT
-raft_open(struct ovsdb_log *log, struct raft **raftp)
+raft_open(struct ovsdb_log *log, unsigned int election_interval,
+          struct raft **raftp)
 {
-    struct raft *raft = raft_alloc();
+    struct raft *raft = raft_alloc(election_interval);
     raft->log = log;
 
     struct ovsdb_error *error = raft_read_header(raft);
@@ -1113,7 +1118,7 @@ raft_send_remove_server_requests(struct raft *raft)
         }
     }
 
-    raft->leave_timeout = time_msec() + ELECTION_BASE_MSEC;
+    raft->leave_timeout = time_msec() + raft->election_interval;
 }
 
 /* Attempts to start 'raft' leaving its cluster. The caller can check progress
@@ -1130,7 +1135,7 @@ raft_leave(struct raft *raft)
     raft_transfer_leadership(raft, "this server is leaving the cluster");
     raft_become_follower(raft);
     raft_send_remove_server_requests(raft);
-    raft->leave_timeout = time_msec() + ELECTION_BASE_MSEC;
+    raft->leave_timeout = time_msec() + raft->election_interval;
 }
 
 /* Returns true if 'raft' is currently attempting to leave its cluster. */
@@ -1785,7 +1790,7 @@ raft_run(struct raft *raft)
     struct raft_command *cmd, *next_cmd;
     HMAP_FOR_EACH_SAFE (cmd, next_cmd, hmap_node, &raft->commands) {
         if (cmd->timestamp
-            && now - cmd->timestamp > ELECTION_BASE_MSEC) {
+            && now - cmd->timestamp > raft->election_interval) {
             raft_command_complete(raft, cmd, RAFT_CMD_TIMEOUT);
         }
     }
@@ -3231,10 +3236,10 @@ raft_should_suppress_disruptive_server(struct raft *raft,
         return true;
 
     case RAFT_FOLLOWER:
-        if (now < raft->election_base + ELECTION_BASE_MSEC) {
+        if (now < raft->election_base + raft->election_interval) {
             VLOG_WARN_RL(&rl, "ignoring vote request received after only "
                          "%lld ms (minimum election time is %d ms)",
-                         now - raft->election_base, ELECTION_BASE_MSEC);
+                         now - raft->election_base, raft->election_interval);
             return true;
         }
         return false;
diff --git a/ovsdb/raft.h b/ovsdb/raft.h
index 3d44899..6b719ba 100644
--- a/ovsdb/raft.h
+++ b/ovsdb/raft.h
@@ -100,7 +100,8 @@ struct ovsdb_error *raft_read_metadata(struct ovsdb_log *,
 void raft_metadata_destroy(struct raft_metadata *);
 
 /* Starting up or shutting down a server within a cluster. */
-struct ovsdb_error *raft_open(struct ovsdb_log *, struct raft **)
+struct ovsdb_error *raft_open(struct ovsdb_log *,
+                              unsigned int election_interval, struct raft **)
     OVS_WARN_UNUSED_RESULT;
 void raft_close(struct raft *);
 
diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index e26252b..b9ad401 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -58,6 +58,7 @@ static void schedule_next_snapshot(struct ovsdb_storage *, bool quick);
 
 static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 ovsdb_storage_open__(const char *filename, bool rw, bool allow_clustered,
+                     unsigned int leader_timeout,
                      struct ovsdb_storage **storagep)
 {
     *storagep = NULL;
@@ -78,7 +79,7 @@ ovsdb_storage_open__(const char *filename, bool rw, bool allow_clustered,
             return ovsdb_error(NULL, "%s: cannot apply this operation to "
                                "clustered database file", filename);
         }
-        error = raft_open(log, &raft);
+        error = raft_open(log, leader_timeout, &raft);
         log = NULL;
         if (error) {
             return error;
@@ -101,10 +102,10 @@ ovsdb_storage_open__(const char *filename, bool rw, bool allow_clustered,
  * The returned storage might be clustered or standalone, depending on what the
  * disk file contains. */
 struct ovsdb_error * OVS_WARN_UNUSED_RESULT
-ovsdb_storage_open(const char *filename, bool rw,
+ovsdb_storage_open(const char *filename, bool rw, unsigned int leader_timeout,
                    struct ovsdb_storage **storagep)
 {
-    return ovsdb_storage_open__(filename, rw, true, storagep);
+    return ovsdb_storage_open__(filename, rw, true, leader_timeout, storagep);
 }
 
 struct ovsdb_storage *
@@ -112,7 +113,7 @@ ovsdb_storage_open_standalone(const char *filename, bool rw)
 {
     struct ovsdb_storage *storage;
     struct ovsdb_error *error = ovsdb_storage_open__(filename, rw, false,
-                                                     &storage);
+                                                     0, &storage);
     if (error) {
         ovs_fatal(0, "%s", ovsdb_error_to_string_free(error));
     }
@@ -610,3 +611,12 @@ ovsdb_storage_peek_last_eid(struct ovsdb_storage *storage)
     }
     return raft_current_eid(storage->raft);
 }
+
+void
+storage_usage(void)
+{
+    printf("\n\
+Clustered mode options:\n\
+  --leader-timeout=TIMEOUT\n\
+                          set cluster leader election timeout in ms.\n");
+}
diff --git a/ovsdb/storage.h b/ovsdb/storage.h
index 8a9bbab..06cb695 100644
--- a/ovsdb/storage.h
+++ b/ovsdb/storage.h
@@ -26,6 +26,7 @@ struct ovsdb_storage;
 struct uuid;
 
 struct ovsdb_error *ovsdb_storage_open(const char *filename, bool rw,
+                                       unsigned int leader_timeout,
                                        struct ovsdb_storage **)
     OVS_WARN_UNUSED_RESULT;
 struct ovsdb_storage *ovsdb_storage_create_unbacked(void);
@@ -93,4 +94,5 @@ struct ovsdb_schema *ovsdb_storage_read_schema(struct ovsdb_storage *);
 
 const struct uuid *ovsdb_storage_peek_last_eid(struct ovsdb_storage *);
 
+void storage_usage(void);
 #endif /* ovsdb/storage.h */