From patchwork Mon Oct 26 01:42:53 2020
X-Patchwork-Id: 1387416
From: Ilya Maximets
To: ovs-dev@openvswitch.org
Date: Mon, 26 Oct 2020 02:42:53 +0100
Message-Id: <20201026014257.215501-2-i.maximets@ovn.org>
In-Reply-To: <20201026014257.215501-1-i.maximets@ovn.org>
Cc: Han Zhou, Ilya Maximets
Subject: [ovs-dev] [PATCH 1/5] raft: Add log length to the memory report.

In many cases, a large part of the memory consumed by the ovsdb-server
process is the raft log, so it's important to add its length to the
memory report.
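A minimal sketch of the length computation that survives in the diff fragment below ('log_end - raft->log_start'), under stated assumptions: 'struct raft_log_view' is a hypothetical stand-in for the raft state in which the entries with indexes in [log_start, log_end) are held in memory, and the report key 'raft-log' is chosen here for illustration only, not taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, stripped-down view of the raft state: the entries with
 * indexes in [log_start, log_end) are the ones currently kept in memory. */
struct raft_log_view {
    uint64_t log_start;   /* Index of the first entry still in the log. */
    uint64_t log_end;     /* One past the index of the last entry. */
};

/* Returns the number of raft log entries currently held in memory. */
static uint64_t
raft_log_length(const struct raft_log_view *raft)
{
    assert(raft->log_end >= raft->log_start);
    return raft->log_end - raft->log_start;
}

/* Formats one memory-report item as "name:count" into 'buf'.  The key name
 * "raft-log" is an assumption made for this sketch.  Returns what snprintf()
 * returns: the length of the full item, excluding the terminating NUL. */
static int
format_raft_log_usage(const struct raft_log_view *raft,
                      char *buf, size_t size)
{
    return snprintf(buf, size, "raft-log:%" PRIu64, raft_log_length(raft));
}
```

For example, a log with log_start = 10 and log_end = 4010 would be reported as 'raft-log:4000'. Because compaction discards entries from the start of the log, one would expect this number to drop after each successful snapshot.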
Signed-off-by: Ilya Maximets
Acked-by: Dumitru Ceara
log_end - raft->log_start);
 }
 
 /* Returns true if 'raft' has completed joining its cluster, has not left or

From patchwork Mon Oct 26 01:42:54 2020
X-Patchwork-Id: 1387419
From: Ilya Maximets
To: ovs-dev@openvswitch.org
Date: Mon, 26 Oct 2020 02:42:54 +0100
Message-Id: <20201026014257.215501-3-i.maximets@ovn.org>
In-Reply-To: <20201026014257.215501-1-i.maximets@ovn.org>
Cc: Han Zhou, Ilya Maximets
Subject: [ovs-dev] [PATCH 2/5] ovsdb-server: Reclaim heap memory after compaction.

Compaction happens at most once every 10 minutes. That is a long
interval for a heavily loaded ovsdb-server in cluster mode: in 10
minutes, the raft log can grow to tens of thousands of entries, tens of
gigabytes in total size. While compaction cleans up raft log entries,
in many cases the memory is not returned to the system but kept in the
heap of the running ovsdb-server process, and it can stay that way for
a really long time.
In the end, a single performance spike can lead to fast growth of the
raft log, and this memory may not be released to the system for a
really long time, even if the database is empty.

A simple example of how to reproduce this with the OVN sandbox:

1. make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run the following script, which creates 1 port group, adds 4000 ACLs
   and removes all of that at the end:

   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

3. Stop one of the Northbound DB servers:

   ovs-appctl -t $(pwd)/sandbox/nb1 exit

   Make sure that ovsdb-server didn't compact the database before it was
   stopped. Now we have a db file on disk that contains 4000 fairly big
   transactions.

4. Try to start the same ovsdb-server with this file:

   # cd sandbox && ovsdb-server <...> nb1.db

   At this point, ovsdb-server reads all the transactions from the db
   file and executes them one by one as fast as it can. When it
   finishes, the raft log contains 4000 entries and ovsdb-server
   consumes (on my system) ~13GB of memory while the database is empty.
   And libc will likely never return this memory to the system, or at
   least will hold it for a really long time.

This patch adds a new command 'ovsdb-server/memory-trim-on-compaction'.
It's disabled by default, but once enabled, ovsdb-server will call
'malloc_trim(0)' after every successful compaction to try to return
unused heap memory to the system. This is glibc-specific, so we need to
detect function availability at build time.
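A minimal sketch of the trimming flow described above, assuming a check on '__GLIBC__' in place of the configure-time 'AC_CHECK_DECLS([malloc_trim], ...)' test; 'HAVE_MALLOC_TRIM', 'churn_heap()', 'write_snapshot()' and 'snapshot_and_maybe_trim()' are hypothetical names used only for illustration. As in the patch, trimming happens only after a successful snapshot and only when the option is on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>   /* Including any glibc header defines __GLIBC__. */

/* Hypothetical stand-in for the configure-time check.  Like autoconf's
 * HAVE_DECL_* macros, it is always defined, to 1 or 0, so it is tested
 * with '#if' rather than '#ifdef'. */
#if defined(__GLIBC__)
#define HAVE_MALLOC_TRIM 1
#else
#define HAVE_MALLOC_TRIM 0
#endif

#if HAVE_MALLOC_TRIM
#include <malloc.h>   /* malloc_trim() is a glibc extension. */
#endif

/* Allocates and frees 'n' blocks of 'block_size' bytes, roughly mimicking
 * log entries that compaction discards.  Returns false if an allocation
 * failed; everything allocated is freed either way. */
static bool
churn_heap(size_t n, size_t block_size)
{
    void **blocks = calloc(n, sizeof *blocks);   /* Zeroed, so slots are NULL. */
    bool ok = blocks != NULL;

    for (size_t i = 0; ok && i < n; i++) {
        blocks[i] = malloc(block_size);
        ok = blocks[i] != NULL;
    }
    for (size_t i = 0; blocks && i < n; i++) {
        free(blocks[i]);                          /* free(NULL) is a no-op. */
    }
    free(blocks);
    return ok;
}

/* Hypothetical stand-in for writing a snapshot; returns 0 on success. */
static int
write_snapshot(void)
{
    return 0;
}

/* Writes a snapshot, stores its result in '*error' and, only if it succeeded
 * and 'trim_memory' is set, asks the allocator to give free heap memory back
 * to the kernel.  Returns true if malloc_trim() was called. */
static bool
snapshot_and_maybe_trim(bool trim_memory, int *error)
{
    assert(error != NULL);
    *error = write_snapshot();
#if HAVE_MALLOC_TRIM
    if (!*error && trim_memory) {
        /* pad = 0: keep no extra free space at the top of the heap.  The
         * return value only says whether anything was released. */
        malloc_trim(0);
        return true;
    }
#else
    (void) trim_memory;
#endif
    return false;
}
```

A detail worth keeping in mind: autoconf's AC_CHECK_DECLS defines HAVE_DECL_<SYMBOL> to 1 when the declaration is found and to 0 otherwise. The macro is therefore always defined and has to be tested with '#if', because '#ifdef HAVE_DECL_MALLOC_TRIM' is true even on systems without malloc_trim().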
It is disabled by default because it adds 1% to 30% (depending on the
current state) to the snapshot creation time and, also, subsequent
memory allocations will likely require requests to the kernel, which
might be slower. It could be enabled by default later if it proves
broadly beneficial.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Signed-off-by: Ilya Maximets
Acked-by: Dumitru Ceara
---
 NEWS                    |  3 +++
 configure.ac            |  1 +
 ovsdb/ovsdb-server.1.in |  4 ++++
 ovsdb/ovsdb-server.c    | 41 +++++++++++++++++++++++++++++++++++++++--
 ovsdb/ovsdb.c           | 12 +++++++++++-
 ovsdb/ovsdb.h           |  3 ++-
 6 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index 8bb5bdc3f..2860a8e9c 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,9 @@ Post-v2.14.0
    - OVSDB:
      * New unixctl command 'ovsdb-server/get-db-storage-status' to show the
        status of the storage that's backing a database.
+     * New unixctl command 'ovsdb-server/memory-trim-on-compaction on|off'.
+       If turned on, ovsdb-server will try to reclaim all the unused memory
+       after every DB compaction back to OS. Disabled by default.
    - DPDK:
      * Removed support for vhost-user dequeue zero-copy.
    - The environment variable OVS_UNBOUND_CONF, if set, is now used
diff --git a/configure.ac b/configure.ac
index 8d37af9db..126a1d9d1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -100,6 +100,7 @@ OVS_CHECK_IF_DL
 OVS_CHECK_STRTOK_R
 OVS_CHECK_LINUX_AF_XDP
 AC_CHECK_DECLS([sys_siglist], [], [], [[#include ]])
+AC_CHECK_DECLS([malloc_trim], [], [], [[#include ]])
 AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
   [], [], [[#include ]])
 AC_CHECK_MEMBERS([struct ifreq.ifr_flagshigh], [], [], [[#include ]])
diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index 6667553df..07a36cc7d 100644
--- a/ovsdb/ovsdb-server.1.in
+++ b/ovsdb/ovsdb-server.1.in
@@ -206,6 +206,10 @@ but not before 100 commits have been added or 10 minutes have
 elapsed since the last compaction. It will also be compacted automatically
 after 24 hours since the last compaction if 100 commits were added regardless
 of its size.
+.IP "\fBovsdb\-server/memory-trim-on-compaction\fR \fIon\fR|\fIoff\fR"
+If this option is \fIon\fR, ovsdb-server will try to reclaim all unused
+heap memory back to the system after each successful database compaction
+to reduce the memory consumption of the process. \fIoff\fR by default.
 .
 .IP "\fBovsdb\-server/reconnect\fR"
 Makes \fBovsdb\-server\fR drop all of the JSON\-RPC
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 73a155b3f..6ebe5d720 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -76,8 +76,12 @@ static char *ssl_protocols;
 static char *ssl_ciphers;
 static bool bootstrap_ca_cert;
 
+/* Try to reclaim heap memory back to system after DB compaction. */
+static bool trim_memory = false;
+
 static unixctl_cb_func ovsdb_server_exit;
 static unixctl_cb_func ovsdb_server_compact;
+static unixctl_cb_func ovsdb_server_memory_trim_on_compaction;
 static unixctl_cb_func ovsdb_server_reconnect;
 static unixctl_cb_func ovsdb_server_perf_counters_clear;
 static unixctl_cb_func ovsdb_server_perf_counters_show;
@@ -243,7 +247,7 @@ main_loop(struct server_config *config,
                           xasprintf("removing database %s because storage "
                                     "disconnected permanently", node->name));
             } else if (ovsdb_storage_should_snapshot(db->db->storage)) {
-                log_and_free_error(ovsdb_snapshot(db->db));
+                log_and_free_error(ovsdb_snapshot(db->db, trim_memory));
             }
         }
         if (run_process) {
@@ -410,6 +414,9 @@ main(int argc, char *argv[])
     unixctl_command_register("exit", "", 0, 0, ovsdb_server_exit, &exiting);
     unixctl_command_register("ovsdb-server/compact", "", 0, 1,
                              ovsdb_server_compact, &all_dbs);
+    unixctl_command_register("ovsdb-server/memory-trim-on-compaction",
+                             "on|off", 1, 1,
+                             ovsdb_server_memory_trim_on_compaction, NULL);
     unixctl_command_register("ovsdb-server/reconnect", "", 0, 0,
                              ovsdb_server_reconnect, jsonrpc);
 
@@ -1492,7 +1499,8 @@ ovsdb_server_compact(struct unixctl_conn *conn, int argc,
                 VLOG_INFO("compacting %s database by user request",
                           node->name);
 
-                struct ovsdb_error *error = ovsdb_snapshot(db->db);
+                struct ovsdb_error *error = ovsdb_snapshot(db->db,
+                                                           trim_memory);
                 if (error) {
                     char *s = ovsdb_error_to_string(error);
                     ds_put_format(&reply, "%s\n", s);
@@ -1515,6 +1523,35 @@ ovsdb_server_compact(struct unixctl_conn *conn, int argc,
     ds_destroy(&reply);
 }
 
+/* "ovsdb-server/memory-trim-on-compaction": controls whether ovsdb-server
+ * tries to reclaim heap memory back to system using malloc_trim() after
+ * compaction. */
+static void
+ovsdb_server_memory_trim_on_compaction(struct unixctl_conn *conn,
+                                       int argc OVS_UNUSED,
+                                       const char *argv[],
+                                       void *arg OVS_UNUSED)
+{
+    const char *command = argv[1];
+
+#if !HAVE_DECL_MALLOC_TRIM
+    unixctl_command_reply_error(conn, "memory trimming is not supported");
+    return;
+#endif
+
+    if (!strcmp(command, "on")) {
+        trim_memory = true;
+    } else if (!strcmp(command, "off")) {
+        trim_memory = false;
+    } else {
+        unixctl_command_reply_error(conn, "invalid argument");
+        return;
+    }
+    VLOG_INFO("memory trimming after compaction %s.",
+              trim_memory ? "enabled" : "disabled");
+    unixctl_command_reply(conn, NULL);
+}
+
 /* "ovsdb-server/reconnect": makes ovsdb-server drop all of its JSON-RPC
  * connections and reconnect. */
 static void
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index 2da117cb3..cc05d6e2b 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -17,6 +17,10 @@
 
 #include "ovsdb.h"
 
+#if HAVE_DECL_MALLOC_TRIM
+#include 
+#endif
+
 #include "column.h"
 #include "file.h"
 #include "monitor.h"
@@ -515,7 +519,7 @@ ovsdb_get_table(const struct ovsdb *db, const char *name)
 }
 
 struct ovsdb_error * OVS_WARN_UNUSED_RESULT
-ovsdb_snapshot(struct ovsdb *db)
+ovsdb_snapshot(struct ovsdb *db, bool trim_memory OVS_UNUSED)
 {
     if (!db->storage) {
         return NULL;
@@ -527,6 +531,12 @@ ovsdb_snapshot(struct ovsdb *db)
                                                       schema, data);
     json_destroy(schema);
     json_destroy(data);
+
+#if HAVE_DECL_MALLOC_TRIM
+    if (!error && trim_memory) {
+        malloc_trim(0);
+    }
+#endif
     return error;
 }
 
diff --git a/ovsdb/ovsdb.h b/ovsdb/ovsdb.h
index 5c30a83d9..72e127c84 100644
--- a/ovsdb/ovsdb.h
+++ b/ovsdb/ovsdb.h
@@ -112,7 +112,8 @@ struct json *ovsdb_execute(struct ovsdb *, const struct ovsdb_session *,
                            long long int elapsed_msec,
                            long long int *timeout_msec);
 
-struct ovsdb_error *ovsdb_snapshot(struct ovsdb *) OVS_WARN_UNUSED_RESULT;
+struct ovsdb_error *ovsdb_snapshot(struct ovsdb *, bool trim_memory)
+    OVS_WARN_UNUSED_RESULT;
 
 void ovsdb_replace(struct ovsdb *dst, struct ovsdb *src);
 

From patchwork Mon Oct 26 01:42:55 2020
X-Patchwork-Id: 1387421
From: Ilya Maximets
To: ovs-dev@openvswitch.org
Date: Mon, 26 Oct 2020 02:42:55 +0100
Message-Id: <20201026014257.215501-4-i.maximets@ovn.org>
In-Reply-To: <20201026014257.215501-1-i.maximets@ovn.org>
Cc: Han Zhou, Ilya Maximets
Subject: [ovs-dev] [PATCH 3/5] raft: Set threshold on backlog for raft connections.

RAFT messages can be fairly big. If something abnormal happens to one
of the servers in a cluster, it may not be able to process all the
incoming messages in a timely manner. This results in jsonrpc backlog
growth on the sender's side: for example, if a follower gets many new
clients at once that it needs to serve, or it decides to take a
snapshot during a period with a high rate of database changes. If the
backlog grows large enough, it becomes harder and harder for the
follower to process incoming raft messages; it sends outdated replies
and starts receiving snapshots and the whole raft log from the leader.
Sometimes the backlog grows extremely large (60GB in this example):

  jsonrpc|INFO|excessive sending backlog, jsonrpc: ssl:, num of msgs: 15370, backlog: 61731060773.

In this case, the OS might actually decide to kill the sender to free
some memory. In any case, it could take a lot of time for such a server
to catch up with the rest of the cluster if it has so much data to
receive and process.

Introduce backlog thresholds for jsonrpc connections. If the sending
backlog exceeds particular values (500 messages or 4GB in size), the
connection is dropped and re-created. This drops the whole current
backlog and starts over, increasing the chances of cluster recovery.
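The disconnect rule can be modeled in isolation. This is a simplified sketch, not the jsonrpc code: 'struct send_queue', 'send_queue_push()' and 'send_queue_reset()' are hypothetical names. What it mirrors from the jsonrpc_send() changes in the diff that follows is the 50-message floor before any limit is evaluated, the strict '>' comparisons, and a limit of 0 meaning "no limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one connection's send queue.  The
 * counters play the role of 'output_count' and 'backlog' in struct jsonrpc,
 * and a limit of 0 disables that check. */
struct send_queue {
    size_t n_msgs;        /* Messages queued but not yet sent. */
    size_t n_bytes;       /* Bytes queued but not yet sent. */
    size_t max_n_msgs;    /* Drop the connection above this count (0: off). */
    size_t max_n_bytes;   /* Drop the connection above this size (0: off). */
};

/* Limits are only evaluated once at least this many messages are queued,
 * like the existing 'output_count >= 50' check in jsonrpc_send(). */
#define BACKLOG_CHECK_MIN_MSGS 50

/* Queues a message of 'length' bytes.  Returns true if the caller should
 * drop the connection because a threshold was exceeded. */
static bool
send_queue_push(struct send_queue *q, size_t length)
{
    assert(q != NULL);
    q->n_msgs++;
    q->n_bytes += length;

    if (q->n_msgs < BACKLOG_CHECK_MIN_MSGS) {
        return false;
    }
    if (q->max_n_msgs && q->n_msgs > q->max_n_msgs) {
        return true;
    }
    if (q->max_n_bytes && q->n_bytes > q->max_n_bytes) {
        return true;
    }
    return false;
}

/* Models the effect of dropping the connection: the whole backlog is
 * discarded, so a new connection starts with an empty queue. */
static void
send_queue_reset(struct send_queue *q)
{
    q->n_msgs = 0;
    q->n_bytes = 0;
}
```

In the patch itself, exceeding either limit calls jsonrpc_error(rpc, E2BIG), which drops the connection together with its queued output; the jsonrpc_session layer then reconnects and re-applies the stored thresholds to the new connection.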
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829 Signed-off-by: Ilya Maximets --- NEWS | 2 ++ lib/jsonrpc.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++- lib/jsonrpc.h | 6 ++++++ ovsdb/raft.c | 5 +++++ 4 files changed, 69 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index 2860a8e9c..ebdf8758b 100644 --- a/NEWS +++ b/NEWS @@ -6,6 +6,8 @@ Post-v2.14.0 * New unixctl command 'ovsdb-server/memory-trim-on-compaction on|off'. If turned on, ovsdb-server will try to reclaim all the unused memory after every DB compaction back to OS. Disabled by default. + * Maximum backlog on RAFT connections limited to 500 messages or 4GB. + Once threshold reached, connection is dropped (and re-established). - DPDK: * Removed support for vhost-user dequeue zero-copy. - The environment variable OVS_UNBOUND_CONF, if set, is now used diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c index ecbc939fe..435824844 100644 --- a/lib/jsonrpc.c +++ b/lib/jsonrpc.c @@ -50,6 +50,10 @@ struct jsonrpc { struct ovs_list output; /* Contains "struct ofpbuf"s. */ size_t output_count; /* Number of elements in "output". */ size_t backlog; + + /* Limits. */ + size_t max_output; /* 'output_count' disconnection threshold. */ + size_t max_backlog; /* 'backlog' disconnection threshold. */ }; /* Rate limit for error messages. */ @@ -178,6 +182,17 @@ jsonrpc_get_backlog(const struct jsonrpc *rpc) return rpc->status ? 0 : rpc->backlog; } +/* Sets thresholds for send backlog. If send backlog contains more than + * 'max_n_msgs' messages or larger than 'max_backlog_bytes' bytes, connection + * will be dropped. */ +void +jsonrpc_set_backlog_threshold(struct jsonrpc *rpc, + size_t max_n_msgs, size_t max_backlog_bytes) +{ + rpc->max_output = max_n_msgs; + rpc->max_backlog = max_backlog_bytes; +} + /* Returns the number of bytes that have been received on 'rpc''s underlying * stream. (The value wraps around if it exceeds UINT_MAX.) 
*/ unsigned int @@ -261,9 +276,26 @@ jsonrpc_send(struct jsonrpc *rpc, struct jsonrpc_msg *msg) rpc->backlog += length; if (rpc->output_count >= 50) { - VLOG_INFO_RL(&rl, "excessive sending backlog, jsonrpc: %s, num of" + static struct vlog_rate_limit bl_rl = VLOG_RATE_LIMIT_INIT(5, 5); + bool disconnect = false; + + VLOG_INFO_RL(&bl_rl, "excessive sending backlog, jsonrpc: %s, num of" " msgs: %"PRIuSIZE", backlog: %"PRIuSIZE".", rpc->name, rpc->output_count, rpc->backlog); + if (rpc->max_output && rpc->output_count > rpc->max_output) { + disconnect = true; + VLOG_WARN("sending backlog exceeded maximum number of messages (%" + PRIuSIZE" > %"PRIuSIZE"), disconnecting, jsonrpc: %s.", + rpc->output_count, rpc->max_output, rpc->name); + } else if (rpc->max_backlog && rpc->backlog > rpc->max_backlog) { + disconnect = true; + VLOG_WARN("sending backlog exceeded maximum size (%"PRIuSIZE" > %" + PRIuSIZE" bytes), disconnecting, jsonrpc: %s.", + rpc->backlog, rpc->max_backlog, rpc->name); + } + if (disconnect) { + jsonrpc_error(rpc, E2BIG); + } } if (rpc->backlog == length) { @@ -787,6 +819,10 @@ struct jsonrpc_session { int last_error; unsigned int seqno; uint8_t dscp; + + /* Limits for jsonrpc. 
*/ + size_t max_n_msgs; + size_t max_backlog_bytes; }; static void @@ -970,6 +1006,8 @@ jsonrpc_session_run(struct jsonrpc_session *s) } reconnect_connected(s->reconnect, time_msec()); s->rpc = jsonrpc_open(stream); + jsonrpc_set_backlog_threshold(s->rpc, s->max_n_msgs, + s->max_backlog_bytes); s->seqno++; } else if (error != EAGAIN) { reconnect_listen_error(s->reconnect, time_msec(), error); @@ -1010,6 +1048,8 @@ jsonrpc_session_run(struct jsonrpc_session *s) if (!error) { reconnect_connected(s->reconnect, time_msec()); s->rpc = jsonrpc_open(s->stream); + jsonrpc_set_backlog_threshold(s->rpc, s->max_n_msgs, + s->max_backlog_bytes); s->stream = NULL; s->seqno++; } else if (error != EAGAIN) { @@ -1250,3 +1290,18 @@ jsonrpc_session_set_dscp(struct jsonrpc_session *s, uint8_t dscp) jsonrpc_session_force_reconnect(s); } } + +/* Sets thresholds for send backlog. If send backlog contains more than + * 'max_n_msgs' messages or larger than 'max_backlog_bytes' bytes, connection + * will be closed (then reconnected, if that feature is enabled). 
*/ +void +jsonrpc_session_set_backlog_threshold(struct jsonrpc_session *s, + size_t max_n_msgs, + size_t max_backlog_bytes) +{ + s->max_n_msgs = max_n_msgs; + s->max_backlog_bytes = max_backlog_bytes; + if (s->rpc) { + jsonrpc_set_backlog_threshold(s->rpc, max_n_msgs, max_backlog_bytes); + } +} diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h index a44114e8d..d75d66b86 100644 --- a/lib/jsonrpc.h +++ b/lib/jsonrpc.h @@ -51,6 +51,9 @@ void jsonrpc_wait(struct jsonrpc *); int jsonrpc_get_status(const struct jsonrpc *); size_t jsonrpc_get_backlog(const struct jsonrpc *); +void jsonrpc_set_backlog_threshold(struct jsonrpc *, size_t max_n_msgs, + size_t max_backlog_bytes); + unsigned int jsonrpc_get_received_bytes(const struct jsonrpc *); const char *jsonrpc_get_name(const struct jsonrpc *); @@ -140,6 +143,9 @@ void jsonrpc_session_set_probe_interval(struct jsonrpc_session *, int probe_interval); void jsonrpc_session_set_dscp(struct jsonrpc_session *, uint8_t dscp); +void jsonrpc_session_set_backlog_threshold(struct jsonrpc_session *, + size_t max_n_msgs, + size_t max_backlog_bytes); const char *jsonrpc_session_get_id(const struct jsonrpc_session *); #endif /* jsonrpc.h */ diff --git a/ovsdb/raft.c b/ovsdb/raft.c index ac85c6b67..92a099896 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -925,6 +925,9 @@ raft_reset_ping_timer(struct raft *raft) raft->ping_timeout = time_msec() + raft->election_timer / 3; } +#define RAFT_MAX_BACKLOG_N_MSGS 500 +#define RAFT_MAX_BACKLOG_BYTES UINT32_MAX + static void raft_add_conn(struct raft *raft, struct jsonrpc_session *js, const struct uuid *sid, bool incoming) @@ -940,6 +943,8 @@ raft_add_conn(struct raft *raft, struct jsonrpc_session *js, conn->incoming = incoming; conn->js_seqno = jsonrpc_session_get_seqno(conn->js); jsonrpc_session_set_probe_interval(js, 0); + jsonrpc_session_set_backlog_threshold(js, RAFT_MAX_BACKLOG_N_MSGS, + RAFT_MAX_BACKLOG_BYTES); } /* Starts the local server in an existing Raft cluster, using the local copy of From 
patchwork Mon Oct 26 01:42:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1387418 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ovn.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CKHfp3wPrz9sT6 for ; Mon, 26 Oct 2020 12:43:30 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 73B4287282; Mon, 26 Oct 2020 01:43:28 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id SOfb2dZy49z4; Mon, 26 Oct 2020 01:43:24 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id AA8E087242; Mon, 26 Oct 2020 01:43:24 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7BC50C088B; Mon, 26 Oct 2020 01:43:24 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 99E49C1AD6 for ; Mon, 26 Oct 2020 01:43:22 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 7FFAA86787 for ; Mon, 26 Oct 2020 01:43:22 +0000 (UTC) 
X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KVBTCIhtJGNq for ; Mon, 26 Oct 2020 01:43:20 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from relay1-d.mail.gandi.net (relay1-d.mail.gandi.net [217.70.183.193]) by whitealder.osuosl.org (Postfix) with ESMTPS id 70F1E86793 for ; Mon, 26 Oct 2020 01:43:20 +0000 (UTC) X-Originating-IP: 78.45.89.65 Received: from im-t490s.redhat.com (ip-78-45-89-65.net.upcbroadband.cz [78.45.89.65]) (Authenticated sender: i.maximets@ovn.org) by relay1-d.mail.gandi.net (Postfix) with ESMTPSA id 17A4524000B; Mon, 26 Oct 2020 01:43:17 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Mon, 26 Oct 2020 02:42:56 +0100 Message-Id: <20201026014257.215501-5-i.maximets@ovn.org> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20201026014257.215501-1-i.maximets@ovn.org> References: <20201026014257.215501-1-i.maximets@ovn.org> MIME-Version: 1.0 Cc: Han Zhou , Ilya Maximets Subject: [ovs-dev] [PATCH 4/5] raft: Make backlog thresholds configurable. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" New appctl 'cluster/set-backlog-threshold' to configure thresholds on backlog of raft jsonrpc connections. Could be used, for example, in some extreme conditions where size of a database expected to be very large, i.e. comparable with default 4GB threshold. Signed-off-by: Ilya Maximets Acked-by: Dumitru Ceara --- NEWS | 1 + ovsdb/ovsdb-server.1.in | 5 ++++ ovsdb/raft.c | 55 +++++++++++++++++++++++++++++++++++++---- 3 files changed, 56 insertions(+), 5 deletions(-) diff --git a/NEWS b/NEWS index ebdf8758b..c0819bf93 100644 --- a/NEWS +++ b/NEWS @@ -8,6 +8,7 @@ Post-v2.14.0 after every DB compaction back to OS. 
Disabled by default. * Maximum backlog on RAFT connections limited to 500 messages or 4GB. Once threshold reached, connection is dropped (and re-established). + Use the 'cluster/set-backlog-threshold' command to change limits. - DPDK: * Removed support for vhost-user dequeue zero-copy. - The environment variable OVS_UNBOUND_CONF, if set, is now used diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in index 07a36cc7d..5a7f3ba13 100644 --- a/ovsdb/ovsdb-server.1.in +++ b/ovsdb/ovsdb-server.1.in @@ -381,6 +381,11 @@ This command must be executed on the leader. It initiates the change to the cluster. To see if the change takes effect (committed), use \fBcluster/status\fR to show the current setting. Once a change is committed, it persists at server restarts. +.IP "\fBcluster/set\-backlog\-threshold \fIdb\fR \fIn_msgs\fR \fIn_bytes\fR" +Sets the backlog limits for \fIdb\fR's RAFT connections to a maximum of +\fIn_msgs\fR messages or \fIn_bytes\fR bytes. If the backlog on one of the +connections reaches the limit, it will be disconnected (and re-established). +Values are checked only if the backlog contains more than 50 messages. . .so lib/vlog-unixctl.man .so lib/memory-unixctl.man diff --git a/ovsdb/raft.c b/ovsdb/raft.c index 92a099896..7cfa66fc4 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -305,6 +305,12 @@ struct raft { bool ever_had_leader; /* There has been leader elected since the raft is initialized, meaning it is ever connected. */ + + /* Connection backlog limits. */ +#define DEFAULT_MAX_BACKLOG_N_MSGS 500 +#define DEFAULT_MAX_BACKLOG_N_BYTES UINT32_MAX + size_t conn_backlog_max_n_msgs; /* Number of messages. */ + size_t conn_backlog_max_n_bytes; /* Number of bytes. */ }; /* All Raft structures. 
*/ @@ -412,6 +418,9 @@ raft_alloc(void) raft->election_timer = ELECTION_BASE_MSEC; + raft->conn_backlog_max_n_msgs = DEFAULT_MAX_BACKLOG_N_MSGS; + raft->conn_backlog_max_n_bytes = DEFAULT_MAX_BACKLOG_N_BYTES; + return raft; } @@ -925,9 +934,6 @@ raft_reset_ping_timer(struct raft *raft) raft->ping_timeout = time_msec() + raft->election_timer / 3; } -#define RAFT_MAX_BACKLOG_N_MSGS 500 -#define RAFT_MAX_BACKLOG_BYTES UINT32_MAX - static void raft_add_conn(struct raft *raft, struct jsonrpc_session *js, const struct uuid *sid, bool incoming) @@ -943,8 +949,8 @@ raft_add_conn(struct raft *raft, struct jsonrpc_session *js, conn->incoming = incoming; conn->js_seqno = jsonrpc_session_get_seqno(conn->js); jsonrpc_session_set_probe_interval(js, 0); - jsonrpc_session_set_backlog_threshold(js, RAFT_MAX_BACKLOG_N_MSGS, - RAFT_MAX_BACKLOG_BYTES); + jsonrpc_session_set_backlog_threshold(js, raft->conn_backlog_max_n_msgs, + raft->conn_backlog_max_n_bytes); } /* Starts the local server in an existing Raft cluster, using the local copy of @@ -4727,6 +4733,42 @@ raft_unixctl_change_election_timer(struct unixctl_conn *conn, unixctl_command_reply(conn, "change of election timer initiated."); } +static void +raft_unixctl_set_backlog_threshold(struct unixctl_conn *conn, + int argc OVS_UNUSED, const char *argv[], + void *aux OVS_UNUSED) +{ + const char *cluster_name = argv[1]; + unsigned long long n_msgs, n_bytes; + struct raft_conn *r_conn; + + struct raft *raft = raft_lookup_by_name(cluster_name); + if (!raft) { + unixctl_command_reply_error(conn, "unknown cluster"); + return; + } + + if (!str_to_ullong(argv[2], 10, &n_msgs) + || !str_to_ullong(argv[3], 10, &n_bytes)) { + unixctl_command_reply_error(conn, "invalid argument"); + return; + } + + if (n_msgs < 50 || n_msgs > SIZE_MAX || n_bytes > SIZE_MAX) { + unixctl_command_reply_error(conn, "values out of range"); + return; + } + + raft->conn_backlog_max_n_msgs = n_msgs; + raft->conn_backlog_max_n_bytes = n_bytes; + + LIST_FOR_EACH 
(r_conn, list_node, &raft->conns) { + jsonrpc_session_set_backlog_threshold(r_conn->js, n_msgs, n_bytes); + } + + unixctl_command_reply(conn, NULL); +} + static void raft_unixctl_failure_test(struct unixctl_conn *conn OVS_UNUSED, int argc OVS_UNUSED, const char *argv[], @@ -4787,6 +4829,9 @@ raft_init(void) raft_unixctl_kick, NULL); unixctl_command_register("cluster/change-election-timer", "DB TIME", 2, 2, raft_unixctl_change_election_timer, NULL); + unixctl_command_register("cluster/set-backlog-threshold", + "DB N_MSGS N_BYTES", 3, 3, + raft_unixctl_set_backlog_threshold, NULL); unixctl_command_register("cluster/failure-test", "FAILURE SCENARIO", 1, 1, raft_unixctl_failure_test, NULL); ovsthread_once_done(&once); From patchwork Mon Oct 26 01:42:57 2020 X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1387420
From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Mon, 26 Oct 2020 02:42:57 +0100 Message-Id: <20201026014257.215501-6-i.maximets@ovn.org> In-Reply-To: <20201026014257.215501-1-i.maximets@ovn.org> References: <20201026014257.215501-1-i.maximets@ovn.org> Cc: Han Zhou , Ilya Maximets Subject: [ovs-dev] [PATCH 5/5] raft: Avoid having more than one snapshot in-flight.
Previous commit 8c2c503bdb0d ("raft: Avoid sending equal snapshots.") took a "safe" approach of suppressing only exactly identical snapshot installation requests. However, it doesn't make much sense to send more than one snapshot at a time: if an obsolete snapshot is installed, the leader will re-send the most recent one. With this change, the leader will have at most one snapshot in flight per connection. This will reduce backlogs on raft connections in case a new snapshot is created while an 'install_snapshot_request' is in progress, or if the election timer is changed in that period. Also, not tracking the exact 'install_snapshot_request' we've sent allows us to simplify the code. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829 Fixes: 8c2c503bdb0d ("raft: Avoid sending equal snapshots.") Signed-off-by: Ilya Maximets Acked-by: Dumitru Ceara --- ovsdb/raft-private.c | 1 - ovsdb/raft-private.h | 4 ++-- ovsdb/raft.c | 42 ++++++++++++++++-------------------------- 3 files changed, 18 insertions(+), 29 deletions(-) diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c index 9468fdaf4..26d39a087 100644 --- a/ovsdb/raft-private.c +++ b/ovsdb/raft-private.c @@ -137,7 +137,6 @@ raft_server_destroy(struct raft_server *s) if (s) { free(s->address); free(s->nickname); - free(s->last_install_snapshot_request); free(s); } } diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h index 1f366b4ab..76b097b89 100644 --- a/ovsdb/raft-private.h +++ b/ovsdb/raft-private.h @@ -84,8 +84,8 @@ struct raft_server { bool replied; /* Reply to append_request was received from this node during current election_timeout interval. */ - /* Copy of the last install_snapshot_request sent to this server.
*/ - struct raft_install_snapshot_request *last_install_snapshot_request; + /* install_snapshot_request has been sent, but there is no response yet. */ + bool install_snapshot_request_in_progress; /* For use in adding and removing servers: */ struct uuid requester_sid; /* Nonzero if requested via RPC. */ diff --git a/ovsdb/raft.c b/ovsdb/raft.c index 7cfa66fc4..760dfca6d 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -1448,12 +1448,11 @@ raft_conn_run(struct raft *raft, struct raft_conn *conn) && jsonrpc_session_is_connected(conn->js)); if (reconnected) { - /* Clear 'last_install_snapshot_request' since it might not reach the - * destination or server was restarted. */ + /* Clear 'install_snapshot_request_in_progress' since it might not + * reach the destination or server was restarted. */ struct raft_server *server = raft_find_server(raft, &conn->sid); if (server) { - free(server->last_install_snapshot_request); - server->last_install_snapshot_request = NULL; + server->install_snapshot_request_in_progress = false; } } @@ -2575,6 +2574,7 @@ raft_server_init_leader(struct raft *raft, struct raft_server *s) s->match_index = 0; s->phase = RAFT_PHASE_STABLE; s->replied = false; + s->install_snapshot_request_in_progress = false; } static void @@ -3331,31 +3331,19 @@ raft_send_install_snapshot_request(struct raft *raft, } }; - if (s->last_install_snapshot_request) { - struct raft_install_snapshot_request *old, *new; - - old = s->last_install_snapshot_request; - new = &rpc.install_snapshot_request; - if ( old->term == new->term - && old->last_index == new->last_index - && old->last_term == new->last_term - && old->last_servers == new->last_servers - && old->data == new->data - && old->election_timer == new->election_timer - && uuid_equals(&old->last_eid, &new->last_eid)) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); + if (s->install_snapshot_request_in_progress) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); - VLOG_WARN_RL(&rl, 
"not sending exact same install_snapshot_request" - " to server %s again", s->nickname); - return; - } + VLOG_INFO_RL(&rl, "not sending snapshot to server %s, " + "already in progress", s->nickname); + return; } - free(s->last_install_snapshot_request); - CONST_CAST(struct raft_server *, s)->last_install_snapshot_request - = xmemdup(&rpc.install_snapshot_request, - sizeof rpc.install_snapshot_request); - raft_send(raft, &rpc); + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); + VLOG_INFO_RL(&rl, "sending snapshot to server %s, %"PRIu64":%"PRIu64".", + s->nickname, raft->term, raft->log_start - 1); + CONST_CAST(struct raft_server *, s)->install_snapshot_request_in_progress + = raft_send(raft, &rpc); } static void @@ -4072,6 +4060,8 @@ raft_handle_install_snapshot_reply( } } + s->install_snapshot_request_in_progress = false; + if (rpy->last_index != raft->log_start - 1 || rpy->last_term != raft->snap.term) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);