diff mbox series

[ovs-dev] ovsdb: provide raft and command interfaces with priority

Message ID 20210608092708.15711-1-anton.ivanov@cambridgegreys.com
State Changes Requested
Headers show
Series [ovs-dev] ovsdb: provide raft and command interfaces with priority | expand

Commit Message

Anton Ivanov June 8, 2021, 9:27 a.m. UTC
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Set a soft time limit of "raft election timer"/2 on ovsdb
processing.

This improves behaviour in large heavily loaded clusters.
While it cannot fully eliminate spurious raft elections
under heavy load, it significantly decreases their number.

TODO: randomize session processing order to ensure individual
sessions towards the end of the remotes list are not starved

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 ovsdb/jsonrpc-server.c | 43 ++++++++++++++++++++++++++++++++++++++----
 ovsdb/jsonrpc-server.h |  2 +-
 ovsdb/ovsdb-server.c   | 15 ++++++++++++++-
 ovsdb/raft.c           |  5 +++++
 ovsdb/raft.h           |  3 +++
 ovsdb/storage.c        |  8 ++++++++
 ovsdb/storage.h        |  2 ++
 7 files changed, 72 insertions(+), 6 deletions(-)

Comments

Ben Pfaff June 10, 2021, 10:01 p.m. UTC | #1
On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> Set a soft time limit of "raft election timer"/2 on ovsdb
> processing.
> 
> This improves behaviour in large heavily loaded clusters.
> While it cannot fully eliminate spurious raft elections
> under heavy load, it significantly decreases their number.
> 
> TODO: randomize session processing order to ensure individual
> sessions towards the end of the remotes list are not starved
> 
> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Personally I'd consider a random or round-robin processing order to be
pretty critical here.

A better approach (which would be a lot more work) would be for the Raft
code to use a separate thread or process.
Anton Ivanov June 11, 2021, 6:05 a.m. UTC | #2
On 10/06/2021 23:01, Ben Pfaff wrote:
> On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> Set a soft time limit of "raft election timer"/2 on ovsdb
>> processing.
>>
>> This improves behaviour in large heavily loaded clusters.
>> While it cannot fully eliminate spurious raft elections
>> under heavy load, it significantly decreases their number.
>>
>> TODO: randomize session processing order to ensure individual
>> sessions towards the end of the remotes list are not starved
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Personally I'd consider a random or round-robin processing order to be
> pretty critical here.

I agree.

I will send a revised patch with either round-robin or "restart where it 
stopped".


> A better approach (which would be a lot more work) would be for the Raft
> code to use a separate thread or process.

Looked at that from several angles. Not realistic without a complete 
rewrite:

1. The amount of locking needed will be severely detrimental to 
performance. I tried at some point to play with putting all monitors 
processing into a parallel pool and it needed on the order of 4-5 
mutexes per monitor protecting different parts of it. Storage is likely 
to be even worse.

2. It will end up pushing storage state that does not correspond to a 
complete transaction which is a can of worms in its own right.

>
Anton Ivanov June 11, 2021, 3:49 p.m. UTC | #3
On 10/06/2021 23:01, Ben Pfaff wrote:
> On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> Set a soft time limit of "raft election timer"/2 on ovsdb
>> processing.
>>
>> This improves behaviour in large heavily loaded clusters.
>> While it cannot fully eliminate spurious raft elections
>> under heavy load, it significantly decreases their number.
>>
>> TODO: randomize session processing order to ensure individual
>> sessions towards the end of the remotes list are not starved
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Personally I'd consider a random or round-robin processing order to be
> pretty critical here.

I have pushed an updated version which implements a "restart where you stopped" approach. This should be mostly identical to the current master in terms of "session starvation" while having "raft at higher priority".

I will do the RR separately - I need to get my head around a few things. For example - is it possible to make it in a more generalized manner (not just this case).

> A better approach (which would be a lot more work) would be for the Raft
> code to use a separate thread or process.
>
Ben Pfaff June 11, 2021, 4:08 p.m. UTC | #4
On Fri, Jun 11, 2021 at 04:49:44PM +0100, Anton Ivanov wrote:
> 
> On 10/06/2021 23:01, Ben Pfaff wrote:
> > On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
> > > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> > > 
> > > Set a soft time limit of "raft election timer"/2 on ovsdb
> > > processing.
> > > 
> > > This improves behaviour in large heavily loaded clusters.
> > > While it cannot fully eliminate spurious raft elections
> > > under heavy load, it significantly decreases their number.
> > > 
> > > TODO: randomize session processing order to ensure individual
> > > sessions towards the end of the remotes list are not starved
> > > 
> > > Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> > Personally I'd consider a random or round-robin processing order to be
> > pretty critical here.
> 
> I have pushed an updated version which implements a "restart where you stopped" approach. This should be mostly identical to the current master in terms of "session starvation" while having "raft at higher priority".

That seems fine, then.
diff mbox series

Patch

diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4e2dfc3d7..18b8a5deb 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -60,7 +60,8 @@  static struct ovsdb_jsonrpc_session *ovsdb_jsonrpc_session_create(
     struct ovsdb_jsonrpc_remote *, struct jsonrpc_session *, bool);
 static void ovsdb_jsonrpc_session_preremove_db(struct ovsdb_jsonrpc_remote *,
                                                struct ovsdb *);
-static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *);
+static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *,
+                                          uint64_t limit);
 static void ovsdb_jsonrpc_session_wait_all(struct ovsdb_jsonrpc_remote *);
 static void ovsdb_jsonrpc_session_get_memory_usage_all(
     const struct ovsdb_jsonrpc_remote *, struct simap *usage);
@@ -128,6 +129,7 @@  struct ovsdb_jsonrpc_server {
     bool read_only;            /* This server is does not accept any
                                   transactions that can modify the database. */
     struct shash remotes;      /* Contains "struct ovsdb_jsonrpc_remote *"s. */
+    bool yield_immediately;
 };
 
 /* A configured remote.  This is either a passive stream listener plus a list
@@ -159,6 +161,7 @@  ovsdb_jsonrpc_server_create(bool read_only)
     ovsdb_server_init(&server->up);
     shash_init(&server->remotes);
     server->read_only = read_only;
+    server->yield_immediately = false;
     return server;
 }
 
@@ -378,9 +381,16 @@  ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
 }
 
 void
-ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr)
+ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr, uint64_t limit)
 {
     struct shash_node *node;
+    uint64_t elapsed = 0, start_time = 0;
+
+    if (limit) {
+        start_time = time_now();
+    }
+
+    svr->yield_immediately = false;
 
     SHASH_FOR_EACH (node, &svr->remotes) {
         struct ovsdb_jsonrpc_remote *remote = node->data;
@@ -403,7 +413,15 @@  ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr)
             }
         }
 
-        ovsdb_jsonrpc_session_run_all(remote);
+        ovsdb_jsonrpc_session_run_all(remote, limit - elapsed);
+
+        if (limit) {
+            elapsed = start_time - time_now();
+            if (elapsed >= limit) {
+                svr->yield_immediately = true;
+                break;
+            }
+        }
     }
 }
 
@@ -412,6 +430,12 @@  ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *svr)
 {
     struct shash_node *node;
 
+    if (svr->yield_immediately) {
+        poll_immediate_wake();
+        svr->yield_immediately = false;
+        return;
+    }
+
     SHASH_FOR_EACH (node, &svr->remotes) {
         struct ovsdb_jsonrpc_remote *remote = node->data;
 
@@ -583,15 +607,26 @@  ovsdb_jsonrpc_session_set_options(struct ovsdb_jsonrpc_session *session,
 }
 
 static void
-ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *remote)
+ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *remote,
+                              uint64_t limit)
 {
     struct ovsdb_jsonrpc_session *s, *next;
+    uint64_t start_time;
+
+    if (limit) {
+        start_time = time_msec();
+    }
 
     LIST_FOR_EACH_SAFE (s, next, node, &remote->sessions) {
         int error = ovsdb_jsonrpc_session_run(s);
         if (error) {
             ovsdb_jsonrpc_session_close(s);
         }
+        if (limit) {
+            if (time_msec() - start_time > limit) {
+                break;
+            }
+        }
     }
 }
 
diff --git a/ovsdb/jsonrpc-server.h b/ovsdb/jsonrpc-server.h
index e0653aa39..218152e9d 100644
--- a/ovsdb/jsonrpc-server.h
+++ b/ovsdb/jsonrpc-server.h
@@ -67,7 +67,7 @@  void ovsdb_jsonrpc_server_free_remote_status(
 void ovsdb_jsonrpc_server_reconnect(struct ovsdb_jsonrpc_server *, bool force,
                                     char *comment);
 
-void ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *);
+void ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *, uint64_t limit);
 void ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *);
 
 void ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *,
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index b09232c65..97bfa05b2 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -184,6 +184,7 @@  main_loop(struct server_config *config,
     char *remotes_error, *ssl_error;
     struct shash_node *node;
     long long int status_timer = LLONG_MIN;
+    uint64_t limit = 0;
 
     *exiting = false;
     ssl_error = NULL;
@@ -215,7 +216,7 @@  main_loop(struct server_config *config,
             reconfigure_remotes(jsonrpc, all_dbs, remotes),
             &remotes_error);
         report_error_if_changed(reconfigure_ssl(all_dbs), &ssl_error);
-        ovsdb_jsonrpc_server_run(jsonrpc);
+        ovsdb_jsonrpc_server_run(jsonrpc, limit);
 
         if (*is_backup) {
             replication_run();
@@ -228,8 +229,20 @@  main_loop(struct server_config *config,
         struct shash_node *next;
         SHASH_FOR_EACH_SAFE (node, next, all_dbs) {
             struct db *db = node->data;
+            uint64_t db_limit = 0;
+
             ovsdb_txn_history_run(db->db);
             ovsdb_storage_run(db->db->storage);
+
+            db_limit = max_processing_time(db->db->storage);
+            if (db_limit) {
+                if (!limit) {
+                    limit = db_limit;
+                } else {
+                    limit = MIN(db_limit, limit);
+                }
+            }
+
             read_db(config, db);
             /* Run triggers after storage_run and read_db to make sure new raft
              * updates are utilized in current iteration. */
diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index e06c1f1ab..c473e5d32 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -407,6 +407,11 @@  raft_make_address_passive(const char *address_)
     }
 }
 
+uint64_t raft_election_timer_value(const struct raft *raft)
+{
+    return raft->election_timer;
+}
+
 static struct raft *
 raft_alloc(void)
 {
diff --git a/ovsdb/raft.h b/ovsdb/raft.h
index 3545c41c2..1d270ec0c 100644
--- a/ovsdb/raft.h
+++ b/ovsdb/raft.h
@@ -188,4 +188,7 @@  void raft_take_leadership(struct raft *);
 void raft_transfer_leadership(struct raft *, const char *reason);
 
 const struct uuid *raft_current_eid(const struct raft *);
+
+uint64_t raft_election_timer_value(const struct raft *);
+
 #endif /* lib/raft.h */
diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index 40415fcf6..a2142851a 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -640,3 +640,11 @@  ovsdb_storage_peek_last_eid(struct ovsdb_storage *storage)
     }
     return raft_current_eid(storage->raft);
 }
+
+uint64_t max_processing_time(struct ovsdb_storage *storage)
+{
+    if (!storage->raft) {
+        return UINT64_MAX;
+    }
+    return raft_election_timer_value(storage->raft) / 2;
+}
diff --git a/ovsdb/storage.h b/ovsdb/storage.h
index 02b6e7e6c..9e195bbe8 100644
--- a/ovsdb/storage.h
+++ b/ovsdb/storage.h
@@ -97,4 +97,6 @@  struct ovsdb_schema *ovsdb_storage_read_schema(struct ovsdb_storage *);
 
 const struct uuid *ovsdb_storage_peek_last_eid(struct ovsdb_storage *);
 
+uint64_t max_processing_time(struct ovsdb_storage *);
+
 #endif /* ovsdb/storage.h */