| Message ID | 20210608092708.15711-1-anton.ivanov@cambridgegreys.com |
|---|---|
| State | Changes Requested |
| Series | [ovs-dev] ovsdb: provide raft and command interfaces with priority |
On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>
> Set a soft time limit of "raft election timer"/2 on ovsdb
> processing.
>
> This improves behaviour in large heavily loaded clusters.
> While it cannot fully eliminate spurious raft elections
> under heavy load, it significantly decreases their number.
>
> TODO: randomize session processing order to ensure individual
> sessions towards the end of the remotes list are not starved
>
> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Personally I'd consider a random or round-robin processing order to be
pretty critical here.

A better approach (which would be a lot more work) would be for the Raft
code to use a separate thread or process.
On 10/06/2021 23:01, Ben Pfaff wrote:
> On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> Set a soft time limit of "raft election timer"/2 on ovsdb
>> processing.
>>
>> This improves behaviour in large heavily loaded clusters.
>> While it cannot fully eliminate spurious raft elections
>> under heavy load, it significantly decreases their number.
>>
>> TODO: randomize session processing order to ensure individual
>> sessions towards the end of the remotes list are not starved
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Personally I'd consider a random or round-robin processing order to be
> pretty critical here.

I agree. I will send a revised patch with either round-robin or "restart
where it stopped".

> A better approach (which would be a lot more work) would be for the Raft
> code to use a separate thread or process.

Looked at that from several angles. It is not realistic without a complete
rewrite:

1. The amount of locking needed would be severely detrimental to
   performance. I tried at some point to put all monitor processing into a
   parallel pool, and it needed on the order of 4-5 mutexes per monitor
   protecting different parts of it. Storage is likely to be even worse.

2. It would end up pushing storage state that does not correspond to a
   complete transaction, which is a can of worms in its own right.
On 10/06/2021 23:01, Ben Pfaff wrote:
> On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> Set a soft time limit of "raft election timer"/2 on ovsdb
>> processing.
>>
>> This improves behaviour in large heavily loaded clusters.
>> While it cannot fully eliminate spurious raft elections
>> under heavy load, it significantly decreases their number.
>>
>> TODO: randomize session processing order to ensure individual
>> sessions towards the end of the remotes list are not starved
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Personally I'd consider a random or round-robin processing order to be
> pretty critical here.

I have pushed an updated version which implements a "restart where you
stopped" approach. This should be mostly identical to the current master in
terms of "session starvation" while keeping raft at a higher priority.

I will do the round-robin separately - I need to get my head around a few
things, for example whether it is possible to implement it in a more
generalized manner (not just for this case).

> A better approach (which would be a lot more work) would be for the Raft
> code to use a separate thread or process.
On Fri, Jun 11, 2021 at 04:49:44PM +0100, Anton Ivanov wrote:
> On 10/06/2021 23:01, Ben Pfaff wrote:
> > On Tue, Jun 08, 2021 at 10:27:08AM +0100, anton.ivanov@cambridgegreys.com wrote:
> > > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> > >
> > > Set a soft time limit of "raft election timer"/2 on ovsdb
> > > processing.
> > >
> > > This improves behaviour in large heavily loaded clusters.
> > > While it cannot fully eliminate spurious raft elections
> > > under heavy load, it significantly decreases their number.
> > >
> > > TODO: randomize session processing order to ensure individual
> > > sessions towards the end of the remotes list are not starved
> > >
> > > Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> > Personally I'd consider a random or round-robin processing order to be
> > pretty critical here.
>
> I have pushed an updated version which implements a "restart where you
> stopped" approach. This should be mostly identical to the current master
> in terms of "session starvation" while having "raft at higher priority".

That seems fine, then.
diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4e2dfc3d7..18b8a5deb 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -60,7 +60,8 @@ static struct ovsdb_jsonrpc_session *ovsdb_jsonrpc_session_create(
     struct ovsdb_jsonrpc_remote *, struct jsonrpc_session *, bool);
 static void ovsdb_jsonrpc_session_preremove_db(struct ovsdb_jsonrpc_remote *,
                                                struct ovsdb *);
-static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *);
+static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *,
+                                          uint64_t limit);
 static void ovsdb_jsonrpc_session_wait_all(struct ovsdb_jsonrpc_remote *);
 static void ovsdb_jsonrpc_session_get_memory_usage_all(
     const struct ovsdb_jsonrpc_remote *, struct simap *usage);
@@ -128,6 +129,7 @@ struct ovsdb_jsonrpc_server {
     bool read_only;            /* This server is does not accept any
                                   transactions that can modify the database. */
     struct shash remotes;      /* Contains "struct ovsdb_jsonrpc_remote *"s. */
+    bool yield_immediately;
 };

 /* A configured remote.  This is either a passive stream listener plus a list
@@ -159,6 +161,7 @@ ovsdb_jsonrpc_server_create(bool read_only)
     ovsdb_server_init(&server->up);
     shash_init(&server->remotes);
     server->read_only = read_only;
+    server->yield_immediately = false;

     return server;
 }
@@ -378,9 +381,16 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
 }

 void
-ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr)
+ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr, uint64_t limit)
 {
     struct shash_node *node;
+    uint64_t elapsed = 0, start_time = 0;
+
+    if (limit) {
+        start_time = time_msec();
+    }
+
+    svr->yield_immediately = false;

     SHASH_FOR_EACH (node, &svr->remotes) {
         struct ovsdb_jsonrpc_remote *remote = node->data;
@@ -403,7 +413,15 @@ ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr)
             }
         }

-        ovsdb_jsonrpc_session_run_all(remote);
+        ovsdb_jsonrpc_session_run_all(remote, limit - elapsed);
+
+        if (limit) {
+            elapsed = time_msec() - start_time;
+            if (elapsed >= limit) {
+                svr->yield_immediately = true;
+                break;
+            }
+        }
     }
 }
@@ -412,6 +430,12 @@ ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *svr)
 {
     struct shash_node *node;

+    if (svr->yield_immediately) {
+        poll_immediate_wake();
+        svr->yield_immediately = false;
+        return;
+    }
+
     SHASH_FOR_EACH (node, &svr->remotes) {
         struct ovsdb_jsonrpc_remote *remote = node->data;
@@ -583,15 +607,26 @@ ovsdb_jsonrpc_session_set_options(struct ovsdb_jsonrpc_session *session,
 }

 static void
-ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *remote)
+ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *remote,
+                              uint64_t limit)
 {
     struct ovsdb_jsonrpc_session *s, *next;
+    uint64_t start_time = 0;
+
+    if (limit) {
+        start_time = time_msec();
+    }

     LIST_FOR_EACH_SAFE (s, next, node, &remote->sessions) {
         int error = ovsdb_jsonrpc_session_run(s);
         if (error) {
             ovsdb_jsonrpc_session_close(s);
         }
+        if (limit) {
+            if (time_msec() - start_time > limit) {
+                break;
+            }
+        }
     }
 }
diff --git a/ovsdb/jsonrpc-server.h b/ovsdb/jsonrpc-server.h
index e0653aa39..218152e9d 100644
--- a/ovsdb/jsonrpc-server.h
+++ b/ovsdb/jsonrpc-server.h
@@ -67,7 +67,7 @@ void ovsdb_jsonrpc_server_free_remote_status(
 void ovsdb_jsonrpc_server_reconnect(struct ovsdb_jsonrpc_server *, bool force,
                                     char *comment);
-void ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *);
+void ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *, uint64_t limit);
 void ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *);
 void ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *,
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index b09232c65..97bfa05b2 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -184,6 +184,7 @@ main_loop(struct server_config *config,
     char *remotes_error, *ssl_error;
     struct shash_node *node;
     long long int status_timer = LLONG_MIN;
+    uint64_t limit = 0;

     *exiting = false;
     ssl_error = NULL;
@@ -215,7 +216,7 @@ main_loop(struct server_config *config,
                                 reconfigure_remotes(jsonrpc, all_dbs, remotes),
                                 &remotes_error);
         report_error_if_changed(reconfigure_ssl(all_dbs), &ssl_error);
-        ovsdb_jsonrpc_server_run(jsonrpc);
+        ovsdb_jsonrpc_server_run(jsonrpc, limit);

         if (*is_backup) {
             replication_run();
@@ -228,8 +229,20 @@ main_loop(struct server_config *config,
         struct shash_node *next;
         SHASH_FOR_EACH_SAFE (node, next, all_dbs) {
             struct db *db = node->data;
+            uint64_t db_limit = 0;
+
             ovsdb_txn_history_run(db->db);
             ovsdb_storage_run(db->db->storage);
+
+            db_limit = max_processing_time(db->db->storage);
+            if (db_limit) {
+                if (!limit) {
+                    limit = db_limit;
+                } else {
+                    limit = MIN(db_limit, limit);
+                }
+            }
+
             read_db(config, db);
             /* Run triggers after storage_run and read_db to make sure new raft
              * updates are utilized in current iteration. */
diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index e06c1f1ab..c473e5d32 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -407,6 +407,11 @@ raft_make_address_passive(const char *address_)
     }
 }

+uint64_t
+raft_election_timer_value(const struct raft *raft)
+{
+    return raft->election_timer;
+}
+
 static struct raft *
 raft_alloc(void)
 {
diff --git a/ovsdb/raft.h b/ovsdb/raft.h
index 3545c41c2..1d270ec0c 100644
--- a/ovsdb/raft.h
+++ b/ovsdb/raft.h
@@ -188,4 +188,7 @@ void raft_take_leadership(struct raft *);
 void raft_transfer_leadership(struct raft *, const char *reason);

 const struct uuid *raft_current_eid(const struct raft *);
+
+uint64_t raft_election_timer_value(const struct raft *);
+
 #endif /* lib/raft.h */
diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index 40415fcf6..a2142851a 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -640,3 +640,11 @@ ovsdb_storage_peek_last_eid(struct ovsdb_storage *storage)
     }
     return raft_current_eid(storage->raft);
 }
+
+uint64_t
+max_processing_time(struct ovsdb_storage *storage)
+{
+    if (!storage->raft) {
+        return UINT64_MAX;
+    }
+    return raft_election_timer_value(storage->raft) / 2;
+}
diff --git a/ovsdb/storage.h b/ovsdb/storage.h
index 02b6e7e6c..9e195bbe8 100644
--- a/ovsdb/storage.h
+++ b/ovsdb/storage.h
@@ -97,4 +97,6 @@ struct ovsdb_schema *ovsdb_storage_read_schema(struct ovsdb_storage *);

 const struct uuid *ovsdb_storage_peek_last_eid(struct ovsdb_storage *);

+uint64_t max_processing_time(struct ovsdb_storage *);
+
 #endif /* ovsdb/storage.h */