Message ID | 20230802134532.2370039-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Commit | bd2a80b1df4bd22446e9c80e7865c512e460a3c9 |
Headers | show |
Series | [ovs-dev] ovsdb-server: Fix excessive memory usage on DB open. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | fail | test: fail |
On 8/2/23 15:45, Ilya Maximets wrote: > During initial read of a database file all the file transactions are > added to the transaction history. The history run with the history > size checks is only executed after the whole file is processed. > If, for some reason, the file contains way too many transactions, > this behavior may result in excessive memory consumption up to > hundreds of GBs. For example, here is a log entry about memory usage > after reading a file with 100K+ OVN NbDB transactions: > > |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds > |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0 > raft-log:123309 txn-history:123307 txn-history-atoms:1647022868 > > In this particular case ovsdb-server allocated 95 GB of RAM in order > to accommodate 1.6 billion ovsdb atoms in the history, while only 3 > million atoms are in the actual database. Wow.. > > Fix that by running history size checks after applying each file > transaction. This way the memory usage while reading the database > from the example stays at about 1 GB mark. History size checks are > cheap in comparison with transaction replay, so the additional calls > do not reduce performance. > > We could've just moved the history run into ovsdb_txn_replay_commit(), > but it seems more organic to call it externally, since we have init() > and destroy() functions called externally as well. > > Since the history run will be executed shortly after reading the > database and actual memory consumption peak is not always logged, > there seem to be no reliable way to unit test for the issue without > adding extra testing infrastructure into the code. > > Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.") > Reported-at: https://bugzilla.redhat.com/2228464 > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > ovsdb/ovsdb-server.c | 3 ++- > ovsdb/relay.c | 1 + > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c > index 8e623118b..cf09c9079 100644 > --- a/ovsdb/ovsdb-server.c > +++ b/ovsdb/ovsdb-server.c > @@ -235,7 +235,7 @@ main_loop(struct server_config *config, > > SHASH_FOR_EACH_SAFE (node, all_dbs) { > struct db *db = node->data; > - ovsdb_txn_history_run(db->db); > + > ovsdb_storage_run(db->db->storage); > read_db(config, db); > /* Run triggers after storage_run and read_db to make sure new raft > @@ -678,6 +678,7 @@ parse_txn(struct server_config *config, struct db *db, > if (!error && !uuid_is_zero(txnid)) { > db->db->prereq = *txnid; > } > + ovsdb_txn_history_run(db->db); > } > return error; > } > diff --git a/ovsdb/relay.c b/ovsdb/relay.c > index b035cb492..27ff196b7 100644 > --- a/ovsdb/relay.c > +++ b/ovsdb/relay.c > @@ -413,6 +413,7 @@ ovsdb_relay_run(void) > } > ovsdb_cs_event_destroy(event); > } > + ovsdb_txn_history_run(ctx->db); > } > } > It's always cool and somewhat worrying to see such a commit-log-size-to-loc ratio but I think this is fine: Acked-by: Dumitru Ceara <dceara@redhat.com> Regards, Dumitru
On 8/2/23 22:41, Dumitru Ceara wrote: > On 8/2/23 15:45, Ilya Maximets wrote: >> During initial read of a database file all the file transactions are >> added to the transaction history. The history run with the history >> size checks is only executed after the whole file is processed. >> If, for some reason, the file contains way too many transactions, >> this behavior may result in excessive memory consumption up to >> hundreds of GBs. For example, here is a log entry about memory usage >> after reading a file with 100K+ OVN NbDB transactions: >> >> |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds >> |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0 >> raft-log:123309 txn-history:123307 txn-history-atoms:1647022868 >> >> In this particular case ovsdb-server allocated 95 GB of RAM in order >> to accommodate 1.6 billion ovsdb atoms in the history, while only 3 >> million atoms are in the actual database. > > Wow.. > >> >> Fix that by running history size checks after applying each file >> transaction. This way the memory usage while reading the database >> from the example stays at about 1 GB mark. History size checks are >> cheap in comparison with transaction replay, so the additional calls >> do not reduce performance. >> >> We could've just moved the history run into ovsdb_txn_replay_commit(), >> but it seems more organic to call it externally, since we have init() >> and destroy() functions called externally as well. >> >> Since the history run will be executed shortly after reading the >> database and actual memory consumption peak is not always logged, >> there seem to be no reliable way to unit test for the issue without >> adding extra testing infrastructure into the code. >> >> Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.") >> Reported-at: https://bugzilla.redhat.com/2228464 >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- >> ovsdb/ovsdb-server.c | 3 ++- >> ovsdb/relay.c | 1 + >> 2 files changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c >> index 8e623118b..cf09c9079 100644 >> --- a/ovsdb/ovsdb-server.c >> +++ b/ovsdb/ovsdb-server.c >> @@ -235,7 +235,7 @@ main_loop(struct server_config *config, >> >> SHASH_FOR_EACH_SAFE (node, all_dbs) { >> struct db *db = node->data; >> - ovsdb_txn_history_run(db->db); >> + >> ovsdb_storage_run(db->db->storage); >> read_db(config, db); >> /* Run triggers after storage_run and read_db to make sure new raft >> @@ -678,6 +678,7 @@ parse_txn(struct server_config *config, struct db *db, >> if (!error && !uuid_is_zero(txnid)) { >> db->db->prereq = *txnid; >> } >> + ovsdb_txn_history_run(db->db); >> } >> return error; >> } >> diff --git a/ovsdb/relay.c b/ovsdb/relay.c >> index b035cb492..27ff196b7 100644 >> --- a/ovsdb/relay.c >> +++ b/ovsdb/relay.c >> @@ -413,6 +413,7 @@ ovsdb_relay_run(void) >> } >> ovsdb_cs_event_destroy(event); >> } >> + ovsdb_txn_history_run(ctx->db); >> } >> } >> > > It's always cool and somewhat worrying to see such a > commit-log-size-to-loc ratio but I think this is fine: > > Acked-by: Dumitru Ceara <dceara@redhat.com> Thanks! Applied and backported down to 2.17. Best regards, Ilya Maximets.
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c index 8e623118b..cf09c9079 100644 --- a/ovsdb/ovsdb-server.c +++ b/ovsdb/ovsdb-server.c @@ -235,7 +235,7 @@ main_loop(struct server_config *config, SHASH_FOR_EACH_SAFE (node, all_dbs) { struct db *db = node->data; - ovsdb_txn_history_run(db->db); + ovsdb_storage_run(db->db->storage); read_db(config, db); /* Run triggers after storage_run and read_db to make sure new raft @@ -678,6 +678,7 @@ parse_txn(struct server_config *config, struct db *db, if (!error && !uuid_is_zero(txnid)) { db->db->prereq = *txnid; } + ovsdb_txn_history_run(db->db); } return error; } diff --git a/ovsdb/relay.c b/ovsdb/relay.c index b035cb492..27ff196b7 100644 --- a/ovsdb/relay.c +++ b/ovsdb/relay.c @@ -413,6 +413,7 @@ ovsdb_relay_run(void) } ovsdb_cs_event_destroy(event); } + ovsdb_txn_history_run(ctx->db); } }
During initial read of a database file all the file transactions are added to the transaction history. The history run with the history size checks is only executed after the whole file is processed. If, for some reason, the file contains way too many transactions, this behavior may result in excessive memory consumption up to hundreds of GBs. For example, here is a log entry about memory usage after reading a file with 100K+ OVN NbDB transactions: |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0 raft-log:123309 txn-history:123307 txn-history-atoms:1647022868 In this particular case ovsdb-server allocated 95 GB of RAM in order to accommodate 1.6 billion ovsdb atoms in the history, while only 3 million atoms are in the actual database. Fix that by running history size checks after applying each file transaction. This way the memory usage while reading the database from the example stays at about 1 GB mark. History size checks are cheap in comparison with transaction replay, so the additional calls do not reduce performance. We could've just moved the history run into ovsdb_txn_replay_commit(), but it seems more organic to call it externally, since we have init() and destroy() functions called externally as well. Since the history run will be executed shortly after reading the database and actual memory consumption peak is not always logged, there seem to be no reliable way to unit test for the issue without adding extra testing infrastructure into the code. Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.") Reported-at: https://bugzilla.redhat.com/2228464 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- ovsdb/ovsdb-server.c | 3 ++- ovsdb/relay.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-)