diff mbox series

[ovs-dev] ovsdb-server: Fix excessive memory usage on DB open.

Message ID 20230802134532.2370039-1-i.maximets@ovn.org
State Accepted
Commit bd2a80b1df4bd22446e9c80e7865c512e460a3c9
Headers show
Series [ovs-dev] ovsdb-server: Fix excessive memory usage on DB open. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Ilya Maximets Aug. 2, 2023, 1:45 p.m. UTC
During initial read of a database file all the file transactions are
added to the transaction history.  The history run with the history
size checks is only executed after the whole file is processed.
If, for some reason, the file contains way too many transactions,
this behavior may result in excessive memory consumption up to
hundreds of GBs.  For example, here is a log entry about memory usage
after reading a file with 100K+ OVN NbDB transactions:

  |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds
  |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0
        raft-log:123309 txn-history:123307 txn-history-atoms:1647022868

In this particular case ovsdb-server allocated 95 GB of RAM in order
to accommodate 1.6 billion ovsdb atoms in the history, while only 3
million atoms are in the actual database.

Fix that by running history size checks after applying each file
transaction.  This way the memory usage while reading the database
from the example stays at about 1 GB mark.  History size checks are
cheap in comparison with transaction replay, so the additional calls
do not reduce performance.

We could've just moved the history run into ovsdb_txn_replay_commit(),
but it seems more organic to call it externally, since we have init()
and destroy() functions called externally as well.

Since the history run will be executed shortly after reading the
database and actual memory consumption peak is not always logged,
there seem to be no reliable way to unit test for the issue without
adding extra testing infrastructure into the code.

Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.")
Reported-at: https://bugzilla.redhat.com/2228464
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 ovsdb/ovsdb-server.c | 3 ++-
 ovsdb/relay.c        | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Dumitru Ceara Aug. 2, 2023, 8:41 p.m. UTC | #1
On 8/2/23 15:45, Ilya Maximets wrote:
> During initial read of a database file all the file transactions are
> added to the transaction history.  The history run with the history
> size checks is only executed after the whole file is processed.
> If, for some reason, the file contains way too many transactions,
> this behavior may result in excessive memory consumption up to
> hundreds of GBs.  For example, here is a log entry about memory usage
> after reading a file with 100K+ OVN NbDB transactions:
> 
>   |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds
>   |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0
>         raft-log:123309 txn-history:123307 txn-history-atoms:1647022868
> 
> In this particular case ovsdb-server allocated 95 GB of RAM in order
> to accommodate 1.6 billion ovsdb atoms in the history, while only 3
> million atoms are in the actual database.

Wow..

> 
> Fix that by running history size checks after applying each file
> transaction.  This way the memory usage while reading the database
> from the example stays at about 1 GB mark.  History size checks are
> cheap in comparison with transaction replay, so the additional calls
> do not reduce performance.
> 
> We could've just moved the history run into ovsdb_txn_replay_commit(),
> but it seems more organic to call it externally, since we have init()
> and destroy() functions called externally as well.
> 
> Since the history run will be executed shortly after reading the
> database and actual memory consumption peak is not always logged,
> there seem to be no reliable way to unit test for the issue without
> adding extra testing infrastructure into the code.
> 
> Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.")
> Reported-at: https://bugzilla.redhat.com/2228464
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  ovsdb/ovsdb-server.c | 3 ++-
>  ovsdb/relay.c        | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> index 8e623118b..cf09c9079 100644
> --- a/ovsdb/ovsdb-server.c
> +++ b/ovsdb/ovsdb-server.c
> @@ -235,7 +235,7 @@ main_loop(struct server_config *config,
>  
>          SHASH_FOR_EACH_SAFE (node, all_dbs) {
>              struct db *db = node->data;
> -            ovsdb_txn_history_run(db->db);
> +
>              ovsdb_storage_run(db->db->storage);
>              read_db(config, db);
>              /* Run triggers after storage_run and read_db to make sure new raft
> @@ -678,6 +678,7 @@ parse_txn(struct server_config *config, struct db *db,
>          if (!error && !uuid_is_zero(txnid)) {
>              db->db->prereq = *txnid;
>          }
> +        ovsdb_txn_history_run(db->db);
>      }
>      return error;
>  }
> diff --git a/ovsdb/relay.c b/ovsdb/relay.c
> index b035cb492..27ff196b7 100644
> --- a/ovsdb/relay.c
> +++ b/ovsdb/relay.c
> @@ -413,6 +413,7 @@ ovsdb_relay_run(void)
>              }
>              ovsdb_cs_event_destroy(event);
>          }
> +        ovsdb_txn_history_run(ctx->db);
>      }
>  }
>  

It's always cool and somewhat worrying to see such a
commit-log-size-to-loc ratio but I think this is fine:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Regards,
Dumitru
Ilya Maximets Aug. 3, 2023, 2:44 p.m. UTC | #2
On 8/2/23 22:41, Dumitru Ceara wrote:
> On 8/2/23 15:45, Ilya Maximets wrote:
>> During initial read of a database file all the file transactions are
>> added to the transaction history.  The history run with the history
>> size checks is only executed after the whole file is processed.
>> If, for some reason, the file contains way too many transactions,
>> this behavior may result in excessive memory consumption up to
>> hundreds of GBs.  For example, here is a log entry about memory usage
>> after reading a file with 100K+ OVN NbDB transactions:
>>
>>   |00004|memory|INFO|95650400 kB peak resident set size after 96.9 seconds
>>   |00005|memory|INFO|atoms:3083346 cells:1838767 monitors:0
>>         raft-log:123309 txn-history:123307 txn-history-atoms:1647022868
>>
>> In this particular case ovsdb-server allocated 95 GB of RAM in order
>> to accommodate 1.6 billion ovsdb atoms in the history, while only 3
>> million atoms are in the actual database.
> 
> Wow..
> 
>>
>> Fix that by running history size checks after applying each file
>> transaction.  This way the memory usage while reading the database
>> from the example stays at about 1 GB mark.  History size checks are
>> cheap in comparison with transaction replay, so the additional calls
>> do not reduce performance.
>>
>> We could've just moved the history run into ovsdb_txn_replay_commit(),
>> but it seems more organic to call it externally, since we have init()
>> and destroy() functions called externally as well.
>>
>> Since the history run will be executed shortly after reading the
>> database and actual memory consumption peak is not always logged,
>> there seem to be no reliable way to unit test for the issue without
>> adding extra testing infrastructure into the code.
>>
>> Fixes: 695e81502794 ("ovsdb-server: Transaction history tracking.")
>> Reported-at: https://bugzilla.redhat.com/2228464
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>  ovsdb/ovsdb-server.c | 3 ++-
>>  ovsdb/relay.c        | 1 +
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
>> index 8e623118b..cf09c9079 100644
>> --- a/ovsdb/ovsdb-server.c
>> +++ b/ovsdb/ovsdb-server.c
>> @@ -235,7 +235,7 @@ main_loop(struct server_config *config,
>>  
>>          SHASH_FOR_EACH_SAFE (node, all_dbs) {
>>              struct db *db = node->data;
>> -            ovsdb_txn_history_run(db->db);
>> +
>>              ovsdb_storage_run(db->db->storage);
>>              read_db(config, db);
>>              /* Run triggers after storage_run and read_db to make sure new raft
>> @@ -678,6 +678,7 @@ parse_txn(struct server_config *config, struct db *db,
>>          if (!error && !uuid_is_zero(txnid)) {
>>              db->db->prereq = *txnid;
>>          }
>> +        ovsdb_txn_history_run(db->db);
>>      }
>>      return error;
>>  }
>> diff --git a/ovsdb/relay.c b/ovsdb/relay.c
>> index b035cb492..27ff196b7 100644
>> --- a/ovsdb/relay.c
>> +++ b/ovsdb/relay.c
>> @@ -413,6 +413,7 @@ ovsdb_relay_run(void)
>>              }
>>              ovsdb_cs_event_destroy(event);
>>          }
>> +        ovsdb_txn_history_run(ctx->db);
>>      }
>>  }
>>  
> 
> It's always cool and somewhat worrying to see such a
> commit-log-size-to-loc ratio but I think this is fine:
> 
> Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks!  Applied and backported down to 2.17.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 8e623118b..cf09c9079 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -235,7 +235,7 @@  main_loop(struct server_config *config,
 
         SHASH_FOR_EACH_SAFE (node, all_dbs) {
             struct db *db = node->data;
-            ovsdb_txn_history_run(db->db);
+
             ovsdb_storage_run(db->db->storage);
             read_db(config, db);
             /* Run triggers after storage_run and read_db to make sure new raft
@@ -678,6 +678,7 @@  parse_txn(struct server_config *config, struct db *db,
         if (!error && !uuid_is_zero(txnid)) {
             db->db->prereq = *txnid;
         }
+        ovsdb_txn_history_run(db->db);
     }
     return error;
 }
diff --git a/ovsdb/relay.c b/ovsdb/relay.c
index b035cb492..27ff196b7 100644
--- a/ovsdb/relay.c
+++ b/ovsdb/relay.c
@@ -413,6 +413,7 @@  ovsdb_relay_run(void)
             }
             ovsdb_cs_event_destroy(event);
         }
+        ovsdb_txn_history_run(ctx->db);
     }
 }