diff mbox series

[ovs-dev,2/2] ovsdb/transaction.c: Fix weak reference leak.

Message ID 20221102040908.3292152-2-hzhou@ovn.org
State Accepted
Headers show
Series [ovs-dev,1/2] ovsdb/transaction.c: Refactor assess_weak_refs. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Han Zhou Nov. 2, 2022, 4:09 a.m. UTC
When a row is deleted, if the row has weak references to other rows, the
weak reference nodes attached to the destination rows (through
weak->dst_node hmap) are not destroyed.

Deleting weak references is properly handled when a row is modified. The
removed references are taken care by:
1. assess_weak_refs() figures out the deleted references from the row
   and add them to txn_row->deleted_refs.
2. before commit, in ovsdb_txn_update_weak_refs() it finds the
   destination row for each item in txn_row->deleted_refs (from step 1),
   and destroy the corresponding weak references of the destionation row.

However, when the row is deleted, the step 1 in assess_weak_refs() is
missing. It directly returns without adding the deleted references to
txn_row->deleted_refs. So, the detination nodes will keep those weak
references although the source side of the references are already
deleted.  When such rows that originating weak references are created
and deleted, more and more such useless weak reference structures
accumulate in the memory, and can stay there until the destination rows
are deleted. It is possible that the detination row is never deleted,
and in such case the ovsdb-server memory keeps growing (although it is
not strictly memory leak, because the structures are still referenced).

This problem has an impact to applications like OVN SB DB - the memory
grows very fast in long-runing deployments and finally causes OOM.

This patch fixes it by generating deleted_refs for deleted rows in
assess_weak_refs().

Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.")
Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 ovsdb/transaction.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Ilya Maximets Nov. 4, 2022, 5:22 p.m. UTC | #1
On 11/2/22 05:09, Han Zhou wrote:
> When a row is deleted, if the row has weak references to other rows, the
> weak reference nodes attached to the destination rows (through
> weak->dst_node hmap) are not destroyed.
> 
> Deleting weak references is properly handled when a row is modified. The
> removed references are taken care by:
> 1. assess_weak_refs() figures out the deleted references from the row
>    and add them to txn_row->deleted_refs.
> 2. before commit, in ovsdb_txn_update_weak_refs() it finds the
>    destination row for each item in txn_row->deleted_refs (from step 1),
>    and destroy the corresponding weak references of the destionation row.
> 
> However, when the row is deleted, the step 1 in assess_weak_refs() is
> missing. It directly returns without adding the deleted references to
> txn_row->deleted_refs. So, the detination nodes will keep those weak
> references although the source side of the references are already
> deleted.  When such rows that originating weak references are created
> and deleted, more and more such useless weak reference structures
> accumulate in the memory, and can stay there until the destination rows
> are deleted. It is possible that the detination row is never deleted,
> and in such case the ovsdb-server memory keeps growing (although it is
> not strictly memory leak, because the structures are still referenced).
> 
> This problem has an impact to applications like OVN SB DB - the memory
> grows very fast in long-runing deployments and finally causes OOM.
> 
> This patch fixes it by generating deleted_refs for deleted rows in
> assess_weak_refs().
> 
> Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.")
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>  ovsdb/transaction.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Hi, Han.

I reproduced the issue by continuously adding and removing logical
switches in OVN setup that already has some other logical switches
pre-created.  Number of weak reference objects grows without bound.

The patch fixes the problem and looks good to me.

I also ran some of our scale tests with ovn-heater and observed
a bit lower memory consumption overall with the fix applied (These
tests do not really remove a lot of resources).

Good catch!  Thanks!

I slightly re-named the 'area' part of the patch subject just to
be make the look similar to other commits in that area.

With that, applied and backported down to 2.17.

Best regards, Ilya Maximets.
Han Zhou Nov. 4, 2022, 9:31 p.m. UTC | #2
On Fri, Nov 4, 2022 at 10:22 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 11/2/22 05:09, Han Zhou wrote:
> > When a row is deleted, if the row has weak references to other rows, the
> > weak reference nodes attached to the destination rows (through
> > weak->dst_node hmap) are not destroyed.
> >
> > Deleting weak references is properly handled when a row is modified. The
> > removed references are taken care by:
> > 1. assess_weak_refs() figures out the deleted references from the row
> >    and add them to txn_row->deleted_refs.
> > 2. before commit, in ovsdb_txn_update_weak_refs() it finds the
> >    destination row for each item in txn_row->deleted_refs (from step 1),
> >    and destroy the corresponding weak references of the destionation
row.
> >
> > However, when the row is deleted, the step 1 in assess_weak_refs() is
> > missing. It directly returns without adding the deleted references to
> > txn_row->deleted_refs. So, the detination nodes will keep those weak
> > references although the source side of the references are already
> > deleted.  When such rows that originating weak references are created
> > and deleted, more and more such useless weak reference structures
> > accumulate in the memory, and can stay there until the destination rows
> > are deleted. It is possible that the detination row is never deleted,
> > and in such case the ovsdb-server memory keeps growing (although it is
> > not strictly memory leak, because the structures are still referenced).
> >
> > This problem has an impact to applications like OVN SB DB - the memory
> > grows very fast in long-runing deployments and finally causes OOM.
> >
> > This patch fixes it by generating deleted_refs for deleted rows in
> > assess_weak_refs().
> >
> > Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of
weak refs.")
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> >  ovsdb/transaction.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
>
> Hi, Han.
>
> I reproduced the issue by continuously adding and removing logical
> switches in OVN setup that already has some other logical switches
> pre-created.  Number of weak reference objects grows without bound.
>
> The patch fixes the problem and looks good to me.
>
> I also ran some of our scale tests with ovn-heater and observed
> a bit lower memory consumption overall with the fix applied (These
> tests do not really remove a lot of resources).
>
> Good catch!  Thanks!
>
> I slightly re-named the 'area' part of the patch subject just to
> be make the look similar to other commits in that area.
>
> With that, applied and backported down to 2.17.
>
> Best regards, Ilya Maximets.

Thanks Ilya!
diff mbox series

Patch

diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 1c418349429d..b0c0c311c0ef 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -665,7 +665,7 @@  static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
 {
     struct ovsdb_weak_ref *weak;
-    struct ovsdb_table *table;
+    struct ovsdb_table *table = txn_row->table;
     struct shash_node *node;
 
     if (txn_row->old && !txn_row->new) {
@@ -687,6 +687,14 @@  assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
             ovs_assert(ovs_list_is_empty(&weak->src_node));
             ovs_list_insert(&src_txn_row->deleted_refs, &weak->src_node);
         }
+
+        /* Creating refs that needs to be removed on commit. */
+        SHASH_FOR_EACH (node, &table->schema->columns) {
+            const struct ovsdb_column *column = node->data;
+            struct ovsdb_datum *datum = &txn_row->old->fields[column->index];
+            find_and_add_weak_refs(txn_row->old, datum, column,
+                                   &txn_row->deleted_refs, NULL, NULL);
+        }
     }
 
     if (!txn_row->new) {
@@ -697,7 +705,6 @@  assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
         return NULL;
     }
 
-    table = txn_row->table;
     SHASH_FOR_EACH (node, &table->schema->columns) {
         const struct ovsdb_column *column = node->data;
         struct ovsdb_datum *datum = &txn_row->new->fields[column->index];