Message ID | 20221102040908.3292152-2-hzhou@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev,1/2] ovsdb/transaction.c: Refactor assess_weak_refs. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 11/2/22 05:09, Han Zhou wrote: > When a row is deleted, if the row has weak references to other rows, the > weak reference nodes attached to the destination rows (through > weak->dst_node hmap) are not destroyed. > > Deleting weak references is properly handled when a row is modified. The > removed references are taken care by: > 1. assess_weak_refs() figures out the deleted references from the row > and add them to txn_row->deleted_refs. > 2. before commit, in ovsdb_txn_update_weak_refs() it finds the > destination row for each item in txn_row->deleted_refs (from step 1), > and destroy the corresponding weak references of the destionation row. > > However, when the row is deleted, the step 1 in assess_weak_refs() is > missing. It directly returns without adding the deleted references to > txn_row->deleted_refs. So, the detination nodes will keep those weak > references although the source side of the references are already > deleted. When such rows that originating weak references are created > and deleted, more and more such useless weak reference structures > accumulate in the memory, and can stay there until the destination rows > are deleted. It is possible that the detination row is never deleted, > and in such case the ovsdb-server memory keeps growing (although it is > not strictly memory leak, because the structures are still referenced). > > This problem has an impact to applications like OVN SB DB - the memory > grows very fast in long-runing deployments and finally causes OOM. > > This patch fixes it by generating deleted_refs for deleted rows in > assess_weak_refs(). > > Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.") > Signed-off-by: Han Zhou <hzhou@ovn.org> > --- > ovsdb/transaction.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) Hi, Han. I reproduced the issue by continuously adding and removing logical switches in OVN setup that already has some other logical switches pre-created. Number of weak reference objects grows without bound. The patch fixes the problem and looks good to me. I also ran some of our scale tests with ovn-heater and observed a bit lower memory consumption overall with the fix applied (These tests do not really remove a lot of resources). Good catch! Thanks! I slightly re-named the 'area' part of the patch subject just to be make the look similar to other commits in that area. With that, applied and backported down to 2.17. Best regards, Ilya Maximets.
On Fri, Nov 4, 2022 at 10:22 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 11/2/22 05:09, Han Zhou wrote: > > When a row is deleted, if the row has weak references to other rows, the > > weak reference nodes attached to the destination rows (through > > weak->dst_node hmap) are not destroyed. > > > > Deleting weak references is properly handled when a row is modified. The > > removed references are taken care by: > > 1. assess_weak_refs() figures out the deleted references from the row > > and add them to txn_row->deleted_refs. > > 2. before commit, in ovsdb_txn_update_weak_refs() it finds the > > destination row for each item in txn_row->deleted_refs (from step 1), > > and destroy the corresponding weak references of the destionation row. > > > > However, when the row is deleted, the step 1 in assess_weak_refs() is > > missing. It directly returns without adding the deleted references to > > txn_row->deleted_refs. So, the detination nodes will keep those weak > > references although the source side of the references are already > > deleted. When such rows that originating weak references are created > > and deleted, more and more such useless weak reference structures > > accumulate in the memory, and can stay there until the destination rows > > are deleted. It is possible that the detination row is never deleted, > > and in such case the ovsdb-server memory keeps growing (although it is > > not strictly memory leak, because the structures are still referenced). > > > > This problem has an impact to applications like OVN SB DB - the memory > > grows very fast in long-runing deployments and finally causes OOM. > > > > This patch fixes it by generating deleted_refs for deleted rows in > > assess_weak_refs(). > > > > Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.") > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > --- > > ovsdb/transaction.c | 11 +++++++++-- > > 1 file changed, 9 insertions(+), 2 deletions(-) > > Hi, Han. > > I reproduced the issue by continuously adding and removing logical > switches in OVN setup that already has some other logical switches > pre-created. Number of weak reference objects grows without bound. > > The patch fixes the problem and looks good to me. > > I also ran some of our scale tests with ovn-heater and observed > a bit lower memory consumption overall with the fix applied (These > tests do not really remove a lot of resources). > > Good catch! Thanks! > > I slightly re-named the 'area' part of the patch subject just to > be make the look similar to other commits in that area. > > With that, applied and backported down to 2.17. > > Best regards, Ilya Maximets. Thanks Ilya!
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c index 1c418349429d..b0c0c311c0ef 100644 --- a/ovsdb/transaction.c +++ b/ovsdb/transaction.c @@ -665,7 +665,7 @@ static struct ovsdb_error * OVS_WARN_UNUSED_RESULT assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row) { struct ovsdb_weak_ref *weak; - struct ovsdb_table *table; + struct ovsdb_table *table = txn_row->table; struct shash_node *node; if (txn_row->old && !txn_row->new) { @@ -687,6 +687,14 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row) ovs_assert(ovs_list_is_empty(&weak->src_node)); ovs_list_insert(&src_txn_row->deleted_refs, &weak->src_node); } + + /* Creating refs that needs to be removed on commit. */ + SHASH_FOR_EACH (node, &table->schema->columns) { + const struct ovsdb_column *column = node->data; + struct ovsdb_datum *datum = &txn_row->old->fields[column->index]; + find_and_add_weak_refs(txn_row->old, datum, column, + &txn_row->deleted_refs, NULL, NULL); + } } if (!txn_row->new) { @@ -697,7 +705,6 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row) return NULL; } - table = txn_row->table; SHASH_FOR_EACH (node, &table->schema->columns) { const struct ovsdb_column *column = node->data; struct ovsdb_datum *datum = &txn_row->new->fields[column->index];
When a row is deleted, if the row has weak references to other rows, the weak reference nodes attached to the destination rows (through weak->dst_node hmap) are not destroyed. Deleting weak references is properly handled when a row is modified. The removed references are taken care by: 1. assess_weak_refs() figures out the deleted references from the row and add them to txn_row->deleted_refs. 2. before commit, in ovsdb_txn_update_weak_refs() it finds the destination row for each item in txn_row->deleted_refs (from step 1), and destroy the corresponding weak references of the destionation row. However, when the row is deleted, the step 1 in assess_weak_refs() is missing. It directly returns without adding the deleted references to txn_row->deleted_refs. So, the detination nodes will keep those weak references although the source side of the references are already deleted. When such rows that originating weak references are created and deleted, more and more such useless weak reference structures accumulate in the memory, and can stay there until the destination rows are deleted. It is possible that the detination row is never deleted, and in such case the ovsdb-server memory keeps growing (although it is not strictly memory leak, because the structures are still referenced). This problem has an impact to applications like OVN SB DB - the memory grows very fast in long-runing deployments and finally causes OOM. This patch fixes it by generating deleted_refs for deleted rows in assess_weak_refs(). Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.") Signed-off-by: Han Zhou <hzhou@ovn.org> --- ovsdb/transaction.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)