diff mbox series

[ovs-dev,v3,2/3] ovsdb-idl: Preserve references for deleted rows.

Message ID 20210312120741.17242.17028.stgit@dceara.remote.csb
State Changes Requested
Headers show
Series ovsdb-idl: Preserve references for tracked deleted rows. | expand

Commit Message

Dumitru Ceara March 12, 2021, 12:07 p.m. UTC
Considering two DB rows, 'a' from table A and 'b' from table B (with
column 'ref_a' a reference to table A):
a = {A._uuid=<U1>}
b = {B._uuid=<U2>, B.ref_a=<U1>}

Assuming both records are present in the IDL client's in-memory view of
the database, depending whether row 'b' is also deleted in the same
transaction or not, deletion of row 'a' should generate the following
tracked changes:

1. only row 'a' is deleted:
- for table A:
  - deleted records: a = {A._uuid=<U1>}
- for table B:
  - updated records: b = {B._uuid=<U2>, B.ref_a=[]}

2. row 'a' and row 'b' are deleted in the same update:
- for table A:
  - deleted records: a = {A._uuid=<U1>}
- for table B:
  - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}

To ensure this, we now delay reparsing row backrefs for deleted rows
until all updates in the current run have been processed.

Without this change, in scenario 2 above, the tracked changes for table
B would be:
- deleted records: b = {B._uuid=<U2>, B.ref_a=[]}

In particular, for strong references, row 'a' can never be deleted in
a transaction that happens strictly before row 'b' is deleted.  In some
cases [0] both rows are deleted in the same transaction and having
B.ref_a=[] would violate the integrity of the database from client
perspective.  This would force the client to always validate that
strong reference fields are non-NULL.  This is not really an option
because the information in the original reference is required for
incrementally processing the record deletion.

[0] with ovn-monitor-all=true, the following command triggers a crash
    in ovn-controller because a strong reference field becomes NULL:
    $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp 00:00:00:00:00:01 1.0.0.1/24
    $ ovn-nbctl lr-del r

Reported-at: https://bugzilla.redhat.com/1932642
Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
 tests/ovsdb-idl.at  |  203 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/test-ovsdb.c  |   50 +++++++++++++
 tests/test-ovsdb.py |   14 ++++
 4 files changed, 370 insertions(+), 32 deletions(-)

Comments

Ilya Maximets March 18, 2021, 6:07 p.m. UTC | #1
On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> Considering two DB rows, 'a' from table A and 'b' from table B (with
> column 'ref_a' a reference to table A):
> a = {A._uuid=<U1>}
> b = {B._uuid=<U2>, B.ref_a=<U1>}
> 
> Assuming both records are present in the IDL client's in-memory view of
> the database, depending whether row 'b' is also deleted in the same
> transaction or not, deletion of row 'a' should generate the following
> tracked changes:
> 
> 1. only row 'a' is deleted:
> - for table A:
>   - deleted records: a = {A._uuid=<U1>}
> - for table B:
>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> 
> 2. row 'a' and row 'b' are deleted in the same update:
> - for table A:
>   - deleted records: a = {A._uuid=<U1>}
> - for table B:
>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> 
> To ensure this, we now delay reparsing row backrefs for deleted rows
> until all updates in the current run have been processed.
> 
> Without this change, in scenario 2 above, the tracked changes for table
> B would be:
> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> 
> In particular, for strong references, row 'a' can never be deleted in
> a transaction that happens strictly before row 'b' is deleted.  In some
> cases [0] both rows are deleted in the same transaction and having
> B.ref_a=[] would violate the integrity of the database from client
> perspective.  This would force the client to always validate that
> strong reference fields are non-NULL.  This is not really an option
> because the information in the original reference is required for
> incrementally processing the record deletion.
> 
> [0] with ovn-monitor-all=true, the following command triggers a crash
>     in ovn-controller because a strong reference field becomes NULL:
>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp 00:00:00:00:00:01 1.0.0.1/24
>     $ ovn-nbctl lr-del r
> 
> Reported-at: https://bugzilla.redhat.com/1932642
> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>  tests/ovsdb-idl.at  |  203 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/test-ovsdb.c  |   50 +++++++++++++
>  tests/test-ovsdb.py |   14 ++++
>  4 files changed, 370 insertions(+), 32 deletions(-)
> 
> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> index 9e1e787..1542da9 100644
> --- a/lib/ovsdb-idl.c
> +++ b/lib/ovsdb-idl.c
> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>      struct ovsdb_idl_txn *txn;
>      struct hmap outstanding_txns;
>      bool verify_write_only;
> +    struct ovs_list orphan_rows; /* All deleted but still referenced rows. */

The name here is a bit confusing.  I mean it's not a list
of orphan rows, at least it's not a list of all the orphan
rows.  So, we should, probably, rename it to something like
'deleted_rows' instead and also rename all related functions.

What do you think?


For the rest of the code... It seems fine, but it's really hard
to review and I still have a feeling that something might go
wrong (not more wrong than in current code, though), but I can't
think of any example.  So, it's, probably, OK.

Best regards, Ilya Maximets.
Han Zhou March 19, 2021, 1:27 a.m. UTC | #2
On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> > Considering two DB rows, 'a' from table A and 'b' from table B (with
> > column 'ref_a' a reference to table A):
> > a = {A._uuid=<U1>}
> > b = {B._uuid=<U2>, B.ref_a=<U1>}
> >
> > Assuming both records are present in the IDL client's in-memory view of
> > the database, depending whether row 'b' is also deleted in the same
> > transaction or not, deletion of row 'a' should generate the following
> > tracked changes:
> >
> > 1. only row 'a' is deleted:
> > - for table A:
> >   - deleted records: a = {A._uuid=<U1>}
> > - for table B:
> >   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> >
> > 2. row 'a' and row 'b' are deleted in the same update:
> > - for table A:
> >   - deleted records: a = {A._uuid=<U1>}
> > - for table B:
> >   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >
> > To ensure this, we now delay reparsing row backrefs for deleted rows
> > until all updates in the current run have been processed.
> >
> > Without this change, in scenario 2 above, the tracked changes for table
> > B would be:
> > - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> >
> > In particular, for strong references, row 'a' can never be deleted in
> > a transaction that happens strictly before row 'b' is deleted.  In some
> > cases [0] both rows are deleted in the same transaction and having
> > B.ref_a=[] would violate the integrity of the database from client
> > perspective.  This would force the client to always validate that
> > strong reference fields are non-NULL.  This is not really an option
> > because the information in the original reference is required for
> > incrementally processing the record deletion.
> >
> > [0] with ovn-monitor-all=true, the following command triggers a crash
> >     in ovn-controller because a strong reference field becomes NULL:
> >     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp 00:00:00:00:00:01
1.0.0.1/24
> >     $ ovn-nbctl lr-del r
> >
> > Reported-at: https://bugzilla.redhat.com/1932642
> > Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted
rows.")
> > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> > ---
> >  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> >  tests/ovsdb-idl.at  |  203
+++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/test-ovsdb.c  |   50 +++++++++++++
> >  tests/test-ovsdb.py |   14 ++++
> >  4 files changed, 370 insertions(+), 32 deletions(-)
> >
> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> > index 9e1e787..1542da9 100644
> > --- a/lib/ovsdb-idl.c
> > +++ b/lib/ovsdb-idl.c
> > @@ -92,6 +92,7 @@ struct ovsdb_idl {
> >      struct ovsdb_idl_txn *txn;
> >      struct hmap outstanding_txns;
> >      bool verify_write_only;
> > +    struct ovs_list orphan_rows; /* All deleted but still referenced
rows. */
>
> The name here is a bit confusing.  I mean it's not a list
> of orphan rows, at least it's not a list of all the orphan
> rows.  So, we should, probably, rename it to something like
> 'deleted_rows' instead and also rename all related functions.
>
> What do you think?
>

I agree with Ilya that the "orphan_rows" is confusing. However,
"deleted_rows" may also be confusing, because the current code also
maintains deleted rows in the track_list of each table (no matter if
tracking is enabled or not). If tracking is enabled, the rows are destroyed
(for real) when ovsdb_idl_track_clear() is called; if not, it is destroyed
in ovsdb_idl_row_destroy_postprocess(). The part of logic was really
confusing. This patch changes that behavior further by introducing this
orphan_rows list as a temporary place (for some situations) before moving
to the track_list. Since this patch already did some refactoring (which is
great), I'd suggest doing a little more to avoid the above confusion: for
deleted rows, shall we just put the untracked rows in a
"deleted_untracked_rows" list (i.e. rename the "orphan_rows" of this
patch), and let the "track_list" only hold the tracked ones, and update the
ovsdb_idl_row_destroy_postprocess() to destroy deleted_untracked_rows
without worrying the track_list. track_list should exist only if tracking
is enabled, and should be taken care by ovsdb_idl_track_clear().

What do you think?

>
> For the rest of the code... It seems fine, but it's really hard
> to review and I still have a feeling that something might go
> wrong (not more wrong than in current code, though), but I can't
> think of any example.  So, it's, probably, OK.
>
> Best regards, Ilya Maximets.
Han Zhou March 19, 2021, 1:41 a.m. UTC | #3
On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org> wrote:
>
>
>
> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org> wrote:
> >
> > On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> > > Considering two DB rows, 'a' from table A and 'b' from table B (with
> > > column 'ref_a' a reference to table A):
> > > a = {A._uuid=<U1>}
> > > b = {B._uuid=<U2>, B.ref_a=<U1>}
> > >
> > > Assuming both records are present in the IDL client's in-memory view
of
> > > the database, depending whether row 'b' is also deleted in the same
> > > transaction or not, deletion of row 'a' should generate the following
> > > tracked changes:
> > >
> > > 1. only row 'a' is deleted:
> > > - for table A:
> > >   - deleted records: a = {A._uuid=<U1>}
> > > - for table B:
> > >   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> > >
> > > 2. row 'a' and row 'b' are deleted in the same update:
> > > - for table A:
> > >   - deleted records: a = {A._uuid=<U1>}
> > > - for table B:
> > >   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> > >
> > > To ensure this, we now delay reparsing row backrefs for deleted rows
> > > until all updates in the current run have been processed.
> > >
> > > Without this change, in scenario 2 above, the tracked changes for
table
> > > B would be:
> > > - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> > >
> > > In particular, for strong references, row 'a' can never be deleted in
> > > a transaction that happens strictly before row 'b' is deleted.  In
some
> > > cases [0] both rows are deleted in the same transaction and having
> > > B.ref_a=[] would violate the integrity of the database from client
> > > perspective.  This would force the client to always validate that
> > > strong reference fields are non-NULL.  This is not really an option
> > > because the information in the original reference is required for
> > > incrementally processing the record deletion.
> > >
> > > [0] with ovn-monitor-all=true, the following command triggers a crash
> > >     in ovn-controller because a strong reference field becomes NULL:
> > >     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
00:00:00:00:00:01 1.0.0.1/24
> > >     $ ovn-nbctl lr-del r
> > >
> > > Reported-at: https://bugzilla.redhat.com/1932642
> > > Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted
rows.")
> > > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> > > ---
> > >  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> > >  tests/ovsdb-idl.at  |  203
+++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/test-ovsdb.c  |   50 +++++++++++++
> > >  tests/test-ovsdb.py |   14 ++++
> > >  4 files changed, 370 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> > > index 9e1e787..1542da9 100644
> > > --- a/lib/ovsdb-idl.c
> > > +++ b/lib/ovsdb-idl.c
> > > @@ -92,6 +92,7 @@ struct ovsdb_idl {
> > >      struct ovsdb_idl_txn *txn;
> > >      struct hmap outstanding_txns;
> > >      bool verify_write_only;
> > > +    struct ovs_list orphan_rows; /* All deleted but still referenced
rows. */
> >
> > The name here is a bit confusing.  I mean it's not a list
> > of orphan rows, at least it's not a list of all the orphan
> > rows.  So, we should, probably, rename it to something like
> > 'deleted_rows' instead and also rename all related functions.
> >
> > What do you think?
> >
>
> I agree with Ilya that the "orphan_rows" is confusing. However,
"deleted_rows" may also be confusing, because the current code also
maintains deleted rows in the track_list of each table (no matter if
tracking is enabled or not). If tracking is enabled, the rows are destroyed
(for real) when ovsdb_idl_track_clear() is called; if not, it is destroyed
in ovsdb_idl_row_destroy_postprocess(). The part of logic was really
confusing. This patch changes that behavior further by introducing this
orphan_rows list as a temporary place (for some situations) before moving
to the track_list. Since this patch already did some refactoring (which is
great), I'd suggest doing a little more to avoid the above confusion: for
deleted rows, shall we just put the untracked rows in a
"deleted_untracked_rows" list (i.e. rename the "orphan_rows" of this
patch), and let the "track_list" only hold the tracked ones, and update the
ovsdb_idl_row_destroy_postprocess() to destroy deleted_untracked_rows
without worrying the track_list. track_list should exist only if tracking
is enabled, and should be taken care by ovsdb_idl_track_clear().
>
> What do you think?
>

I took a second look at the patch, and it seems when tracking is not
enabled it would result in the rows in "orphan_rows" leaked, because in
ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
tracking is enabled for the table. If I am correct, would the above
suggested refactoring help avoiding such kinds of problems?

Thanks,
Han

> >
> > For the rest of the code... It seems fine, but it's really hard
> > to review and I still have a feeling that something might go
> > wrong (not more wrong than in current code, though), but I can't
> > think of any example.  So, it's, probably, OK.
> >
> > Best regards, Ilya Maximets.
Dumitru Ceara March 19, 2021, 8:42 a.m. UTC | #4
On 3/19/21 2:41 AM, Han Zhou wrote:
> 
> 
> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
> <mailto:hzhou@ovn.org>> wrote:
>>
>>
>>
>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
> <mailto:i.maximets@ovn.org>> wrote:
>> >
>> > On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>> > > Considering two DB rows, 'a' from table A and 'b' from table B (with
>> > > column 'ref_a' a reference to table A):
>> > > a = {A._uuid=<U1>}
>> > > b = {B._uuid=<U2>, B.ref_a=<U1>}
>> > >
>> > > Assuming both records are present in the IDL client's in-memory
> view of
>> > > the database, depending whether row 'b' is also deleted in the same
>> > > transaction or not, deletion of row 'a' should generate the following
>> > > tracked changes:
>> > >
>> > > 1. only row 'a' is deleted:
>> > > - for table A:
>> > >   - deleted records: a = {A._uuid=<U1>}
>> > > - for table B:
>> > >   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>> > >
>> > > 2. row 'a' and row 'b' are deleted in the same update:
>> > > - for table A:
>> > >   - deleted records: a = {A._uuid=<U1>}
>> > > - for table B:
>> > >   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>> > >
>> > > To ensure this, we now delay reparsing row backrefs for deleted rows
>> > > until all updates in the current run have been processed.
>> > >
>> > > Without this change, in scenario 2 above, the tracked changes for
> table
>> > > B would be:
>> > > - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>> > >
>> > > In particular, for strong references, row 'a' can never be deleted in
>> > > a transaction that happens strictly before row 'b' is deleted.  In
> some
>> > > cases [0] both rows are deleted in the same transaction and having
>> > > B.ref_a=[] would violate the integrity of the database from client
>> > > perspective.  This would force the client to always validate that
>> > > strong reference fields are non-NULL.  This is not really an option
>> > > because the information in the original reference is required for
>> > > incrementally processing the record deletion.
>> > >
>> > > [0] with ovn-monitor-all=true, the following command triggers a crash
>> > >     in ovn-controller because a strong reference field becomes NULL:
>> > >     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
>> > >     $ ovn-nbctl lr-del r
>> > >
>> > > Reported-at: https://bugzilla.redhat.com/1932642
> <https://bugzilla.redhat.com/1932642>
>> > > Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
> deleted rows.")
>> > > Signed-off-by: Dumitru Ceara <dceara@redhat.com
> <mailto:dceara@redhat.com>>
>> > > ---
>> > >  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>> > >  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >  tests/test-ovsdb.c  |   50 +++++++++++++
>> > >  tests/test-ovsdb.py |   14 ++++
>> > >  4 files changed, 370 insertions(+), 32 deletions(-)
>> > >
>> > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> > > index 9e1e787..1542da9 100644
>> > > --- a/lib/ovsdb-idl.c
>> > > +++ b/lib/ovsdb-idl.c
>> > > @@ -92,6 +92,7 @@ struct ovsdb_idl {
>> > >      struct ovsdb_idl_txn *txn;
>> > >      struct hmap outstanding_txns;
>> > >      bool verify_write_only;
>> > > +    struct ovs_list orphan_rows; /* All deleted but still
> referenced rows. */
>> >
>> > The name here is a bit confusing.  I mean it's not a list
>> > of orphan rows, at least it's not a list of all the orphan
>> > rows.  So, we should, probably, rename it to something like
>> > 'deleted_rows' instead and also rename all related functions.
>> >
>> > What do you think?
>> >
>>
>> I agree with Ilya that the "orphan_rows" is confusing. However,
> "deleted_rows" may also be confusing, because the current code also
> maintains deleted rows in the track_list of each table (no matter if
> tracking is enabled or not). If tracking is enabled, the rows are
> destroyed (for real) when ovsdb_idl_track_clear() is called; if not, it
> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
> was really confusing. This patch changes that behavior further by
> introducing this orphan_rows list as a temporary place (for some
> situations) before moving to the track_list. Since this patch already
> did some refactoring (which is great), I'd suggest doing a little more
> to avoid the above confusion: for deleted rows, shall we just put the
> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
> "orphan_rows" of this patch), and let the "track_list" only hold the
> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
> destroy deleted_untracked_rows without worrying the track_list.
> track_list should exist only if tracking is enabled, and should be taken
> care by ovsdb_idl_track_clear().

Thanks, Ilya and Han, for the review and suggestions.

Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
regarding using a different list for tables without change tracking I'm
not so sure if that's OK.  Even if the table doesn't have change
tracking its deleted rows should still first go to the "temporary"
orphan_rows list.

For example:

Table T1 (no change tracking):
- Row a = {A.uuid=<U1>}

Table T2 (change tracking enabled):
- Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}

If in a transaction both rows 'a' and 'b' are deleted, and updates are
received in this order, if we wouldn't add 'a' to 'orphan_rows' then
we'd be reparsing backrefs for 'a' immediately leading to:

Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}

Then the update to delete 'b' is processed:

Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}

The above is wrong and breaks consistency if ref_to_A is a strong reference.

>>
>> What do you think?
>>
> 
> I took a second look at the patch, and it seems when tracking is not
> enabled it would result in the rows in "orphan_rows" leaked, because in
> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
> tracking is enabled for the table. If I am correct, would the above
> suggested refactoring help avoiding such kinds of problems?

Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
rows to 'track_list' in two scenarios:
a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
b. they're in a table with change tracking enabled.

If the table doesn't have change tracking enabled and the row is not
referenced anymore it will be added to 'track_list' and will be cleaned
up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).

If the table doesn't have change tracking enabled but the row (let's
call it 'R1') is still referenced (i.e., it became "orphan") then we
shouldn't delete it yet.  When, in the future, the rows referring to
this orphan row are deleted/updated (let's call them 'R2'),
ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
now properly deleted.

> 
> Thanks,
> Han
> 
>> >
>> > For the rest of the code... It seems fine, but it's really hard
>> > to review and I still have a feeling that something might go
>> > wrong (not more wrong than in current code, though), but I can't
>> > think of any example.  So, it's, probably, OK.

Thanks, I'm not sure how we can improve the stability/reliability of
this code without major refactoring/rewrite.  Until then though, the
fact that we break consistency of the IDL client's view of the database
is something that needs to be addressed because it's taken for granted
by ovn-controller in quite a few cases.

>> >
>> > Best regards, Ilya Maximets.

Regards,
Dumitru
Han Zhou March 19, 2021, 5:55 p.m. UTC | #5
On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 3/19/21 2:41 AM, Han Zhou wrote:
> >
> >
> > On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
> > <mailto:hzhou@ovn.org>> wrote:
> >>
> >>
> >>
> >> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
> > <mailto:i.maximets@ovn.org>> wrote:
> >> >
> >> > On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> >> > > Considering two DB rows, 'a' from table A and 'b' from table B
(with
> >> > > column 'ref_a' a reference to table A):
> >> > > a = {A._uuid=<U1>}
> >> > > b = {B._uuid=<U2>, B.ref_a=<U1>}
> >> > >
> >> > > Assuming both records are present in the IDL client's in-memory
> > view of
> >> > > the database, depending whether row 'b' is also deleted in the same
> >> > > transaction or not, deletion of row 'a' should generate the
following
> >> > > tracked changes:
> >> > >
> >> > > 1. only row 'a' is deleted:
> >> > > - for table A:
> >> > >   - deleted records: a = {A._uuid=<U1>}
> >> > > - for table B:
> >> > >   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> >> > >
> >> > > 2. row 'a' and row 'b' are deleted in the same update:
> >> > > - for table A:
> >> > >   - deleted records: a = {A._uuid=<U1>}
> >> > > - for table B:
> >> > >   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >> > >
> >> > > To ensure this, we now delay reparsing row backrefs for deleted
rows
> >> > > until all updates in the current run have been processed.
> >> > >
> >> > > Without this change, in scenario 2 above, the tracked changes for
> > table
> >> > > B would be:
> >> > > - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> >> > >
> >> > > In particular, for strong references, row 'a' can never be deleted
in
> >> > > a transaction that happens strictly before row 'b' is deleted.  In
> > some
> >> > > cases [0] both rows are deleted in the same transaction and having
> >> > > B.ref_a=[] would violate the integrity of the database from client
> >> > > perspective.  This would force the client to always validate that
> >> > > strong reference fields are non-NULL.  This is not really an option
> >> > > because the information in the original reference is required for
> >> > > incrementally processing the record deletion.
> >> > >
> >> > > [0] with ovn-monitor-all=true, the following command triggers a
crash
> >> > >     in ovn-controller because a strong reference field becomes
NULL:
> >> > >     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
> > 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
> >> > >     $ ovn-nbctl lr-del r
> >> > >
> >> > > Reported-at: https://bugzilla.redhat.com/1932642
> > <https://bugzilla.redhat.com/1932642>
> >> > > Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
> > deleted rows.")
> >> > > Signed-off-by: Dumitru Ceara <dceara@redhat.com
> > <mailto:dceara@redhat.com>>
> >> > > ---
> >> > >  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> >> > >  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
> > +++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > >  tests/test-ovsdb.c  |   50 +++++++++++++
> >> > >  tests/test-ovsdb.py |   14 ++++
> >> > >  4 files changed, 370 insertions(+), 32 deletions(-)
> >> > >
> >> > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >> > > index 9e1e787..1542da9 100644
> >> > > --- a/lib/ovsdb-idl.c
> >> > > +++ b/lib/ovsdb-idl.c
> >> > > @@ -92,6 +92,7 @@ struct ovsdb_idl {
> >> > >      struct ovsdb_idl_txn *txn;
> >> > >      struct hmap outstanding_txns;
> >> > >      bool verify_write_only;
> >> > > +    struct ovs_list orphan_rows; /* All deleted but still
> > referenced rows. */
> >> >
> >> > The name here is a bit confusing.  I mean it's not a list
> >> > of orphan rows, at least it's not a list of all the orphan
> >> > rows.  So, we should, probably, rename it to something like
> >> > 'deleted_rows' instead and also rename all related functions.
> >> >
> >> > What do you think?
> >> >
> >>
> >> I agree with Ilya that the "orphan_rows" is confusing. However,
> > "deleted_rows" may also be confusing, because the current code also
> > maintains deleted rows in the track_list of each table (no matter if
> > tracking is enabled or not). If tracking is enabled, the rows are
> > destroyed (for real) when ovsdb_idl_track_clear() is called; if not, it
> > is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
> > was really confusing. This patch changes that behavior further by
> > introducing this orphan_rows list as a temporary place (for some
> > situations) before moving to the track_list. Since this patch already
> > did some refactoring (which is great), I'd suggest doing a little more
> > to avoid the above confusion: for deleted rows, shall we just put the
> > untracked rows in a "deleted_untracked_rows" list (i.e. rename the
> > "orphan_rows" of this patch), and let the "track_list" only hold the
> > tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
> > destroy deleted_untracked_rows without worrying the track_list.
> > track_list should exist only if tracking is enabled, and should be taken
> > care by ovsdb_idl_track_clear().
>
> Thanks, Ilya and Han, for the review and suggestions.
>
> Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
> regarding using a different list for tables without change tracking I'm
> not so sure if that's OK.  Even if the table doesn't have change
> tracking its deleted rows should still first go to the "temporary"
> orphan_rows list.
>
> For example:
>
> Table T1 (no change tracking):
> - Row a = {A.uuid=<U1>}
>
> Table T2 (change tracking enabled):
> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>
> If in a transaction both rows 'a' and 'b' are deleted, and updates are
> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
> we'd be reparsing backrefs for 'a' immediately leading to:
>
> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>
> Then the update to delete 'b' is processed:
>
> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>
> The above is wrong and breaks consistency if ref_to_A is a strong
reference.
>
Sorry that I didn't make it clear enough. The "deleted_untracked_rows" list
in my suggestion is just a renaming of your "orphan_rows". The real change
I suggested was that let's don't use it as a temporary place to hold these
rows. We can make it clear that deleted_untracked_rows is for deleted
untracked rows and the "track_list" of each table contains only tracked
rows.

In the above example, row a (without change tracking) will be put to
deleted_untracked_rows, but no need to move to "track_list" later.

In ovsdb_idl_row_destroy_postprocess, we can destroy
deleted_untracked_rows, and no need to touch "track_list".

Would this be more clear?

> >>
> >> What do you think?
> >>
> >
> > I took a second look at the patch, and it seems when tracking is not
> > enabled it would result in the rows in "orphan_rows" leaked, because in
> > ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
> > tracking is enabled for the table. If I am correct, would the above
> > suggested refactoring help avoiding such kinds of problems?
>
> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
> rows to 'track_list' in two scenarios:
> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
> b. they're in a table with change tracking enabled.
>
> If the table doesn't have change tracking enabled and the row is not
> referenced anymore it will be added to 'track_list' and will be cleaned
> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>
> If the table doesn't have change tracking enabled but the row (let's
> call it 'R1') is still referenced (i.e., it became "orphan") then we
> shouldn't delete it yet.  When, in the future, the rows referring to
> this orphan row are deleted/updated (let's call them 'R2'),
> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
> now properly deleted.
>

Ok, it seems no leak at all. I think I was confused by this check:
+        /* Orphan rows that are still unreferenced or are part of tables
that
+         * have hange tracking enabled should be added to their table's
+         * 'track_list'.
+         */
+        if (ovs_list_is_empty(&row->dst_arcs)
+                || ovsdb_idl_track_is_set(row->table)) {
+            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
+        }

For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be moved
to track_list and finally get destroyed.
But I think this "if" can be removed because deleted rows should always go
to "track_list" so that they can be destroyed (assume not doing the
refactoring I suggested earlier).
Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to make
sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
should always be true, right?

> >
> > Thanks,
> > Han
> >
> >> >
> >> > For the rest of the code... It seems fine, but it's really hard
> >> > to review and I still have a feeling that something might go
> >> > wrong (not more wrong than in current code, though), but I can't
> >> > think of any example.  So, it's, probably, OK.
>
> Thanks, I'm not sure how we can improve the stability/reliability of
> this code without major refactoring/rewrite.  Until then though, the
> fact that we break consistency of the IDL client's view of the database
> is something that needs to be addressed because it's taken for granted
> by ovn-controller in quite a few cases.
>
> >> >
> >> > Best regards, Ilya Maximets.
>
> Regards,
> Dumitru
>
Dumitru Ceara March 22, 2021, 11:27 a.m. UTC | #6
On 3/19/21 6:55 PM, Han Zhou wrote:
> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 3/19/21 2:41 AM, Han Zhou wrote:
>>>
>>>
>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
>>> <mailto:hzhou@ovn.org>> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
>>> <mailto:i.maximets@ovn.org>> wrote:
>>>>>
>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
> (with
>>>>>> column 'ref_a' a reference to table A):
>>>>>> a = {A._uuid=<U1>}
>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>
>>>>>> Assuming both records are present in the IDL client's in-memory
>>> view of
>>>>>> the database, depending whether row 'b' is also deleted in the same
>>>>>> transaction or not, deletion of row 'a' should generate the
> following
>>>>>> tracked changes:
>>>>>>
>>>>>> 1. only row 'a' is deleted:
>>>>>> - for table A:
>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>> - for table B:
>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>
>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
>>>>>> - for table A:
>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>> - for table B:
>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>
>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
> rows
>>>>>> until all updates in the current run have been processed.
>>>>>>
>>>>>> Without this change, in scenario 2 above, the tracked changes for
>>> table
>>>>>> B would be:
>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>
>>>>>> In particular, for strong references, row 'a' can never be deleted
> in
>>>>>> a transaction that happens strictly before row 'b' is deleted.  In
>>> some
>>>>>> cases [0] both rows are deleted in the same transaction and having
>>>>>> B.ref_a=[] would violate the integrity of the database from client
>>>>>> perspective.  This would force the client to always validate that
>>>>>> strong reference fields are non-NULL.  This is not really an option
>>>>>> because the information in the original reference is required for
>>>>>> incrementally processing the record deletion.
>>>>>>
>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
> crash
>>>>>>     in ovn-controller because a strong reference field becomes
> NULL:
>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
>>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
>>>>>>     $ ovn-nbctl lr-del r
>>>>>>
>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
>>> <https://bugzilla.redhat.com/1932642>
>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
>>> deleted rows.")
>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
>>> <mailto:dceara@redhat.com>>
>>>>>> ---
>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
>>>>>>  tests/test-ovsdb.py |   14 ++++
>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>>> index 9e1e787..1542da9 100644
>>>>>> --- a/lib/ovsdb-idl.c
>>>>>> +++ b/lib/ovsdb-idl.c
>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>>>>>>      struct ovsdb_idl_txn *txn;
>>>>>>      struct hmap outstanding_txns;
>>>>>>      bool verify_write_only;
>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
>>> referenced rows. */
>>>>>
>>>>> The name here is a bit confusing.  I mean it's not a list
>>>>> of orphan rows, at least it's not a list of all the orphan
>>>>> rows.  So, we should, probably, rename it to something like
>>>>> 'deleted_rows' instead and also rename all related functions.
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
>>> "deleted_rows" may also be confusing, because the current code also
>>> maintains deleted rows in the track_list of each table (no matter if
>>> tracking is enabled or not). If tracking is enabled, the rows are
>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not, it
>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
>>> was really confusing. This patch changes that behavior further by
>>> introducing this orphan_rows list as a temporary place (for some
>>> situations) before moving to the track_list. Since this patch already
>>> did some refactoring (which is great), I'd suggest doing a little more
>>> to avoid the above confusion: for deleted rows, shall we just put the
>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
>>> "orphan_rows" of this patch), and let the "track_list" only hold the
>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
>>> destroy deleted_untracked_rows without worrying the track_list.
>>> track_list should exist only if tracking is enabled, and should be taken
>>> care by ovsdb_idl_track_clear().
>>
>> Thanks, Ilya and Han, for the review and suggestions.
>>
>> Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
>> regarding using a different list for tables without change tracking I'm
>> not so sure if that's OK.  Even if the table doesn't have change
>> tracking its deleted rows should still first go to the "temporary"
>> orphan_rows list.
>>
>> For example:
>>
>> Table T1 (no change tracking):
>> - Row a = {A.uuid=<U1>}
>>
>> Table T2 (change tracking enabled):
>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>>
>> If in a transaction both rows 'a' and 'b' are deleted, and updates are
>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
>> we'd be reparsing backrefs for 'a' immediately leading to:
>>
>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>>
>> Then the update to delete 'b' is processed:
>>
>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>>
>> The above is wrong and breaks consistency if ref_to_A is a strong
> reference.
>>
> Sorry that I didn't make it clear enough. The "deleted_untracked_rows" list
> in my suggestion is just a renaming of your "orphan_rows". The real change
> I suggested was that let's don't use it as a temporary place to hold these
> rows. We can make it clear that deleted_untracked_rows is for deleted
> untracked rows and the "track_list" of each table contains only tracked
> rows.

I think there's some confusion: 'orphan_rows' doesn't contain deleted
untracked rows.  What it actually stores is "rows deleted in the current
IDL run regardless of change tracking enabled or not".

> 
> In the above example, row a (without change tracking) will be put to
> deleted_untracked_rows, but no need to move to "track_list" later.
> 
> In ovsdb_idl_row_destroy_postprocess, we can destroy
> deleted_untracked_rows, and no need to touch "track_list".
> 
> Would this be more clear?

The problem is with deleted rows that are part of tables with change
tracking enabled.  Taking the example from the commit log:

Considering two DB rows, 'a' from table A and 'b' from table B (with
column 'ref_a' a reference to table A), both tables with change tracking
enabled:
a = {A._uuid=<U1>}
b = {B._uuid=<U2>, B.ref_a=<U1>}

If row 'a' and row 'b' are deleted in the same update then the IDL user
should be notified of the following events.
- for table A:
  - deleted records: a = {A._uuid=<U1>}
- for table B:
  - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}

To achieve this we need to make sure all "delete" updates are processed
before reparsing any backrefs.  This allows to check in
ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
above) is also deleted in the current run, and skip unparsing B.ref_a so
that that the IDL user (e.g., ovn-controller) can still access it.

> 
>>>>
>>>> What do you think?
>>>>
>>>
>>> I took a second look at the patch, and it seems when tracking is not
>>> enabled it would result in the rows in "orphan_rows" leaked, because in
>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
>>> tracking is enabled for the table. If I am correct, would the above
>>> suggested refactoring help avoiding such kinds of problems?
>>
>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
>> rows to 'track_list' in two scenarios:
>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
>> b. they're in a table with change tracking enabled.
>>
>> If the table doesn't have change tracking enabled and the row is not
>> referenced anymore it will be added to 'track_list' and will be cleaned
>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>>
>> If the table doesn't have change tracking enabled but the row (let's
>> call it 'R1') is still referenced (i.e., it became "orphan") then we
>> shouldn't delete it yet.  When, in the future, the rows referring to
>> this orphan row are deleted/updated (let's call them 'R2'),
>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
>> now properly deleted.
>>
> 
> Ok, it seems no leak at all. I think I was confused by this check:
> +        /* Orphan rows that are still unreferenced or are part of tables
> that
> +         * have hange tracking enabled should be added to their table's
> +         * 'track_list'.
> +         */
> +        if (ovs_list_is_empty(&row->dst_arcs)
> +                || ovsdb_idl_track_is_set(row->table)) {
> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
> +        }
> 
> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be moved
> to track_list and finally get destroyed.

R1 is still referenced by R2 at this point in the example above, that is
there is an arc from R2 to R1.

> But I think this "if" can be removed because deleted rows should always go
> to "track_list" so that they can be destroyed (assume not doing the
> refactoring I suggested earlier).

R1 should not be destroyed until R2 is destroyed too.  The complexity
here exists because for tables with change tracking enabled, the current
design is to always add deleted rows (even if they're still referenced)
to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
destroy them as long as they're still referenced.

> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to make
> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
> should always be true, right?

No, even if there was an update to delete R1, R2 might still exist
(e.g., monitor condition change causes R1 to be deleted from the local
database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
an arc from R2 to R1.

I agree that the existing change tracking code is complex and can
potentially be simplified and/or refactored.  However, that's a big
effort and I think should not be in the scope of this bug fix.  I did
have to refactor some things but I limited myself to only changing parts
that are absolutely required to fix the bug.

For the time being, without this fix ovn-controller is crashing because
of the inconsistent view of the database the IDL change tracking code
creates.

I think this should be fixed as soon as possible.  In my opinion all
other refactoring and hardening should be done as follow up.

Regards,
Dumitru

> 
>>>
>>> Thanks,
>>> Han
>>>
>>>>>
>>>>> For the rest of the code... It seems fine, but it's really hard
>>>>> to review and I still have a feeling that something might go
>>>>> wrong (not more wrong than in current code, though), but I can't
>>>>> think of any example.  So, it's, probably, OK.
>>
>> Thanks, I'm not sure how we can improve the stability/reliability of
>> this code without major refactoring/rewrite.  Until then though, the
>> fact that we break consistency of the IDL client's view of the database
>> is something that needs to be addressed because it's taken for granted
>> by ovn-controller in quite a few cases.
>>
>>>>>
>>>>> Best regards, Ilya Maximets.
>>
>> Regards,
>> Dumitru
>>
>
Han Zhou March 22, 2021, 7:45 p.m. UTC | #7
On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 3/19/21 6:55 PM, Han Zhou wrote:
> > On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> On 3/19/21 2:41 AM, Han Zhou wrote:
> >>>
> >>>
> >>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
> >>> <mailto:hzhou@ovn.org>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
> >>> <mailto:i.maximets@ovn.org>> wrote:
> >>>>>
> >>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> >>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
> > (with
> >>>>>> column 'ref_a' a reference to table A):
> >>>>>> a = {A._uuid=<U1>}
> >>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>
> >>>>>> Assuming both records are present in the IDL client's in-memory
> >>> view of
> >>>>>> the database, depending whether row 'b' is also deleted in the same
> >>>>>> transaction or not, deletion of row 'a' should generate the
> > following
> >>>>>> tracked changes:
> >>>>>>
> >>>>>> 1. only row 'a' is deleted:
> >>>>>> - for table A:
> >>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>> - for table B:
> >>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>
> >>>>>> 2. row 'a' and row 'b' are deleted in the same update:
> >>>>>> - for table A:
> >>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>> - for table B:
> >>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>
> >>>>>> To ensure this, we now delay reparsing row backrefs for deleted
> > rows
> >>>>>> until all updates in the current run have been processed.
> >>>>>>
> >>>>>> Without this change, in scenario 2 above, the tracked changes for
> >>> table
> >>>>>> B would be:
> >>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>
> >>>>>> In particular, for strong references, row 'a' can never be deleted
> > in
> >>>>>> a transaction that happens strictly before row 'b' is deleted.  In
> >>> some
> >>>>>> cases [0] both rows are deleted in the same transaction and having
> >>>>>> B.ref_a=[] would violate the integrity of the database from client
> >>>>>> perspective.  This would force the client to always validate that
> >>>>>> strong reference fields are non-NULL.  This is not really an option
> >>>>>> because the information in the original reference is required for
> >>>>>> incrementally processing the record deletion.
> >>>>>>
> >>>>>> [0] with ovn-monitor-all=true, the following command triggers a
> > crash
> >>>>>>     in ovn-controller because a strong reference field becomes
> > NULL:
> >>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
> >>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
> >>>>>>     $ ovn-nbctl lr-del r
> >>>>>>
> >>>>>> Reported-at: https://bugzilla.redhat.com/1932642
> >>> <https://bugzilla.redhat.com/1932642>
> >>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
> >>> deleted rows.")
> >>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
> >>> <mailto:dceara@redhat.com>>
> >>>>>> ---
> >>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> >>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
> >>> +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
> >>>>>>  tests/test-ovsdb.py |   14 ++++
> >>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
> >>>>>>
> >>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >>>>>> index 9e1e787..1542da9 100644
> >>>>>> --- a/lib/ovsdb-idl.c
> >>>>>> +++ b/lib/ovsdb-idl.c
> >>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
> >>>>>>      struct ovsdb_idl_txn *txn;
> >>>>>>      struct hmap outstanding_txns;
> >>>>>>      bool verify_write_only;
> >>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
> >>> referenced rows. */
> >>>>>
> >>>>> The name here is a bit confusing.  I mean it's not a list
> >>>>> of orphan rows, at least it's not a list of all the orphan
> >>>>> rows.  So, we should, probably, rename it to something like
> >>>>> 'deleted_rows' instead and also rename all related functions.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>
> >>>> I agree with Ilya that the "orphan_rows" is confusing. However,
> >>> "deleted_rows" may also be confusing, because the current code also
> >>> maintains deleted rows in the track_list of each table (no matter if
> >>> tracking is enabled or not). If tracking is enabled, the rows are
> >>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
it
> >>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
> >>> was really confusing. This patch changes that behavior further by
> >>> introducing this orphan_rows list as a temporary place (for some
> >>> situations) before moving to the track_list. Since this patch already
> >>> did some refactoring (which is great), I'd suggest doing a little more
> >>> to avoid the above confusion: for deleted rows, shall we just put the
> >>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
> >>> "orphan_rows" of this patch), and let the "track_list" only hold the
> >>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
> >>> destroy deleted_untracked_rows without worrying the track_list.
> >>> track_list should exist only if tracking is enabled, and should be
taken
> >>> care by ovsdb_idl_track_clear().
> >>
> >> Thanks, Ilya and Han, for the review and suggestions.
> >>
> >> Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
> >> regarding using a different list for tables without change tracking I'm
> >> not so sure if that's OK.  Even if the table doesn't have change
> >> tracking its deleted rows should still first go to the "temporary"
> >> orphan_rows list.
> >>
> >> For example:
> >>
> >> Table T1 (no change tracking):
> >> - Row a = {A.uuid=<U1>}
> >>
> >> Table T2 (change tracking enabled):
> >> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
> >>
> >> If in a transaction both rows 'a' and 'b' are deleted, and updates are
> >> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
> >> we'd be reparsing backrefs for 'a' immediately leading to:
> >>
> >> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
> >>
> >> Then the update to delete 'b' is processed:
> >>
> >> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
> >>
> >> The above is wrong and breaks consistency if ref_to_A is a strong
> > reference.
> >>
> > Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
list
> > in my suggestion is just a renaming of your "orphan_rows". The real
change
> > I suggested was that let's don't use it as a temporary place to hold
these
> > rows. We can make it clear that deleted_untracked_rows is for deleted
> > untracked rows and the "track_list" of each table contains only tracked
> > rows.
>
> I think there's some confusion: 'orphan_rows' doesn't contain deleted
> untracked rows.  What it actually stores is "rows deleted in the current
> IDL run regardless of change tracking enabled or not".
>
I think we are saying the same thing from different perspectives. For
"deleted untracked rows" I am not talking about whether change tracking is
enabled or not, but about whether the deleted row is in "track_list" or
not. In this patch, a deleted row is in either "track_list" (what I called
"tracked") or "orphan_rows" (what I called "deleted and untracked"). You
always call "ovsdb_idl_row_untrack_change" before putting a row to
"orphan_rows", so I think it would be clear to rename "orphan_rows" as
"deleted_untracked_rows".

> >
> > In the above example, row a (without change tracking) will be put to
> > deleted_untracked_rows, but no need to move to "track_list" later.
> >
> > In ovsdb_idl_row_destroy_postprocess, we can destroy
> > deleted_untracked_rows, and no need to touch "track_list".
> >
> > Would this be more clear?
>
> The problem is with deleted rows that are part of tables with change
> tracking enabled.  Taking the example from the commit log:
>
> Considering two DB rows, 'a' from table A and 'b' from table B (with
> column 'ref_a' a reference to table A), both tables with change tracking
> enabled:
> a = {A._uuid=<U1>}
> b = {B._uuid=<U2>, B.ref_a=<U1>}
>
> If row 'a' and row 'b' are deleted in the same update then the IDL user
> should be notified of the following events.
> - for table A:
>   - deleted records: a = {A._uuid=<U1>}
> - for table B:
>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>
> To achieve this we need to make sure all "delete" updates are processed
> before reparsing any backrefs.  This allows to check in
> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
> above) is also deleted in the current run, and skip unparsing B.ref_a so
> that that the IDL user (e.g., ovn-controller) can still access it.
>

In fact I didn't see the check you mentioned above. What's in the code is:
+        /* Skip reparsing deleted refs.  If ref's table has change tracking
+         * enabled, the users need the original information (and that will
be
+         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
+         * already orphan and is not accessible to the IDL user and will be
+         * cleaned up at the end of the current IDL run.
+         */
+        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
+            continue;
+        }
+

1) "row" is "a" in the above example, and "ref" is "b". It seems you should
check if "ref" is deleted?

2) The comments says "if ref's table has change tracking enabled", but in
fact the above code didn't check if change tracking is enabled or not.
Should we check it?

On the other hand, I understand the problem you are trying to solve, but I
don't see why it is related to my comment regarding clearly distinguishing
deleted_untracked_rows v.s. track_list.

> >
> >>>>
> >>>> What do you think?
> >>>>
> >>>
> >>> I took a second look at the patch, and it seems when tracking is not
> >>> enabled it would result in the rows in "orphan_rows" leaked, because
in
> >>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
> >>> tracking is enabled for the table. If I am correct, would the above
> >>> suggested refactoring help avoiding such kinds of problems?
> >>
> >> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
> >> rows to 'track_list' in two scenarios:
> >> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
> >> b. they're in a table with change tracking enabled.
> >>
> >> If the table doesn't have change tracking enabled and the row is not
> >> referenced anymore it will be added to 'track_list' and will be cleaned
> >> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
> >>
> >> If the table doesn't have change tracking enabled but the row (let's
> >> call it 'R1') is still referenced (i.e., it became "orphan") then we
> >> shouldn't delete it yet.  When, in the future, the rows referring to
> >> this orphan row are deleted/updated (let's call them 'R2'),
> >> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
> >> now properly deleted.
> >>
> >
> > Ok, it seems no leak at all. I think I was confused by this check:
> > +        /* Orphan rows that are still unreferenced or are part of
tables
> > that
> > +         * have hange tracking enabled should be added to their table's
> > +         * 'track_list'.
> > +         */
> > +        if (ovs_list_is_empty(&row->dst_arcs)
> > +                || ovsdb_idl_track_is_set(row->table)) {
> > +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
> > +        }
> >
> > For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
moved
> > to track_list and finally get destroyed.
>
> R1 is still referenced by R2 at this point in the example above, that is
> there is an arc from R2 to R1.

Ok, my mistake.

>
> > But I think this "if" can be removed because deleted rows should always
go
> > to "track_list" so that they can be destroyed (assume not doing the
> > refactoring I suggested earlier).
>
> R1 should not be destroyed until R2 is destroyed too.  The complexity
> here exists because for tables with change tracking enabled, the current
> design is to always add deleted rows (even if they're still referenced)
> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
> destroy them as long as they're still referenced.

I am confused again. According to the above code:
1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
and the row will be put to track_list.
2) If "row" is referenced
   2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
true and the row will be put to track_list.
   2.2)  if the table is not tracked, the row will NOT be put to
track_list. The row will NOT be put to track_list.

If I understand correctly, in the current implementation (and in the
patch), everything in track_list will finally get released (either in
post_processing or in track_clear).
For 2.2), since the row is NOT put to track_list, how will it be released?

And for "But, even for those, ovsdb_idl_track_clear() will not destroy them
as long as they're still referenced.", do you mean it is possible that a
row is NOT destroyed even after the IDL iteration after user calling
track_clear()? If yes, why is that necessary? And how is it achieved (in
track_clear() it goes through each track_list and release all is_orphaned()
rows, so how could it not get released)?

>
> > Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to make
> > sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
> > should always be true, right?
>
> No, even if there was an update to delete R1, R2 might still exist
> (e.g., monitor condition change causes R1 to be deleted from the local
> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
> an arc from R2 to R1.
>
> I agree that the existing change tracking code is complex and can
> potentially be simplified and/or refactored.  However, that's a big
> effort and I think should not be in the scope of this bug fix.  I did
> have to refactor some things but I limited myself to only changing parts
> that are absolutely required to fix the bug.
>
Sure, I agree that refactoring shouldn't be required in this patch. I am
totally ok with it being merged without the refactoring I suggested, but
regardless of the refactoring, could you still have my above concerns
clarified? Thanks a lot for your patience!

Thanks,
Han

> For the time being, without this fix ovn-controller is crashing because
> of the inconsistent view of the database the IDL change tracking code
> creates.
>
> I think this should be fixed as soon as possible.  In my opinion all
> other refactoring and hardening should be done as follow up.
>
> Regards,
> Dumitru
>
> >
> >>>
> >>> Thanks,
> >>> Han
> >>>
> >>>>>
> >>>>> For the rest of the code... It seems fine, but it's really hard
> >>>>> to review and I still have a feeling that something might go
> >>>>> wrong (not more wrong than in current code, though), but I can't
> >>>>> think of any example.  So, it's, probably, OK.
> >>
> >> Thanks, I'm not sure how we can improve the stability/reliability of
> >> this code without major refactoring/rewrite.  Until then though, the
> >> fact that we break consistency of the IDL client's view of the database
> >> is something that needs to be addressed because it's taken for granted
> >> by ovn-controller in quite a few cases.
> >>
> >>>>>
> >>>>> Best regards, Ilya Maximets.
> >>
> >> Regards,
> >> Dumitru
> >>
> >
>
Dumitru Ceara March 23, 2021, noon UTC | #8
On 3/22/21 8:45 PM, Han Zhou wrote:
> On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 3/19/21 6:55 PM, Han Zhou wrote:
>>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 3/19/21 2:41 AM, Han Zhou wrote:
>>>>>
>>>>>
>>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
>>>>> <mailto:hzhou@ovn.org>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
>>>>> <mailto:i.maximets@ovn.org>> wrote:
>>>>>>>
>>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
>>> (with
>>>>>>>> column 'ref_a' a reference to table A):
>>>>>>>> a = {A._uuid=<U1>}
>>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>>
>>>>>>>> Assuming both records are present in the IDL client's in-memory
>>>>> view of
>>>>>>>> the database, depending whether row 'b' is also deleted in the same
>>>>>>>> transaction or not, deletion of row 'a' should generate the
>>> following
>>>>>>>> tracked changes:
>>>>>>>>
>>>>>>>> 1. only row 'a' is deleted:
>>>>>>>> - for table A:
>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>> - for table B:
>>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>>
>>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
>>>>>>>> - for table A:
>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>> - for table B:
>>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>>
>>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
>>> rows
>>>>>>>> until all updates in the current run have been processed.
>>>>>>>>
>>>>>>>> Without this change, in scenario 2 above, the tracked changes for
>>>>> table
>>>>>>>> B would be:
>>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>>
>>>>>>>> In particular, for strong references, row 'a' can never be deleted
>>> in
>>>>>>>> a transaction that happens strictly before row 'b' is deleted.  In
>>>>> some
>>>>>>>> cases [0] both rows are deleted in the same transaction and having
>>>>>>>> B.ref_a=[] would violate the integrity of the database from client
>>>>>>>> perspective.  This would force the client to always validate that
>>>>>>>> strong reference fields are non-NULL.  This is not really an option
>>>>>>>> because the information in the original reference is required for
>>>>>>>> incrementally processing the record deletion.
>>>>>>>>
>>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
>>> crash
>>>>>>>>     in ovn-controller because a strong reference field becomes
>>> NULL:
>>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
>>>>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
>>>>>>>>     $ ovn-nbctl lr-del r
>>>>>>>>
>>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
>>>>> <https://bugzilla.redhat.com/1932642>
>>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
>>>>> deleted rows.")
>>>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
>>>>> <mailto:dceara@redhat.com>>
>>>>>>>> ---
>>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
>>>>>>>>  tests/test-ovsdb.py |   14 ++++
>>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>>>>> index 9e1e787..1542da9 100644
>>>>>>>> --- a/lib/ovsdb-idl.c
>>>>>>>> +++ b/lib/ovsdb-idl.c
>>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>>>>>>>>      struct ovsdb_idl_txn *txn;
>>>>>>>>      struct hmap outstanding_txns;
>>>>>>>>      bool verify_write_only;
>>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
>>>>> referenced rows. */
>>>>>>>
>>>>>>> The name here is a bit confusing.  I mean it's not a list
>>>>>>> of orphan rows, at least it's not a list of all the orphan
>>>>>>> rows.  So, we should, probably, rename it to something like
>>>>>>> 'deleted_rows' instead and also rename all related functions.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>
>>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
>>>>> "deleted_rows" may also be confusing, because the current code also
>>>>> maintains deleted rows in the track_list of each table (no matter if
>>>>> tracking is enabled or not). If tracking is enabled, the rows are
>>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
> it
>>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
>>>>> was really confusing. This patch changes that behavior further by
>>>>> introducing this orphan_rows list as a temporary place (for some
>>>>> situations) before moving to the track_list. Since this patch already
>>>>> did some refactoring (which is great), I'd suggest doing a little more
>>>>> to avoid the above confusion: for deleted rows, shall we just put the
>>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
>>>>> "orphan_rows" of this patch), and let the "track_list" only hold the
>>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
>>>>> destroy deleted_untracked_rows without worrying the track_list.
>>>>> track_list should exist only if tracking is enabled, and should be
> taken
>>>>> care by ovsdb_idl_track_clear().
>>>>
>>>> Thanks, Ilya and Han, for the review and suggestions.
>>>>
>>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
>>>> regarding using a different list for tables without change tracking I'm
>>>> not so sure if that's OK.  Even if the table doesn't have change
>>>> tracking its deleted rows should still first go to the "temporary"
>>>> orphan_rows list.
>>>>
>>>> For example:
>>>>
>>>> Table T1 (no change tracking):
>>>> - Row a = {A.uuid=<U1>}
>>>>
>>>> Table T2 (change tracking enabled):
>>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>>>>
>>>> If in a transaction both rows 'a' and 'b' are deleted, and updates are
>>>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
>>>> we'd be reparsing backrefs for 'a' immediately leading to:
>>>>
>>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>>>>
>>>> Then the update to delete 'b' is processed:
>>>>
>>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>>>>
>>>> The above is wrong and breaks consistency if ref_to_A is a strong
>>> reference.
>>>>
>>> Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
> list
>>> in my suggestion is just a renaming of your "orphan_rows". The real
> change
>>> I suggested was that let's don't use it as a temporary place to hold
> these
>>> rows. We can make it clear that deleted_untracked_rows is for deleted
>>> untracked rows and the "track_list" of each table contains only tracked
>>> rows.
>>
>> I think there's some confusion: 'orphan_rows' doesn't contain deleted
>> untracked rows.  What it actually stores is "rows deleted in the current
>> IDL run regardless of change tracking enabled or not".
>>
> I think we are saying the same thing from different perspectives. For
> "deleted untracked rows" I am not talking about whether change tracking is
> enabled or not, but about whether the deleted row is in "track_list" or
> not. In this patch, a deleted row is in either "track_list" (what I called
> "tracked") or "orphan_rows" (what I called "deleted and untracked"). You
> always call "ovsdb_idl_row_untrack_change" before putting a row to
> "orphan_rows", so I think it would be clear to rename "orphan_rows" as
> "deleted_untracked_rows".

Oh, I see now, it was a misunderstanding indeed.  I thought we were
referring to change tracking being enabled for the table or not.  I can
change the name to "deleted_untracked_rows" in the next revision.

> 
>>>
>>> In the above example, row a (without change tracking) will be put to
>>> deleted_untracked_rows, but no need to move to "track_list" later.
>>>
>>> In ovsdb_idl_row_destroy_postprocess, we can destroy
>>> deleted_untracked_rows, and no need to touch "track_list".
>>>
>>> Would this be more clear?
>>
>> The problem is with deleted rows that are part of tables with change
>> tracking enabled.  Taking the example from the commit log:
>>
>> Considering two DB rows, 'a' from table A and 'b' from table B (with
>> column 'ref_a' a reference to table A), both tables with change tracking
>> enabled:
>> a = {A._uuid=<U1>}
>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>
>> If row 'a' and row 'b' are deleted in the same update then the IDL user
>> should be notified of the following events.
>> - for table A:
>>   - deleted records: a = {A._uuid=<U1>}
>> - for table B:
>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>
>> To achieve this we need to make sure all "delete" updates are processed
>> before reparsing any backrefs.  This allows to check in
>> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
>> above) is also deleted in the current run, and skip unparsing B.ref_a so
>> that that the IDL user (e.g., ovn-controller) can still access it.
>>
> 
> In fact I didn't see the check you mentioned above. What's in the code is:
> +        /* Skip reparsing deleted refs.  If ref's table has change tracking
> +         * enabled, the users need the original information (and that will
> be
> +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
> +         * already orphan and is not accessible to the IDL user and will be
> +         * cleaned up at the end of the current IDL run.
> +         */
> +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
> +            continue;
> +        }
> +
> 
> 1) "row" is "a" in the above example, and "ref" is "b". It seems you should
> check if "ref" is deleted?

You're right, this should be "ref" instead of "row, thanks for spotting
this!  I'll add a test case to exercise this scenario to avoid
regressions in the future.

Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE will
never be set for rows that are on the "orphan_rows" list, so this
condition would never be true.

I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
always set for rows that are on the "orphan_rows" list.

> 
> 2) The comments says "if ref's table has change tracking enabled", but in
> fact the above code didn't check if change tracking is enabled or not.
> Should we check it?

I can try to rephrase the comment.  What I meant was:

"Skip reparsing deleted refs.  This is OK because either
- ref's table has change tracking enabled and the users need the
  original datum in 'ref', which will be cleaned up by
  ovsdb_idl_track_clear().
OR
- ref's table has change tracking disabled, 'ref' is deleted so its
  datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."

> 
> On the other hand, I understand the problem you are trying to solve, but I
> don't see why it is related to my comment regarding clearly distinguishing
> deleted_untracked_rows v.s. track_list.

It's not, with your explanation about deleted_untracked_rows above it's
now clear to me.

> 
>>>
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>
>>>>> I took a second look at the patch, and it seems when tracking is not
>>>>> enabled it would result in the rows in "orphan_rows" leaked, because
> in
>>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
>>>>> tracking is enabled for the table. If I am correct, would the above
>>>>> suggested refactoring help avoiding such kinds of problems?
>>>>
>>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
>>>> rows to 'track_list' in two scenarios:
>>>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
>>>> b. they're in a table with change tracking enabled.
>>>>
>>>> If the table doesn't have change tracking enabled and the row is not
>>>> referenced anymore it will be added to 'track_list' and will be cleaned
>>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>>>>
>>>> If the table doesn't have change tracking enabled but the row (let's
>>>> call it 'R1') is still referenced (i.e., it became "orphan") then we
>>>> shouldn't delete it yet.  When, in the future, the rows referring to
>>>> this orphan row are deleted/updated (let's call them 'R2'),
>>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
>>>> now properly deleted.
>>>>
>>>
>>> Ok, it seems no leak at all. I think I was confused by this check:
>>> +        /* Orphan rows that are still unreferenced or are part of
> tables
>>> that
>>> +         * have hange tracking enabled should be added to their table's
>>> +         * 'track_list'.
>>> +         */
>>> +        if (ovs_list_is_empty(&row->dst_arcs)
>>> +                || ovsdb_idl_track_is_set(row->table)) {
>>> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
>>> +        }
>>>
>>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
> moved
>>> to track_list and finally get destroyed.
>>
>> R1 is still referenced by R2 at this point in the example above, that is
>> there is an arc from R2 to R1.
> 
> Ok, my mistake.
> 
>>
>>> But I think this "if" can be removed because deleted rows should always
> go
>>> to "track_list" so that they can be destroyed (assume not doing the
>>> refactoring I suggested earlier).
>>
>> R1 should not be destroyed until R2 is destroyed too.  The complexity
>> here exists because for tables with change tracking enabled, the current
>> design is to always add deleted rows (even if they're still referenced)
>> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
>> destroy them as long as they're still referenced.
> 
> I am confused again. According to the above code:
> 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
> and the row will be put to track_list.
> 2) If "row" is referenced
>    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
> true and the row will be put to track_list.
>    2.2)  if the table is not tracked, the row will NOT be put to
> track_list. The row will NOT be put to track_list.
> 
> If I understand correctly, in the current implementation (and in the
> patch), everything in track_list will finally get released (either in
> post_processing or in track_clear).

Those rows (R1) are "orphan", their contents were deleted from the IDL
database view, but there still exist other non-orphan rows (R2) that
refer to them.

> For 2.2), since the row is NOT put to track_list, how will it be released?

When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
ovsdb_idl_row_clear_arcs(R2, true).

R2 has an arc to R1, and if R2 was the only row referring to R1 then
ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).

At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
removed from its table's hashtable, added to its table's 'track_list'
and will be completely freed in ovsdb_idl_row_destroy_postprocess().

> 
> And for "But, even for those, ovsdb_idl_track_clear() will not destroy them
> as long as they're still referenced.", do you mean it is possible that a
> row is NOT destroyed even after the IDL iteration after user calling
> track_clear()? If yes, why is that necessary? And how is it achieved (in
> track_clear() it goes through each track_list and release all is_orphaned()
> rows, so how could it not get released)?

The reason is the same as above, there might still exist an R2 referring R1.

ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
which means that rows that are on 'track_list' but still referenced will
not be freed because they're still in their table's hashtable:

https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1260

> 
>>
>>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to make
>>> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
>>> should always be true, right?
>>
>> No, even if there was an update to delete R1, R2 might still exist
>> (e.g., monitor condition change causes R1 to be deleted from the local
>> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
>> an arc from R2 to R1.
>>
>> I agree that the existing change tracking code is complex and can
>> potentially be simplified and/or refactored.  However, that's a big
>> effort and I think should not be in the scope of this bug fix.  I did
>> have to refactor some things but I limited myself to only changing parts
>> that are absolutely required to fix the bug.
>>
> Sure, I agree that refactoring shouldn't be required in this patch. I am
> totally ok with it being merged without the refactoring I suggested, but
> regardless of the refactoring, could you still have my above concerns
> clarified? Thanks a lot for your patience!

Thanks for all the comments, they help question the implementation and
spot issues!

> 
> Thanks,
> Han

Regards,
Dumitru

> 
>> For the time being, without this fix ovn-controller is crashing because
>> of the inconsistent view of the database the IDL change tracking code
>> creates.
>>
>> I think this should be fixed as soon as possible.  In my opinion all
>> other refactoring and hardening should be done as follow up.
>>
>> Regards,
>> Dumitru
>>
>>>
>>>>>
>>>>> Thanks,
>>>>> Han
>>>>>
>>>>>>>
>>>>>>> For the rest of the code... It seems fine, but it's really hard
>>>>>>> to review and I still have a feeling that something might go
>>>>>>> wrong (not more wrong than in current code, though), but I can't
>>>>>>> think of any example.  So, it's, probably, OK.
>>>>
>>>> Thanks, I'm not sure how we can improve the stability/reliability of
>>>> this code without major refactoring/rewrite.  Until then though, the
>>>> fact that we break consistency of the IDL client's view of the database
>>>> is something that needs to be addressed because it's taken for granted
>>>> by ovn-controller in quite a few cases.
>>>>
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>
>>
>
Han Zhou March 23, 2021, 7:15 p.m. UTC | #9
On Tue, Mar 23, 2021 at 5:00 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 3/22/21 8:45 PM, Han Zhou wrote:
> > On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> On 3/19/21 6:55 PM, Han Zhou wrote:
> >>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com>
wrote:
> >>>>
> >>>> On 3/19/21 2:41 AM, Han Zhou wrote:
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
> >>>>> <mailto:hzhou@ovn.org>> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
> >>>>> <mailto:i.maximets@ovn.org>> wrote:
> >>>>>>>
> >>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> >>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
> >>> (with
> >>>>>>>> column 'ref_a' a reference to table A):
> >>>>>>>> a = {A._uuid=<U1>}
> >>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>>>
> >>>>>>>> Assuming both records are present in the IDL client's in-memory
> >>>>> view of
> >>>>>>>> the database, depending whether row 'b' is also deleted in the
same
> >>>>>>>> transaction or not, deletion of row 'a' should generate the
> >>> following
> >>>>>>>> tracked changes:
> >>>>>>>>
> >>>>>>>> 1. only row 'a' is deleted:
> >>>>>>>> - for table A:
> >>>>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>>>> - for table B:
> >>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>>>
> >>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
> >>>>>>>> - for table A:
> >>>>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>>>> - for table B:
> >>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>>>
> >>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
> >>> rows
> >>>>>>>> until all updates in the current run have been processed.
> >>>>>>>>
> >>>>>>>> Without this change, in scenario 2 above, the tracked changes for
> >>>>> table
> >>>>>>>> B would be:
> >>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>>>
> >>>>>>>> In particular, for strong references, row 'a' can never be
deleted
> >>> in
> >>>>>>>> a transaction that happens strictly before row 'b' is deleted.
In
> >>>>> some
> >>>>>>>> cases [0] both rows are deleted in the same transaction and
having
> >>>>>>>> B.ref_a=[] would violate the integrity of the database from
client
> >>>>>>>> perspective.  This would force the client to always validate that
> >>>>>>>> strong reference fields are non-NULL.  This is not really an
option
> >>>>>>>> because the information in the original reference is required for
> >>>>>>>> incrementally processing the record deletion.
> >>>>>>>>
> >>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
> >>> crash
> >>>>>>>>     in ovn-controller because a strong reference field becomes
> >>> NULL:
> >>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
> >>>>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
> >>>>>>>>     $ ovn-nbctl lr-del r
> >>>>>>>>
> >>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
> >>>>> <https://bugzilla.redhat.com/1932642>
> >>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
> >>>>> deleted rows.")
> >>>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
> >>>>> <mailto:dceara@redhat.com>>
> >>>>>>>> ---
> >>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> >>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
> >>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
> >>>>>>>>  tests/test-ovsdb.py |   14 ++++
> >>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >>>>>>>> index 9e1e787..1542da9 100644
> >>>>>>>> --- a/lib/ovsdb-idl.c
> >>>>>>>> +++ b/lib/ovsdb-idl.c
> >>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
> >>>>>>>>      struct ovsdb_idl_txn *txn;
> >>>>>>>>      struct hmap outstanding_txns;
> >>>>>>>>      bool verify_write_only;
> >>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
> >>>>> referenced rows. */
> >>>>>>>
> >>>>>>> The name here is a bit confusing.  I mean it's not a list
> >>>>>>> of orphan rows, at least it's not a list of all the orphan
> >>>>>>> rows.  So, we should, probably, rename it to something like
> >>>>>>> 'deleted_rows' instead and also rename all related functions.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>
> >>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
> >>>>> "deleted_rows" may also be confusing, because the current code also
> >>>>> maintains deleted rows in the track_list of each table (no matter if
> >>>>> tracking is enabled or not). If tracking is enabled, the rows are
> >>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
> > it
> >>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of
logic
> >>>>> was really confusing. This patch changes that behavior further by
> >>>>> introducing this orphan_rows list as a temporary place (for some
> >>>>> situations) before moving to the track_list. Since this patch
already
> >>>>> did some refactoring (which is great), I'd suggest doing a little
more
> >>>>> to avoid the above confusion: for deleted rows, shall we just put
the
> >>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
> >>>>> "orphan_rows" of this patch), and let the "track_list" only hold the
> >>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
> >>>>> destroy deleted_untracked_rows without worrying the track_list.
> >>>>> track_list should exist only if tracking is enabled, and should be
> > taken
> >>>>> care by ovsdb_idl_track_clear().
> >>>>
> >>>> Thanks, Ilya and Han, for the review and suggestions.
> >>>>
> >>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.
However,
> >>>> regarding using a different list for tables without change tracking
I'm
> >>>> not so sure if that's OK.  Even if the table doesn't have change
> >>>> tracking its deleted rows should still first go to the "temporary"
> >>>> orphan_rows list.
> >>>>
> >>>> For example:
> >>>>
> >>>> Table T1 (no change tracking):
> >>>> - Row a = {A.uuid=<U1>}
> >>>>
> >>>> Table T2 (change tracking enabled):
> >>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
> >>>>
> >>>> If in a transaction both rows 'a' and 'b' are deleted, and updates
are
> >>>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
> >>>> we'd be reparsing backrefs for 'a' immediately leading to:
> >>>>
> >>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
> >>>>
> >>>> Then the update to delete 'b' is processed:
> >>>>
> >>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
> >>>>
> >>>> The above is wrong and breaks consistency if ref_to_A is a strong
> >>> reference.
> >>>>
> >>> Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
> > list
> >>> in my suggestion is just a renaming of your "orphan_rows". The real
> > change
> >>> I suggested was that let's don't use it as a temporary place to hold
> > these
> >>> rows. We can make it clear that deleted_untracked_rows is for deleted
> >>> untracked rows and the "track_list" of each table contains only
tracked
> >>> rows.
> >>
> >> I think there's some confusion: 'orphan_rows' doesn't contain deleted
> >> untracked rows.  What it actually stores is "rows deleted in the
current
> >> IDL run regardless of change tracking enabled or not".
> >>
> > I think we are saying the same thing from different perspectives. For
> > "deleted untracked rows" I am not talking about whether change tracking
is
> > enabled or not, but about whether the deleted row is in "track_list" or
> > not. In this patch, a deleted row is in either "track_list" (what I
called
> > "tracked") or "orphan_rows" (what I called "deleted and untracked"). You
> > always call "ovsdb_idl_row_untrack_change" before putting a row to
> > "orphan_rows", so I think it would be clear to rename "orphan_rows" as
> > "deleted_untracked_rows".
>
> Oh, I see now, it was a misunderstanding indeed.  I thought we were
> referring to change tracking being enabled for the table or not.  I can
> change the name to "deleted_untracked_rows" in the next revision.
>
> >
> >>>
> >>> In the above example, row a (without change tracking) will be put to
> >>> deleted_untracked_rows, but no need to move to "track_list" later.
> >>>
> >>> In ovsdb_idl_row_destroy_postprocess, we can destroy
> >>> deleted_untracked_rows, and no need to touch "track_list".
> >>>
> >>> Would this be more clear?
> >>
> >> The problem is with deleted rows that are part of tables with change
> >> tracking enabled.  Taking the example from the commit log:
> >>
> >> Considering two DB rows, 'a' from table A and 'b' from table B (with
> >> column 'ref_a' a reference to table A), both tables with change
tracking
> >> enabled:
> >> a = {A._uuid=<U1>}
> >> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>
> >> If row 'a' and row 'b' are deleted in the same update then the IDL user
> >> should be notified of the following events.
> >> - for table A:
> >>   - deleted records: a = {A._uuid=<U1>}
> >> - for table B:
> >>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>
> >> To achieve this we need to make sure all "delete" updates are processed
> >> before reparsing any backrefs.  This allows to check in
> >> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
> >> above) is also deleted in the current run, and skip unparsing B.ref_a
so
> >> that that the IDL user (e.g., ovn-controller) can still access it.
> >>
> >
> > In fact I didn't see the check you mentioned above. What's in the code
is:
> > +        /* Skip reparsing deleted refs.  If ref's table has change
tracking
> > +         * enabled, the users need the original information (and that
will
> > be
> > +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
> > +         * already orphan and is not accessible to the IDL user and
will be
> > +         * cleaned up at the end of the current IDL run.
> > +         */
> > +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
> > +            continue;
> > +        }
> > +
> >
> > 1) "row" is "a" in the above example, and "ref" is "b". It seems you
should
> > check if "ref" is deleted?
>
> You're right, this should be "ref" instead of "row, thanks for spotting
> this!  I'll add a test case to exercise this scenario to avoid
> regressions in the future.
>
> Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE will
> never be set for rows that are on the "orphan_rows" list, so this
> condition would never be true.
>
> I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
> always set for rows that are on the "orphan_rows" list.
>
> >
> > 2) The comments says "if ref's table has change tracking enabled", but
in
> > fact the above code didn't check if change tracking is enabled or not.
> > Should we check it?
>
> I can try to rephrase the comment.  What I meant was:
>
> "Skip reparsing deleted refs.  This is OK because either
> - ref's table has change tracking enabled and the users need the
>   original datum in 'ref', which will be cleaned up by
>   ovsdb_idl_track_clear().
> OR
> - ref's table has change tracking disabled, 'ref' is deleted so its
>   datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."
>

For deleted refs, e.g. R2 reference R1, and both R1 and R2 are deleted,
then during the call ovsdb_idl_delete_row(R2), the arc R2 -> R1 is already
deleted, so here we won't even see R2 in ovsdb_idl_row_reparse_backref(R1),
right? So the above check is useless?

> >
> > On the other hand, I understand the problem you are trying to solve,
but I
> > don't see why it is related to my comment regarding clearly
distinguishing
> > deleted_untracked_rows v.s. track_list.
>
> It's not, with your explanation about deleted_untracked_rows above it's
> now clear to me.
>
> >
> >>>
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>
> >>>>> I took a second look at the patch, and it seems when tracking is not
> >>>>> enabled it would result in the rows in "orphan_rows" leaked, because
> > in
> >>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY
IF
> >>>>> tracking is enabled for the table. If I am correct, would the above
> >>>>> suggested refactoring help avoiding such kinds of problems?
> >>>>
> >>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
> >>>> rows to 'track_list' in two scenarios:
> >>>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
> >>>> b. they're in a table with change tracking enabled.
> >>>>
> >>>> If the table doesn't have change tracking enabled and the row is not
> >>>> referenced anymore it will be added to 'track_list' and will be
cleaned
> >>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
> >>>>
> >>>> If the table doesn't have change tracking enabled but the row (let's
> >>>> call it 'R1') is still referenced (i.e., it became "orphan") then we
> >>>> shouldn't delete it yet.  When, in the future, the rows referring to
> >>>> this orphan row are deleted/updated (let's call them 'R2'),
> >>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
> >>>> now properly deleted.
> >>>>
> >>>
> >>> Ok, it seems no leak at all. I think I was confused by this check:
> >>> +        /* Orphan rows that are still unreferenced or are part of
> > tables
> >>> that
> >>> +         * have hange tracking enabled should be added to their
table's
> >>> +         * 'track_list'.
> >>> +         */
> >>> +        if (ovs_list_is_empty(&row->dst_arcs)
> >>> +                || ovsdb_idl_track_is_set(row->table)) {
> >>> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
> >>> +        }
> >>>
> >>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
> > moved
> >>> to track_list and finally get destroyed.
> >>
> >> R1 is still referenced by R2 at this point in the example above, that
is
> >> there is an arc from R2 to R1.
> >
> > Ok, my mistake.
> >
> >>
> >>> But I think this "if" can be removed because deleted rows should
always
> > go
> >>> to "track_list" so that they can be destroyed (assume not doing the
> >>> refactoring I suggested earlier).
> >>
> >> R1 should not be destroyed until R2 is destroyed too.  The complexity
> >> here exists because for tables with change tracking enabled, the
current
> >> design is to always add deleted rows (even if they're still referenced)
> >> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
> >> destroy them as long as they're still referenced.
> >
> > I am confused again. According to the above code:
> > 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
> > and the row will be put to track_list.
> > 2) If "row" is referenced
> >    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
> > true and the row will be put to track_list.
> >    2.2)  if the table is not tracked, the row will NOT be put to
> > track_list. The row will NOT be put to track_list.
> >
> > If I understand correctly, in the current implementation (and in the
> > patch), everything in track_list will finally get released (either in
> > post_processing or in track_clear).
>
> Those rows (R1) are "orphan", their contents were deleted from the IDL
> database view, but there still exist other non-orphan rows (R2) that
> refer to them.
>
> > For 2.2), since the row is NOT put to track_list, how will it be
released?
>
> When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
> ovsdb_idl_row_clear_arcs(R2, true).

E.g.:
In IDL iteration T1: R1 is deleted but R2 is not deleted, so during the
call ovsdb_idl_row_reparse_backrefs(R1), it calls
ovsdb_idl_row_clear_arcs(R2), which clears the arc R2 -> R1, so R1 is not
referenced any more.
In IDL iteration T2: R2 is deleted, but since the arc R2 -> R1 is lost,
ovsdb_idl_row_clear_arcs(R2, true) will not find R1, and R1 is leaked.
Did I misunderstand anything again?

>
> R2 has an arc to R1, and if R2 was the only row referring to R1 then
> ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).
>
> At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
> removed from its table's hashtable, added to its table's 'track_list'
> and will be completely freed in ovsdb_idl_row_destroy_postprocess().
>
> >
> > And for "But, even for those, ovsdb_idl_track_clear() will not destroy
them
> > as long as they're still referenced.", do you mean it is possible that a
> > row is NOT destroyed even after the IDL iteration after user calling
> > track_clear()? If yes, why is that necessary? And how is it achieved (in
> > track_clear() it goes through each track_list and release all
is_orphaned()
> > rows, so how could it not get released)?
>
> The reason is the same as above, there might still exist an R2 referring
R1.
>
> ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
> which means that rows that are on 'track_list' but still referenced will
> not be freed because they're still in their table's hashtable:
>
> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1260
>

Thanks to this pointer. However, I think R1 will be released as long as it
is put to track_list. If R1 is deleted but R2 is not deleted, in
ovsdb_idl_row_reparse_backrefs(R1), it calls ovsdb_idl_row_clear_arcs(R2),
which clears the arc R2 -> R1, and so in ovsdb_idl_track_clear__() there is
no reference to R1 and it will be released. Please correct me if I am
wrong. (I think this behavior is desired, because the deleted R1 is not
really needed after this IDL iteration after tracked change is accessed by
user)

> >
> >>
> >>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to
make
> >>> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
> >>> should always be true, right?
> >>
> >> No, even if there was an update to delete R1, R2 might still exist
> >> (e.g., monitor condition change causes R1 to be deleted from the local
> >> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
> >> an arc from R2 to R1.
> >>
> >> I agree that the existing change tracking code is complex and can
> >> potentially be simplified and/or refactored.  However, that's a big
> >> effort and I think should not be in the scope of this bug fix.  I did
> >> have to refactor some things but I limited myself to only changing
parts
> >> that are absolutely required to fix the bug.
> >>
> > Sure, I agree that refactoring shouldn't be required in this patch. I am
> > totally ok with it being merged without the refactoring I suggested, but
> > regardless of the refactoring, could you still have my above concerns
> > clarified? Thanks a lot for your patience!
>
> Thanks for all the comments, they help question the implementation and
> spot issues!
>
> >
> > Thanks,
> > Han
>
> Regards,
> Dumitru
>
> >
> >> For the time being, without this fix ovn-controller is crashing because
> >> of the inconsistent view of the database the IDL change tracking code
> >> creates.
> >>
> >> I think this should be fixed as soon as possible.  In my opinion all
> >> other refactoring and hardening should be done as follow up.
> >>
> >> Regards,
> >> Dumitru
> >>
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Han
> >>>>>
> >>>>>>>
> >>>>>>> For the rest of the code... It seems fine, but it's really hard
> >>>>>>> to review and I still have a feeling that something might go
> >>>>>>> wrong (not more wrong than in current code, though), but I can't
> >>>>>>> think of any example.  So, it's, probably, OK.
> >>>>
> >>>> Thanks, I'm not sure how we can improve the stability/reliability of
> >>>> this code without major refactoring/rewrite.  Until then though, the
> >>>> fact that we break consistency of the IDL client's view of the
database
> >>>> is something that needs to be addressed because it's taken for
granted
> >>>> by ovn-controller in quite a few cases.
> >>>>
> >>>>>>>
> >>>>>>> Best regards, Ilya Maximets.
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>
> >>
> >
>
Dumitru Ceara March 23, 2021, 10:26 p.m. UTC | #10
On 3/23/21 8:15 PM, Han Zhou wrote:
> On Tue, Mar 23, 2021 at 5:00 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 3/22/21 8:45 PM, Han Zhou wrote:
>>> On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 3/19/21 6:55 PM, Han Zhou wrote:
>>>>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com>
> wrote:
>>>>>>
>>>>>> On 3/19/21 2:41 AM, Han Zhou wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
>>>>>>> <mailto:hzhou@ovn.org>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets@ovn.org
>>>>>>> <mailto:i.maximets@ovn.org>> wrote:
>>>>>>>>>
>>>>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>>>>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
>>>>> (with
>>>>>>>>>> column 'ref_a' a reference to table A):
>>>>>>>>>> a = {A._uuid=<U1>}
>>>>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>>>>
>>>>>>>>>> Assuming both records are present in the IDL client's in-memory
>>>>>>> view of
>>>>>>>>>> the database, depending whether row 'b' is also deleted in the
> same
>>>>>>>>>> transaction or not, deletion of row 'a' should generate the
>>>>> following
>>>>>>>>>> tracked changes:
>>>>>>>>>>
>>>>>>>>>> 1. only row 'a' is deleted:
>>>>>>>>>> - for table A:
>>>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>>>> - for table B:
>>>>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>>>>
>>>>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
>>>>>>>>>> - for table A:
>>>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>>>> - for table B:
>>>>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>>>>
>>>>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
>>>>> rows
>>>>>>>>>> until all updates in the current run have been processed.
>>>>>>>>>>
>>>>>>>>>> Without this change, in scenario 2 above, the tracked changes for
>>>>>>> table
>>>>>>>>>> B would be:
>>>>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>>>>
>>>>>>>>>> In particular, for strong references, row 'a' can never be
> deleted
>>>>> in
>>>>>>>>>> a transaction that happens strictly before row 'b' is deleted.
> In
>>>>>>> some
>>>>>>>>>> cases [0] both rows are deleted in the same transaction and
> having
>>>>>>>>>> B.ref_a=[] would violate the integrity of the database from
> client
>>>>>>>>>> perspective.  This would force the client to always validate that
>>>>>>>>>> strong reference fields are non-NULL.  This is not really an
> option
>>>>>>>>>> because the information in the original reference is required for
>>>>>>>>>> incrementally processing the record deletion.
>>>>>>>>>>
>>>>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
>>>>> crash
>>>>>>>>>>     in ovn-controller because a strong reference field becomes
>>>>> NULL:
>>>>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
>>>>>>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
>>>>>>>>>>     $ ovn-nbctl lr-del r
>>>>>>>>>>
>>>>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
>>>>>>> <https://bugzilla.redhat.com/1932642>
>>>>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
>>>>>>> deleted rows.")
>>>>>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
>>>>>>> <mailto:dceara@redhat.com>>
>>>>>>>>>> ---
>>>>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>>>>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
>>>>>>>>>>  tests/test-ovsdb.py |   14 ++++
>>>>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>>>>>>> index 9e1e787..1542da9 100644
>>>>>>>>>> --- a/lib/ovsdb-idl.c
>>>>>>>>>> +++ b/lib/ovsdb-idl.c
>>>>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>>>>>>>>>>      struct ovsdb_idl_txn *txn;
>>>>>>>>>>      struct hmap outstanding_txns;
>>>>>>>>>>      bool verify_write_only;
>>>>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
>>>>>>> referenced rows. */
>>>>>>>>>
>>>>>>>>> The name here is a bit confusing.  I mean it's not a list
>>>>>>>>> of orphan rows, at least it's not a list of all the orphan
>>>>>>>>> rows.  So, we should, probably, rename it to something like
>>>>>>>>> 'deleted_rows' instead and also rename all related functions.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
>>>>>>> "deleted_rows" may also be confusing, because the current code also
>>>>>>> maintains deleted rows in the track_list of each table (no matter if
>>>>>>> tracking is enabled or not). If tracking is enabled, the rows are
>>>>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
>>> it
>>>>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of
> logic
>>>>>>> was really confusing. This patch changes that behavior further by
>>>>>>> introducing this orphan_rows list as a temporary place (for some
>>>>>>> situations) before moving to the track_list. Since this patch
> already
>>>>>>> did some refactoring (which is great), I'd suggest doing a little
> more
>>>>>>> to avoid the above confusion: for deleted rows, shall we just put
> the
>>>>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
>>>>>>> "orphan_rows" of this patch), and let the "track_list" only hold the
>>>>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
>>>>>>> destroy deleted_untracked_rows without worrying the track_list.
>>>>>>> track_list should exist only if tracking is enabled, and should be
>>> taken
>>>>>>> care by ovsdb_idl_track_clear().
>>>>>>
>>>>>> Thanks, Ilya and Han, for the review and suggestions.
>>>>>>
>>>>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.
> However,
>>>>>> regarding using a different list for tables without change tracking
> I'm
>>>>>> not so sure if that's OK.  Even if the table doesn't have change
>>>>>> tracking its deleted rows should still first go to the "temporary"
>>>>>> orphan_rows list.
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> Table T1 (no change tracking):
>>>>>> - Row a = {A.uuid=<U1>}
>>>>>>
>>>>>> Table T2 (change tracking enabled):
>>>>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>>>>>>
>>>>>> If in a transaction both rows 'a' and 'b' are deleted, and updates
> are
>>>>>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
>>>>>> we'd be reparsing backrefs for 'a' immediately leading to:
>>>>>>
>>>>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>>>>>>
>>>>>> Then the update to delete 'b' is processed:
>>>>>>
>>>>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>>>>>>
>>>>>> The above is wrong and breaks consistency if ref_to_A is a strong
>>>>> reference.
>>>>>>
>>>>> Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
>>> list
>>>>> in my suggestion is just a renaming of your "orphan_rows". The real
>>> change
>>>>> I suggested was that let's don't use it as a temporary place to hold
>>> these
>>>>> rows. We can make it clear that deleted_untracked_rows is for deleted
>>>>> untracked rows and the "track_list" of each table contains only
> tracked
>>>>> rows.
>>>>
>>>> I think there's some confusion: 'orphan_rows' doesn't contain deleted
>>>> untracked rows.  What it actually stores is "rows deleted in the
> current
>>>> IDL run regardless of change tracking enabled or not".
>>>>
>>> I think we are saying the same thing from different perspectives. For
>>> "deleted untracked rows" I am not talking about whether change tracking
> is
>>> enabled or not, but about whether the deleted row is in "track_list" or
>>> not. In this patch, a deleted row is in either "track_list" (what I
> called
>>> "tracked") or "orphan_rows" (what I called "deleted and untracked"). You
>>> always call "ovsdb_idl_row_untrack_change" before putting a row to
>>> "orphan_rows", so I think it would be clear to rename "orphan_rows" as
>>> "deleted_untracked_rows".
>>
>> Oh, I see now, it was a misunderstanding indeed.  I thought we were
>> referring to change tracking being enabled for the table or not.  I can
>> change the name to "deleted_untracked_rows" in the next revision.
>>
>>>
>>>>>
>>>>> In the above example, row a (without change tracking) will be put to
>>>>> deleted_untracked_rows, but no need to move to "track_list" later.
>>>>>
>>>>> In ovsdb_idl_row_destroy_postprocess, we can destroy
>>>>> deleted_untracked_rows, and no need to touch "track_list".
>>>>>
>>>>> Would this be more clear?
>>>>
>>>> The problem is with deleted rows that are part of tables with change
>>>> tracking enabled.  Taking the example from the commit log:
>>>>
>>>> Considering two DB rows, 'a' from table A and 'b' from table B (with
>>>> column 'ref_a' a reference to table A), both tables with change
> tracking
>>>> enabled:
>>>> a = {A._uuid=<U1>}
>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>
>>>> If row 'a' and row 'b' are deleted in the same update then the IDL user
>>>> should be notified of the following events.
>>>> - for table A:
>>>>   - deleted records: a = {A._uuid=<U1>}
>>>> - for table B:
>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>
>>>> To achieve this we need to make sure all "delete" updates are processed
>>>> before reparsing any backrefs.  This allows to check in
>>>> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
>>>> above) is also deleted in the current run, and skip unparsing B.ref_a
> so
>>>> that that the IDL user (e.g., ovn-controller) can still access it.
>>>>
>>>
>>> In fact I didn't see the check you mentioned above. What's in the code
> is:
>>> +        /* Skip reparsing deleted refs.  If ref's table has change
> tracking
>>> +         * enabled, the users need the original information (and that
> will
>>> be
>>> +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
>>> +         * already orphan and is not accessible to the IDL user and
> will be
>>> +         * cleaned up at the end of the current IDL run.
>>> +         */
>>> +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
>>> +            continue;
>>> +        }
>>> +
>>>
>>> 1) "row" is "a" in the above example, and "ref" is "b". It seems you
> should
>>> check if "ref" is deleted?
>>
>> You're right, this should be "ref" instead of "row, thanks for spotting
>> this!  I'll add a test case to exercise this scenario to avoid
>> regressions in the future.
>>
>> Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE will
>> never be set for rows that are on the "orphan_rows" list, so this
>> condition would never be true.
>>
>> I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
>> always set for rows that are on the "orphan_rows" list.
>>
>>>
>>> 2) The comments says "if ref's table has change tracking enabled", but
> in
>>> fact the above code didn't check if change tracking is enabled or not.
>>> Should we check it?
>>
>> I can try to rephrase the comment.  What I meant was:
>>
>> "Skip reparsing deleted refs.  This is OK because either
>> - ref's table has change tracking enabled and the users need the
>>   original datum in 'ref', which will be cleaned up by
>>   ovsdb_idl_track_clear().
>> OR
>> - ref's table has change tracking disabled, 'ref' is deleted so its
>>   datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."
>>
> 
> For deleted refs, e.g. R2 reference R1, and both R1 and R2 are deleted,
> then during the call ovsdb_idl_delete_row(R2), the arc R2 -> R1 is already
> deleted, so here we won't even see R2 in ovsdb_idl_row_reparse_backref(R1),
> right? So the above check is useless?

You're right, I can remove this check altogether now that we delay
reparsing of backrefs for all deleted rows.  I'll update it in v4.

> 
>>>
>>> On the other hand, I understand the problem you are trying to solve,
> but I
>>> don't see why it is related to my comment regarding clearly
> distinguishing
>>> deleted_untracked_rows v.s. track_list.
>>
>> It's not, with your explanation about deleted_untracked_rows above it's
>> now clear to me.
>>
>>>
>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>
>>>>>>> I took a second look at the patch, and it seems when tracking is not
>>>>>>> enabled it would result in the rows in "orphan_rows" leaked, because
>>> in
>>>>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY
> IF
>>>>>>> tracking is enabled for the table. If I am correct, would the above
>>>>>>> suggested refactoring help avoiding such kinds of problems?
>>>>>>
>>>>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
>>>>>> rows to 'track_list' in two scenarios:
>>>>>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
>>>>>> b. they're in a table with change tracking enabled.
>>>>>>
>>>>>> If the table doesn't have change tracking enabled and the row is not
>>>>>> referenced anymore it will be added to 'track_list' and will be
> cleaned
>>>>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>>>>>>
>>>>>> If the table doesn't have change tracking enabled but the row (let's
>>>>>> call it 'R1') is still referenced (i.e., it became "orphan") then we
>>>>>> shouldn't delete it yet.  When, in the future, the rows referring to
>>>>>> this orphan row are deleted/updated (let's call them 'R2'),
>>>>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
>>>>>> now properly deleted.
>>>>>>
>>>>>
>>>>> Ok, it seems no leak at all. I think I was confused by this check:
>>>>> +        /* Orphan rows that are still unreferenced or are part of
>>> tables
>>>>> that
>>>>> +         * have hange tracking enabled should be added to their
> table's
>>>>> +         * 'track_list'.
>>>>> +         */
>>>>> +        if (ovs_list_is_empty(&row->dst_arcs)
>>>>> +                || ovsdb_idl_track_is_set(row->table)) {
>>>>> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
>>>>> +        }
>>>>>
>>>>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
>>> moved
>>>>> to track_list and finally get destroyed.
>>>>
>>>> R1 is still referenced by R2 at this point in the example above, that
> is
>>>> there is an arc from R2 to R1.
>>>
>>> Ok, my mistake.
>>>
>>>>
>>>>> But I think this "if" can be removed because deleted rows should
> always
>>> go
>>>>> to "track_list" so that they can be destroyed (assume not doing the
>>>>> refactoring I suggested earlier).
>>>>
>>>> R1 should not be destroyed until R2 is destroyed too.  The complexity
>>>> here exists because for tables with change tracking enabled, the
> current
>>>> design is to always add deleted rows (even if they're still referenced)
>>>> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
>>>> destroy them as long as they're still referenced.
>>>
>>> I am confused again. According to the above code:
>>> 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
>>> and the row will be put to track_list.
>>> 2) If "row" is referenced
>>>    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
>>> true and the row will be put to track_list.
>>>    2.2)  if the table is not tracked, the row will NOT be put to
>>> track_list. The row will NOT be put to track_list.
>>>
>>> If I understand correctly, in the current implementation (and in the
>>> patch), everything in track_list will finally get released (either in
>>> post_processing or in track_clear).
>>
>> Those rows (R1) are "orphan", their contents were deleted from the IDL
>> database view, but there still exist other non-orphan rows (R2) that
>> refer to them.
>>
>>> For 2.2), since the row is NOT put to track_list, how will it be
> released?
>>
>> When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
>> ovsdb_idl_row_clear_arcs(R2, true).
> 
> E.g.:
> In IDL iteration T1: R1 is deleted but R2 is not deleted, so during the
> call ovsdb_idl_row_reparse_backrefs(R1), it calls
> ovsdb_idl_row_clear_arcs(R2), which clears the arc R2 -> R1, so R1 is not
> referenced any more.

In ovsdb_idl_row_reparse_backrefs(R1), we also call
ovsdb_idl_row_parse(R2) after clearing arcs from R2 -> R1.

This ends up calling ovsdb_idl_get_row_arc(R2, R1.uuid) which will find
the orphaned R1 still in the hashtable and readd the arc, R2 -> R1.

This arc is needed for the situation when R1 is reinserted (e.g., at T3)
to also mark R2 as "modified" when that happens.

> In IDL iteration T2: R2 is deleted, but since the arc R2 -> R1 is lost,
> ovsdb_idl_row_clear_arcs(R2, true) will not find R1, and R1 is leaked.
> Did I misunderstand anything again?

As mentioned above, the R2 -> R1 arc still exists and
ovsdb_idl_row_clear_arcs(R2, true) will delete R1.

> 
>>
>> R2 has an arc to R1, and if R2 was the only row referring to R1 then
>> ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).
>>
>> At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
>> removed from its table's hashtable, added to its table's 'track_list'
>> and will be completely freed in ovsdb_idl_row_destroy_postprocess().
>>
>>>
>>> And for "But, even for those, ovsdb_idl_track_clear() will not destroy
> them
>>> as long as they're still referenced.", do you mean it is possible that a
>>> row is NOT destroyed even after the IDL iteration after user calling
>>> track_clear()? If yes, why is that necessary? And how is it achieved (in
>>> track_clear() it goes through each track_list and release all
> is_orphaned()
>>> rows, so how could it not get released)?
>>
>> The reason is the same as above, there might still exist an R2 referring
> R1.
>>
>> ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
>> which means that rows that are on 'track_list' but still referenced will
>> not be freed because they're still in their table's hashtable:
>>
>> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1260
>>
> 
> Thanks to this pointer. However, I think R1 will be released as long as it
> is put to track_list. If R1 is deleted but R2 is not deleted, in
> ovsdb_idl_row_reparse_backrefs(R1), it calls ovsdb_idl_row_clear_arcs(R2),
> which clears the arc R2 -> R1, and so in ovsdb_idl_track_clear__() there is
> no reference to R1 and it will be released. Please correct me if I am
> wrong. (I think this behavior is desired, because the deleted R1 is not
> really needed after this IDL iteration after tracked change is accessed by
> user)

As before, R2 -> R1 gets recreated in ovsdb_idl_row_reparse_backrefs(R1)
and orphaned R1 is needed until either R2 is deleted or R1 is reinserted
(e.g., monitor condition change).

If it's OK with you, I can send a v4 incorporating the changes we
discussed until now (no refactoring though).  I'm also adding more test
cases.  Hopefully that makes the review easier.

Thanks,
Dumitru

> 
>>>
>>>>
>>>>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to
> make
>>>>> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
>>>>> should always be true, right?
>>>>
>>>> No, even if there was an update to delete R1, R2 might still exist
>>>> (e.g., monitor condition change causes R1 to be deleted from the local
>>>> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
>>>> an arc from R2 to R1.
>>>>
>>>> I agree that the existing change tracking code is complex and can
>>>> potentially be simplified and/or refactored.  However, that's a big
>>>> effort and I think should not be in the scope of this bug fix.  I did
>>>> have to refactor some things but I limited myself to only changing
> parts
>>>> that are absolutely required to fix the bug.
>>>>
>>> Sure, I agree that refactoring shouldn't be required in this patch. I am
>>> totally ok with it being merged without the refactoring I suggested, but
>>> regardless of the refactoring, could you still have my above concerns
>>> clarified? Thanks a lot for your patience!
>>
>> Thanks for all the comments, they help question the implementation and
>> spot issues!
>>
>>>
>>> Thanks,
>>> Han
>>
>> Regards,
>> Dumitru
>>
>>>
>>>> For the time being, without this fix ovn-controller is crashing because
>>>> of the inconsistent view of the database the IDL change tracking code
>>>> creates.
>>>>
>>>> I think this should be fixed as soon as possible.  In my opinion all
>>>> other refactoring and hardening should be done as follow up.
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Han
>>>>>>>
>>>>>>>>>
>>>>>>>>> For the rest of the code... It seems fine, but it's really hard
>>>>>>>>> to review and I still have a feeling that something might go
>>>>>>>>> wrong (not more wrong than in current code, though), but I can't
>>>>>>>>> think of any example.  So, it's, probably, OK.
>>>>>>
>>>>>> Thanks, I'm not sure how we can improve the stability/reliability of
>>>>>> this code without major refactoring/rewrite.  Until then though, the
>>>>>> fact that we break consistency of the IDL client's view of the
> database
>>>>>> is something that needs to be addressed because it's taken for
> granted
>>>>>> by ovn-controller in quite a few cases.
>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru
>>>>>>
>>>>>
>>>>
>>>
>>
>
Han Zhou March 24, 2021, 2:16 a.m. UTC | #11
On Tue, Mar 23, 2021 at 3:26 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 3/23/21 8:15 PM, Han Zhou wrote:
> > On Tue, Mar 23, 2021 at 5:00 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> On 3/22/21 8:45 PM, Han Zhou wrote:
> >>> On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara@redhat.com>
wrote:
> >>>>
> >>>> On 3/19/21 6:55 PM, Han Zhou wrote:
> >>>>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara@redhat.com>
> > wrote:
> >>>>>>
> >>>>>> On 3/19/21 2:41 AM, Han Zhou wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou@ovn.org
> >>>>>>> <mailto:hzhou@ovn.org>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <
i.maximets@ovn.org
> >>>>>>> <mailto:i.maximets@ovn.org>> wrote:
> >>>>>>>>>
> >>>>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> >>>>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
> >>>>> (with
> >>>>>>>>>> column 'ref_a' a reference to table A):
> >>>>>>>>>> a = {A._uuid=<U1>}
> >>>>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>>>>>
> >>>>>>>>>> Assuming both records are present in the IDL client's in-memory
> >>>>>>> view of
> >>>>>>>>>> the database, depending whether row 'b' is also deleted in the
> > same
> >>>>>>>>>> transaction or not, deletion of row 'a' should generate the
> >>>>> following
> >>>>>>>>>> tracked changes:
> >>>>>>>>>>
> >>>>>>>>>> 1. only row 'a' is deleted:
> >>>>>>>>>> - for table A:
> >>>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>>>>>> - for table B:
> >>>>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>>>>>
> >>>>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
> >>>>>>>>>> - for table A:
> >>>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>>>>>> - for table B:
> >>>>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>>>>>
> >>>>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
> >>>>> rows
> >>>>>>>>>> until all updates in the current run have been processed.
> >>>>>>>>>>
> >>>>>>>>>> Without this change, in scenario 2 above, the tracked changes
for
> >>>>>>> table
> >>>>>>>>>> B would be:
> >>>>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>>>>>
> >>>>>>>>>> In particular, for strong references, row 'a' can never be
> > deleted
> >>>>> in
> >>>>>>>>>> a transaction that happens strictly before row 'b' is deleted.
> > In
> >>>>>>> some
> >>>>>>>>>> cases [0] both rows are deleted in the same transaction and
> > having
> >>>>>>>>>> B.ref_a=[] would violate the integrity of the database from
> > client
> >>>>>>>>>> perspective.  This would force the client to always validate
that
> >>>>>>>>>> strong reference fields are non-NULL.  This is not really an
> > option
> >>>>>>>>>> because the information in the original reference is required
for
> >>>>>>>>>> incrementally processing the record deletion.
> >>>>>>>>>>
> >>>>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
> >>>>> crash
> >>>>>>>>>>     in ovn-controller because a strong reference field becomes
> >>>>> NULL:
> >>>>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
> >>>>>>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
> >>>>>>>>>>     $ ovn-nbctl lr-del r
> >>>>>>>>>>
> >>>>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
> >>>>>>> <https://bugzilla.redhat.com/1932642>
> >>>>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
> >>>>>>> deleted rows.")
> >>>>>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
> >>>>>>> <mailto:dceara@redhat.com>>
> >>>>>>>>>> ---
> >>>>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> >>>>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
> >>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
> >>>>>>>>>>  tests/test-ovsdb.py |   14 ++++
> >>>>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >>>>>>>>>> index 9e1e787..1542da9 100644
> >>>>>>>>>> --- a/lib/ovsdb-idl.c
> >>>>>>>>>> +++ b/lib/ovsdb-idl.c
> >>>>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
> >>>>>>>>>>      struct ovsdb_idl_txn *txn;
> >>>>>>>>>>      struct hmap outstanding_txns;
> >>>>>>>>>>      bool verify_write_only;
> >>>>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
> >>>>>>> referenced rows. */
> >>>>>>>>>
> >>>>>>>>> The name here is a bit confusing.  I mean it's not a list
> >>>>>>>>> of orphan rows, at least it's not a list of all the orphan
> >>>>>>>>> rows.  So, we should, probably, rename it to something like
> >>>>>>>>> 'deleted_rows' instead and also rename all related functions.
> >>>>>>>>>
> >>>>>>>>> What do you think?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
> >>>>>>> "deleted_rows" may also be confusing, because the current code
also
> >>>>>>> maintains deleted rows in the track_list of each table (no matter
if
> >>>>>>> tracking is enabled or not). If tracking is enabled, the rows are
> >>>>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if
not,
> >>> it
> >>>>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of
> > logic
> >>>>>>> was really confusing. This patch changes that behavior further by
> >>>>>>> introducing this orphan_rows list as a temporary place (for some
> >>>>>>> situations) before moving to the track_list. Since this patch
> > already
> >>>>>>> did some refactoring (which is great), I'd suggest doing a little
> > more
> >>>>>>> to avoid the above confusion: for deleted rows, shall we just put
> > the
> >>>>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
> >>>>>>> "orphan_rows" of this patch), and let the "track_list" only hold
the
> >>>>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess()
to
> >>>>>>> destroy deleted_untracked_rows without worrying the track_list.
> >>>>>>> track_list should exist only if tracking is enabled, and should be
> >>> taken
> >>>>>>> care by ovsdb_idl_track_clear().
> >>>>>>
> >>>>>> Thanks, Ilya and Han, for the review and suggestions.
> >>>>>>
> >>>>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.
> > However,
> >>>>>> regarding using a different list for tables without change tracking
> > I'm
> >>>>>> not so sure if that's OK.  Even if the table doesn't have change
> >>>>>> tracking its deleted rows should still first go to the "temporary"
> >>>>>> orphan_rows list.
> >>>>>>
> >>>>>> For example:
> >>>>>>
> >>>>>> Table T1 (no change tracking):
> >>>>>> - Row a = {A.uuid=<U1>}
> >>>>>>
> >>>>>> Table T2 (change tracking enabled):
> >>>>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
> >>>>>>
> >>>>>> If in a transaction both rows 'a' and 'b' are deleted, and updates
> > are
> >>>>>> received in this order, if we wouldn't add 'a' to 'orphan_rows'
then
> >>>>>> we'd be reparsing backrefs for 'a' immediately leading to:
> >>>>>>
> >>>>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
> >>>>>>
> >>>>>> Then the update to delete 'b' is processed:
> >>>>>>
> >>>>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
> >>>>>>
> >>>>>> The above is wrong and breaks consistency if ref_to_A is a strong
> >>>>> reference.
> >>>>>>
> >>>>> Sorry that I didn't make it clear enough. The
"deleted_untracked_rows"
> >>> list
> >>>>> in my suggestion is just a renaming of your "orphan_rows". The real
> >>> change
> >>>>> I suggested was that let's don't use it as a temporary place to hold
> >>> these
> >>>>> rows. We can make it clear that deleted_untracked_rows is for
deleted
> >>>>> untracked rows and the "track_list" of each table contains only
> > tracked
> >>>>> rows.
> >>>>
> >>>> I think there's some confusion: 'orphan_rows' doesn't contain deleted
> >>>> untracked rows.  What it actually stores is "rows deleted in the
> > current
> >>>> IDL run regardless of change tracking enabled or not".
> >>>>
> >>> I think we are saying the same thing from different perspectives. For
> >>> "deleted untracked rows" I am not talking about whether change
tracking
> > is
> >>> enabled or not, but about whether the deleted row is in "track_list"
or
> >>> not. In this patch, a deleted row is in either "track_list" (what I
> > called
> >>> "tracked") or "orphan_rows" (what I called "deleted and untracked").
You
> >>> always call "ovsdb_idl_row_untrack_change" before putting a row to
> >>> "orphan_rows", so I think it would be clear to rename "orphan_rows" as
> >>> "deleted_untracked_rows".
> >>
> >> Oh, I see now, it was a misunderstanding indeed.  I thought we were
> >> referring to change tracking being enabled for the table or not.  I can
> >> change the name to "deleted_untracked_rows" in the next revision.
> >>
> >>>
> >>>>>
> >>>>> In the above example, row a (without change tracking) will be put to
> >>>>> deleted_untracked_rows, but no need to move to "track_list" later.
> >>>>>
> >>>>> In ovsdb_idl_row_destroy_postprocess, we can destroy
> >>>>> deleted_untracked_rows, and no need to touch "track_list".
> >>>>>
> >>>>> Would this be more clear?
> >>>>
> >>>> The problem is with deleted rows that are part of tables with change
> >>>> tracking enabled.  Taking the example from the commit log:
> >>>>
> >>>> Considering two DB rows, 'a' from table A and 'b' from table B (with
> >>>> column 'ref_a' a reference to table A), both tables with change
> > tracking
> >>>> enabled:
> >>>> a = {A._uuid=<U1>}
> >>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>
> >>>> If row 'a' and row 'b' are deleted in the same update then the IDL
user
> >>>> should be notified of the following events.
> >>>> - for table A:
> >>>>   - deleted records: a = {A._uuid=<U1>}
> >>>> - for table B:
> >>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>
> >>>> To achieve this we need to make sure all "delete" updates are
processed
> >>>> before reparsing any backrefs.  This allows to check in
> >>>> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
> >>>> above) is also deleted in the current run, and skip unparsing B.ref_a
> > so
> >>>> that that the IDL user (e.g., ovn-controller) can still access it.
> >>>>
> >>>
> >>> In fact I didn't see the check you mentioned above. What's in the code
> > is:
> >>> +        /* Skip reparsing deleted refs.  If ref's table has change
> > tracking
> >>> +         * enabled, the users need the original information (and that
> > will
> >>> be
> >>> +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row'
is
> >>> +         * already orphan and is not accessible to the IDL user and
> > will be
> >>> +         * cleaned up at the end of the current IDL run.
> >>> +         */
> >>> +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
> >>> +            continue;
> >>> +        }
> >>> +
> >>>
> >>> 1) "row" is "a" in the above example, and "ref" is "b". It seems you
> > should
> >>> check if "ref" is deleted?
> >>
> >> You're right, this should be "ref" instead of "row, thanks for spotting
> >> this!  I'll add a test case to exercise this scenario to avoid
> >> regressions in the future.
> >>
> >> Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE
will
> >> never be set for rows that are on the "orphan_rows" list, so this
> >> condition would never be true.
> >>
> >> I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
> >> always set for rows that are on the "orphan_rows" list.
> >>
> >>>
> >>> 2) The comments says "if ref's table has change tracking enabled", but
> > in
> >>> fact the above code didn't check if change tracking is enabled or not.
> >>> Should we check it?
> >>
> >> I can try to rephrase the comment.  What I meant was:
> >>
> >> "Skip reparsing deleted refs.  This is OK because either
> >> - ref's table has change tracking enabled and the users need the
> >>   original datum in 'ref', which will be cleaned up by
> >>   ovsdb_idl_track_clear().
> >> OR
> >> - ref's table has change tracking disabled, 'ref' is deleted so its
> >>   datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."
> >>
> >
> > For deleted refs, e.g. R2 reference R1, and both R1 and R2 are deleted,
> > then during the call ovsdb_idl_delete_row(R2), the arc R2 -> R1 is
already
> > deleted, so here we won't even see R2 in
ovsdb_idl_row_reparse_backref(R1),
> > right? So the above check is useless?
>
> You're right, I can remove this check altogether now that we delay
> reparsing of backrefs for all deleted rows.  I'll update it in v4.
>
> >
> >>>
> >>> On the other hand, I understand the problem you are trying to solve,
> > but I
> >>> don't see why it is related to my comment regarding clearly
> > distinguishing
> >>> deleted_untracked_rows v.s. track_list.
> >>
> >> It's not, with your explanation about deleted_untracked_rows above it's
> >> now clear to me.
> >>
> >>>
> >>>>>
> >>>>>>>>
> >>>>>>>> What do you think?
> >>>>>>>>
> >>>>>>>
> >>>>>>> I took a second look at the patch, and it seems when tracking is
not
> >>>>>>> enabled it would result in the rows in "orphan_rows" leaked,
because
> >>> in
> >>>>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list"
ONLY
> > IF
> >>>>>>> tracking is enabled for the table. If I am correct, would the
above
> >>>>>>> suggested refactoring help avoiding such kinds of problems?
> >>>>>>
> >>>>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans()
moves
> >>>>>> rows to 'track_list' in two scenarios:
> >>>>>> a. they're not referenced anymore
(ovs_list_is_empty(&row->dst_arcs))
> >>>>>> b. they're in a table with change tracking enabled.
> >>>>>>
> >>>>>> If the table doesn't have change tracking enabled and the row is
not
> >>>>>> referenced anymore it will be added to 'track_list' and will be
> > cleaned
> >>>>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a"
above).
> >>>>>>
> >>>>>> If the table doesn't have change tracking enabled but the row
(let's
> >>>>>> call it 'R1') is still referenced (i.e., it became "orphan") then
we
> >>>>>> shouldn't delete it yet.  When, in the future, the rows referring
to
> >>>>>> this orphan row are deleted/updated (let's call them 'R2'),
> >>>>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1
is
> >>>>>> now properly deleted.
> >>>>>>
> >>>>>
> >>>>> Ok, it seems no leak at all. I think I was confused by this check:
> >>>>> +        /* Orphan rows that are still unreferenced or are part of
> >>> tables
> >>>>> that
> >>>>> +         * have hange tracking enabled should be added to their
> > table's
> >>>>> +         * 'track_list'.
> >>>>> +         */
> >>>>> +        if (ovs_list_is_empty(&row->dst_arcs)
> >>>>> +                || ovsdb_idl_track_is_set(row->table)) {
> >>>>> +            ovsdb_idl_row_track_change(row,
OVSDB_IDL_CHANGE_DELETE);
> >>>>> +        }
> >>>>>
> >>>>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will
be
> >>> moved
> >>>>> to track_list and finally get destroyed.
> >>>>
> >>>> R1 is still referenced by R2 at this point in the example above, that
> > is
> >>>> there is an arc from R2 to R1.
> >>>
> >>> Ok, my mistake.
> >>>
> >>>>
> >>>>> But I think this "if" can be removed because deleted rows should
> > always
> >>> go
> >>>>> to "track_list" so that they can be destroyed (assume not doing the
> >>>>> refactoring I suggested earlier).
> >>>>
> >>>> R1 should not be destroyed until R2 is destroyed too.  The complexity
> >>>> here exists because for tables with change tracking enabled, the
> > current
> >>>> design is to always add deleted rows (even if they're still
referenced)
> >>>> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will
not
> >>>> destroy them as long as they're still referenced.
> >>>
> >>> I am confused again. According to the above code:
> >>> 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is
true
> >>> and the row will be put to track_list.
> >>> 2) If "row" is referenced
> >>>    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table)
is
> >>> true and the row will be put to track_list.
> >>>    2.2)  if the table is not tracked, the row will NOT be put to
> >>> track_list. The row will NOT be put to track_list.
> >>>
> >>> If I understand correctly, in the current implementation (and in the
> >>> patch), everything in track_list will finally get released (either in
> >>> post_processing or in track_clear).
> >>
> >> Those rows (R1) are "orphan", their contents were deleted from the IDL
> >> database view, but there still exist other non-orphan rows (R2) that
> >> refer to them.
> >>
> >>> For 2.2), since the row is NOT put to track_list, how will it be
> > released?
> >>
> >> When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
> >> ovsdb_idl_row_clear_arcs(R2, true).
> >
> > E.g.:
> > In IDL iteration T1: R1 is deleted but R2 is not deleted, so during the
> > call ovsdb_idl_row_reparse_backrefs(R1), it calls
> > ovsdb_idl_row_clear_arcs(R2), which clears the arc R2 -> R1, so R1 is
not
> > referenced any more.
>
> In ovsdb_idl_row_reparse_backrefs(R1), we also call
> ovsdb_idl_row_parse(R2) after clearing arcs from R2 -> R1.
>
> This ends up calling ovsdb_idl_get_row_arc(R2, R1.uuid) which will find
> the orphaned R1 still in the hashtable and readd the arc, R2 -> R1.
>
> This arc is needed for the situation when R1 is reinserted (e.g., at T3)
> to also mark R2 as "modified" when that happens.
>
> > In IDL iteration T2: R2 is deleted, but since the arc R2 -> R1 is lost,
> > ovsdb_idl_row_clear_arcs(R2, true) will not find R1, and R1 is leaked.
> > Did I misunderstand anything again?
>
> As mentioned above, the R2 -> R1 arc still exists and
> ovsdb_idl_row_clear_arcs(R2, true) will delete R1.
>

Thanks for the explanation. I misunderstood the criteria of adding arcs
back in ovsdb_idl_row_parse().

> >
> >>
> >> R2 has an arc to R1, and if R2 was the only row referring to R1 then
> >> ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).
> >>
> >> At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
> >> removed from its table's hashtable, added to its table's 'track_list'
> >> and will be completely freed in ovsdb_idl_row_destroy_postprocess().
> >>
> >>>
> >>> And for "But, even for those, ovsdb_idl_track_clear() will not destroy
> > them
> >>> as long as they're still referenced.", do you mean it is possible
that a
> >>> row is NOT destroyed even after the IDL iteration after user calling
> >>> track_clear()? If yes, why is that necessary? And how is it achieved
(in
> >>> track_clear() it goes through each track_list and release all
> > is_orphaned()
> >>> rows, so how could it not get released)?
> >>
> >> The reason is the same as above, there might still exist an R2
referring
> > R1.
> >>
> >> ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
> >> which means that rows that are on 'track_list' but still referenced
will
> >> not be freed because they're still in their table's hashtable:
> >>
> >> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1260
> >>
> >
> > Thanks to this pointer. However, I think R1 will be released as long as
it
> > is put to track_list. If R1 is deleted but R2 is not deleted, in
> > ovsdb_idl_row_reparse_backrefs(R1), it calls
ovsdb_idl_row_clear_arcs(R2),
> > which clears the arc R2 -> R1, and so in ovsdb_idl_track_clear__()
there is
> > no reference to R1 and it will be released. Please correct me if I am
> > wrong. (I think this behavior is desired, because the deleted R1 is not
> > really needed after this IDL iteration after tracked change is accessed
by
> > user)
>
> As before, R2 -> R1 gets recreated in ovsdb_idl_row_reparse_backrefs(R1)
> and orphaned R1 is needed until either R2 is deleted or R1 is reinserted
> (e.g., monitor condition change).
>
> If it's OK with you, I can send a v4 incorporating the changes we
> discussed until now (no refactoring though).  I'm also adding more test
> cases.  Hopefully that makes the review easier.

Sounds good.

Thanks,
Han

>
> Thanks,
> Dumitru
>
> >
> >>>
> >>>>
> >>>>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to
> > make
> >>>>> sure the backrefs are cleared, so
"ovs_list_is_empty(&row->dst_arcs)"
> >>>>> should always be true, right?
> >>>>
> >>>> No, even if there was an update to delete R1, R2 might still exist
> >>>> (e.g., monitor condition change causes R1 to be deleted from the
local
> >>>> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would
create
> >>>> an arc from R2 to R1.
> >>>>
> >>>> I agree that the existing change tracking code is complex and can
> >>>> potentially be simplified and/or refactored.  However, that's a big
> >>>> effort and I think should not be in the scope of this bug fix.  I did
> >>>> have to refactor some things but I limited myself to only changing
> > parts
> >>>> that are absolutely required to fix the bug.
> >>>>
> >>> Sure, I agree that refactoring shouldn't be required in this patch. I
am
> >>> totally ok with it being merged without the refactoring I suggested,
but
> >>> regardless of the refactoring, could you still have my above concerns
> >>> clarified? Thanks a lot for your patience!
> >>
> >> Thanks for all the comments, they help question the implementation and
> >> spot issues!
> >>
> >>>
> >>> Thanks,
> >>> Han
> >>
> >> Regards,
> >> Dumitru
> >>
> >>>
> >>>> For the time being, without this fix ovn-controller is crashing
because
> >>>> of the inconsistent view of the database the IDL change tracking code
> >>>> creates.
> >>>>
> >>>> I think this should be fixed as soon as possible.  In my opinion all
> >>>> other refactoring and hardening should be done as follow up.
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Han
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> For the rest of the code... It seems fine, but it's really hard
> >>>>>>>>> to review and I still have a feeling that something might go
> >>>>>>>>> wrong (not more wrong than in current code, though), but I can't
> >>>>>>>>> think of any example.  So, it's, probably, OK.
> >>>>>>
> >>>>>> Thanks, I'm not sure how we can improve the stability/reliability
of
> >>>>>> this code without major refactoring/rewrite.  Until then though,
the
> >>>>>> fact that we break consistency of the IDL client's view of the
> > database
> >>>>>> is something that needs to be addressed because it's taken for
> > granted
> >>>>>> by ovn-controller in quite a few cases.
> >>>>>>
> >>>>>>>>>
> >>>>>>>>> Best regards, Ilya Maximets.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Dumitru
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
Dumitru Ceara March 24, 2021, 9:38 a.m. UTC | #12
On 3/24/21 3:16 AM, Han Zhou wrote:
>> If it's OK with you, I can send a v4 incorporating the changes we
>> discussed until now (no refactoring though).  I'm also adding more test
>> cases.  Hopefully that makes the review easier.
> Sounds good.
> 
> Thanks,
> Han
> 

V4 available for review:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=235642

Thanks,
Dumitru
diff mbox series

Patch

diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 9e1e787..1542da9 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -92,6 +92,7 @@  struct ovsdb_idl {
     struct ovsdb_idl_txn *txn;
     struct hmap outstanding_txns;
     bool verify_write_only;
+    struct ovs_list orphan_rows; /* All deleted but still referenced rows. */
 };
 
 static struct ovsdb_cs_ops ovsdb_idl_cs_ops;
@@ -144,6 +145,7 @@  static bool ovsdb_idl_modify_row(struct ovsdb_idl_row *,
                                  const struct shash *values, bool xor);
 static void ovsdb_idl_parse_update(struct ovsdb_idl *,
                                    const struct ovsdb_cs_update_event *);
+static void ovsdb_idl_process_orphans(struct ovsdb_idl *);
 
 static void ovsdb_idl_txn_process_reply(struct ovsdb_idl *,
                                         const struct jsonrpc_msg *);
@@ -163,6 +165,10 @@  static void ovsdb_idl_row_unparse(struct ovsdb_idl_row *);
 static void ovsdb_idl_row_clear_old(struct ovsdb_idl_row *);
 static void ovsdb_idl_row_clear_new(struct ovsdb_idl_row *);
 static void ovsdb_idl_row_clear_arcs(struct ovsdb_idl_row *, bool destroy_dsts);
+static void ovsdb_idl_row_reparse_backrefs(struct ovsdb_idl_row *);
+static void ovsdb_idl_row_track_change(struct ovsdb_idl_row *,
+                                       enum ovsdb_idl_change);
+static void ovsdb_idl_row_untrack_change(struct ovsdb_idl_row *);
 
 static void ovsdb_idl_txn_abort_all(struct ovsdb_idl *);
 static bool ovsdb_idl_txn_extract_mutations(struct ovsdb_idl_row *,
@@ -248,6 +254,7 @@  ovsdb_idl_create_unconnected(const struct ovsdb_idl_class *class,
         .txn = NULL,
         .outstanding_txns = HMAP_INITIALIZER(&idl->outstanding_txns),
         .verify_write_only = false,
+        .orphan_rows = OVS_LIST_INITIALIZER(&idl->orphan_rows),
     };
 
     uint8_t default_mode = (monitor_everything_by_default
@@ -351,6 +358,12 @@  ovsdb_idl_set_leader_only(struct ovsdb_idl *idl, bool leader_only)
 static void
 ovsdb_idl_clear(struct ovsdb_idl *db)
 {
+    /* Process orphan rows, removing them from the 'orphan_rows' list. */
+    ovsdb_idl_process_orphans(db);
+
+    /* Cleanup all rows; each row gets added to its own table's
+     * 'track_list'.
+     */
     for (size_t i = 0; i < db->class_->n_tables; i++) {
         struct ovsdb_idl_table *table = &db->tables[i];
         struct ovsdb_idl_row *row, *next_row;
@@ -367,17 +380,26 @@  ovsdb_idl_clear(struct ovsdb_idl *db)
                 ovsdb_idl_row_unparse(row);
             }
             LIST_FOR_EACH_SAFE (arc, next_arc, src_node, &row->src_arcs) {
+                ovs_list_remove(&arc->src_node);
+                ovs_list_remove(&arc->dst_node);
+                free(arc);
+            }
+            LIST_FOR_EACH_SAFE (arc, next_arc, dst_node, &row->dst_arcs) {
+                ovs_list_remove(&arc->src_node);
+                ovs_list_remove(&arc->dst_node);
                 free(arc);
             }
-            /* No need to do anything with dst_arcs: some node has those arcs
-             * as forward arcs and will destroy them itself. */
 
             ovsdb_idl_row_destroy(row);
         }
     }
+
+    /* Free rows deleted from tables with change tracking disabled. */
     ovsdb_idl_row_destroy_postprocess(db);
 
+    /* Free rows deleted from tables with change tracking enabled. */
     ovsdb_idl_track_clear__(db, true);
+    ovs_assert(ovs_list_is_empty(&db->orphan_rows));
     db->change_seqno++;
 }
 
@@ -415,7 +437,7 @@  ovsdb_idl_run(struct ovsdb_idl *idl)
         }
         ovsdb_cs_event_destroy(event);
     }
-
+    ovsdb_idl_process_orphans(idl);
     ovsdb_idl_row_destroy_postprocess(idl);
 }
 
@@ -1226,13 +1248,8 @@  ovsdb_idl_track_clear__(struct ovsdb_idl *idl, bool flush_all)
                     free(row->updated);
                     row->updated = NULL;
                 }
+                ovsdb_idl_row_untrack_change(row);
 
-                row->change_seqno[OVSDB_IDL_CHANGE_INSERT] =
-                    row->change_seqno[OVSDB_IDL_CHANGE_MODIFY] =
-                    row->change_seqno[OVSDB_IDL_CHANGE_DELETE] = 0;
-
-                ovs_list_remove(&row->track_node);
-                ovs_list_init(&row->track_node);
                 if (ovsdb_idl_row_is_orphan(row)) {
                     ovsdb_idl_row_unparse(row);
                     if (row->tracked_old_datum) {
@@ -1350,6 +1367,32 @@  ovsdb_idl_parse_update(struct ovsdb_idl *idl,
     }
 }
 
+/* Reparses references to rows that have been deleted in the current IDL run.
+ *
+ * To ensure that reference sources that are deleted are not reparsed,
+ * this function must be called after all updates have been processed in
+ * the current IDL run, i.e., after all calls to ovsdb_idl_parse_update().
+ */
+static void
+ovsdb_idl_process_orphans(struct ovsdb_idl *db)
+{
+    struct ovsdb_idl_row *row, *next;
+
+    LIST_FOR_EACH_SAFE (row, next, track_node, &db->orphan_rows) {
+        ovsdb_idl_row_untrack_change(row);
+        ovsdb_idl_row_reparse_backrefs(row);
+
+        /* Orphan rows that are still unreferenced or are part of tables that
+         * have hange tracking enabled should be added to their table's
+         * 'track_list'.
+         */
+        if (ovs_list_is_empty(&row->dst_arcs)
+                || ovsdb_idl_track_is_set(row->table)) {
+            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
+        }
+    }
+}
+
 static struct ovsdb_idl_row *
 ovsdb_idl_get_row(struct ovsdb_idl_table *table, const struct uuid *uuid)
 {
@@ -1403,6 +1446,7 @@  ovsdb_idl_process_update(struct ovsdb_idl_table *table,
             ovsdb_idl_insert_row(ovsdb_idl_row_create(table, uuid),
                                  ru->columns);
         } else if (ovsdb_idl_row_is_orphan(row)) {
+            ovsdb_idl_row_untrack_change(row);
             ovsdb_idl_insert_row(row, ru->columns);
         } else {
             VLOG_ERR_RL(&semantic_rl, "cannot add existing row "UUID_FMT" to "
@@ -1450,13 +1494,8 @@  add_tracked_change_for_references(struct ovsdb_idl_row *row)
 
         if (ovs_list_is_empty(&ref->track_node) &&
             ovsdb_idl_track_is_set(ref->table)) {
-                ovs_list_push_back(&ref->table->track_list,
-                                   &ref->track_node);
-
-            ref->change_seqno[OVSDB_IDL_CHANGE_MODIFY]
-                = ref->table->change_seqno[OVSDB_IDL_CHANGE_MODIFY]
-                = ref->table->idl->change_seqno + 1;
 
+            ovsdb_idl_row_track_change(ref, OVSDB_IDL_CHANGE_MODIFY);
             add_tracked_change_for_references(ref);
         }
     }
@@ -2016,12 +2055,48 @@  ovsdb_idl_row_reparse_backrefs(struct ovsdb_idl_row *row)
     LIST_FOR_EACH_SAFE (arc, next, dst_node, &row->dst_arcs) {
         struct ovsdb_idl_row *ref = arc->src;
 
+        /* Skip reparsing deleted refs.  If ref's table has change tracking
+         * enabled, the users need the original information (and that will be
+         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
+         * already orphan and is not accessible to the IDL user and will be
+         * cleaned up at the end of the current IDL run.
+         */
+        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
+            continue;
+        }
+
         ovsdb_idl_row_unparse(ref);
         ovsdb_idl_row_clear_arcs(ref, false);
         ovsdb_idl_row_parse(ref);
     }
 }
 
+static void
+ovsdb_idl_row_track_change(struct ovsdb_idl_row *row,
+                           enum ovsdb_idl_change change)
+{
+    row->change_seqno[change]
+        = row->table->change_seqno[change]
+        = row->table->idl->change_seqno + 1;
+    if (ovs_list_is_empty(&row->track_node)) {
+        ovs_list_push_back(&row->table->track_list, &row->track_node);
+    }
+}
+
+static void
+ovsdb_idl_row_untrack_change(struct ovsdb_idl_row *row)
+{
+    if (ovs_list_is_empty(&row->track_node)) {
+        return;
+    }
+
+    row->change_seqno[OVSDB_IDL_CHANGE_INSERT] =
+        row->change_seqno[OVSDB_IDL_CHANGE_MODIFY] =
+        row->change_seqno[OVSDB_IDL_CHANGE_DELETE] = 0;
+    ovs_list_remove(&row->track_node);
+    ovs_list_init(&row->track_node);
+}
+
 static struct ovsdb_idl_row *
 ovsdb_idl_row_create__(const struct ovsdb_idl_table_class *class)
 {
@@ -2048,22 +2123,25 @@  ovsdb_idl_row_create(struct ovsdb_idl_table *table, const struct uuid *uuid)
     return row;
 }
 
+/* If 'row' is not referenced anymore, removes 'row' from the table hmap,
+ * clears the old datum and adds 'row' to the table's track_list.
+ *
+ * If 'row' is still referenced, i.e., became "orphan", queues 'row' for
+ * reparsing after all updates have been processed by adding it to the
+ * 'orphan_rows' list.
+ */
 static void
 ovsdb_idl_row_destroy(struct ovsdb_idl_row *row)
 {
-    if (row) {
-        ovsdb_idl_row_clear_old(row);
+    ovsdb_idl_row_clear_old(row);
+    if (ovs_list_is_empty(&row->dst_arcs)) {
         hmap_remove(&row->table->rows, &row->hmap_node);
         ovsdb_idl_destroy_all_map_op_lists(row);
         ovsdb_idl_destroy_all_set_op_lists(row);
-        if (ovsdb_idl_track_is_set(row->table)) {
-            row->change_seqno[OVSDB_IDL_CHANGE_DELETE]
-                = row->table->change_seqno[OVSDB_IDL_CHANGE_DELETE]
-                = row->table->idl->change_seqno + 1;
-        }
-        if (ovs_list_is_empty(&row->track_node)) {
-            ovs_list_push_back(&row->table->track_list, &row->track_node);
-        }
+        ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
+    } else {
+        ovsdb_idl_row_untrack_change(row);
+        ovs_list_push_back(&row->table->idl->orphan_rows, &row->track_node);
     }
 }
 
@@ -2153,12 +2231,7 @@  ovsdb_idl_delete_row(struct ovsdb_idl_row *row)
 {
     ovsdb_idl_remove_from_indexes(row);
     ovsdb_idl_row_clear_arcs(row, true);
-    ovsdb_idl_row_clear_old(row);
-    if (ovs_list_is_empty(&row->dst_arcs)) {
-        ovsdb_idl_row_destroy(row);
-    } else {
-        ovsdb_idl_row_reparse_backrefs(row);
-    }
+    ovsdb_idl_row_destroy(row);
 }
 
 /* Returns true if a column with mode OVSDB_IDL_MODE_RW changed, false
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 610680c..7ac3c20 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -141,7 +141,7 @@  m4_define([OVSDB_CHECK_IDL_REGISTER_COLUMNS_PY],
    AT_CHECK([ovsdb_start_idltest])
    m4_if([$2], [], [],
      [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
-   AT_CHECK([$PYTHON3 $srcdir/test-ovsdb.py  -t10 idl $srcdir/idltest.ovsschema unix:socket ?simple:b,ba,i,ia,r,ra,s,sa,u,ua?link1:i,k,ka,l2?link2:i,l1?singleton:name $3],
+   AT_CHECK([$PYTHON3 $srcdir/test-ovsdb.py  -t10 idl $srcdir/idltest.ovsschema unix:socket ?simple:b,ba,i,ia,r,ra,s,sa,u,ua?simple6:name,weak_ref?link1:i,k,ka,l2?link2:i,l1?singleton:name $3],
             [0], [stdout], [ignore])
    AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
             [0], [$4])
@@ -874,6 +874,35 @@  OVSDB_CHECK_IDL([singleton idl, constraints],
 006: done
 ]])
 
+dnl This test creates a database with references and checks that deleting both
+dnl source and destination rows of a reference in a single update doesn't leak
+dnl rows that got orphaned when processing the update.
+OVSDB_CHECK_IDL([simple idl, references, multiple deletes],
+  [['["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s"},
+       "uuid-name": "weak_row0"},
+      {"op": "insert",
+       "table": "simple6",
+       "row": {"name": "first_row",
+               "weak_ref": ["set",
+                             [["named-uuid", "weak_row0"]]
+                           ]}}]']],
+  [['["idltest",
+      {"op": "delete",
+       "table": "simple",
+       "where": [["s", "==", "row0_s"]]},
+      {"op": "delete",
+       "table": "simple6",
+       "where": [["name", "==", "first_row"]]}]']],
+  [[000: table simple6: name=first_row weak_ref=[<0>] uuid=<1>
+000: table simple: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+001: {"error":null,"result":[{"count":1},{"count":1}]}
+002: empty
+003: done
+]])
+
 OVSDB_CHECK_IDL_PY([external-linking idl, insert ops],
   [],
   [['linktest']],
@@ -1257,6 +1286,7 @@  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, orphan rows, cond
 003: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
 003: table simple: updated columns: s
 004: change conditions
+005: table simple: deleted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
 005: table simple: inserted row: i=0 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
 005: table simple: updated columns: s
 006: change conditions
@@ -1269,6 +1299,177 @@  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, orphan rows, cond
 010: done
 ]])
 
+dnl This test checks that deleting the destination of a weak reference
+dnl without deleting the source, through monitor condition change, updates
+dnl the source tracked record.
+OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, references, conditional delete],
+  [['["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s", "i": 0},
+       "uuid-name": "weak_row0"},
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row1_s", "i": 1},
+       "uuid-name": "weak_row1"},
+      {"op": "insert",
+       "table": "simple6",
+       "row": {"name": "first_row",
+               "weak_ref": ["set",
+                             [["named-uuid", "weak_row0"],
+                              ["named-uuid", "weak_row1"]]
+                           ]}}]']],
+  [['condition simple []' \
+    'condition simple [["s","==","row0_s"]]' \
+    'condition simple [["s","==","row1_s"]]' \
+    '["idltest",
+      {"op": "update",
+      "table": "simple6",
+      "where": [],
+      "row": {"name": "new_name"}}]' \
+    '["idltest",
+      {"op": "delete",
+      "table": "simple6",
+      "where": []}]']],
+  [[000: change conditions
+001: table simple6: inserted row: name=first_row weak_ref=[] uuid=<0>
+001: table simple6: updated columns: name weak_ref
+002: change conditions
+003: table simple6: name=first_row weak_ref=[<1>] uuid=<0>
+003: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
+003: table simple: updated columns: s
+004: change conditions
+005: table simple6: name=first_row weak_ref=[<3>] uuid=<0>
+005: table simple: deleted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
+005: table simple: inserted row: i=1 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
+005: table simple: updated columns: i s
+006: {"error":null,"result":[{"count":1}]}
+007: table simple6: name=new_name weak_ref=[<3>] uuid=<0>
+007: table simple6: updated columns: name
+008: {"error":null,"result":[{"count":1}]}
+009: table simple: i=1 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
+010: done
+]])
+
+dnl This test checks that deleting the destination of a reference updates the
+dnl source tracked record.
+OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, references, single delete],
+  [['["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s"},
+       "uuid-name": "uuid_row0_s"},
+      {"op": "insert",
+       "table": "simple6",
+       "row": {"name": "row0_s6",
+               "weak_ref": ["set",
+                             [["named-uuid", "uuid_row0_s"]]
+                           ]}}]']],
+  [['condition simple [true]; simple6 [true]' \
+    '["idltest",
+      {"op": "delete",
+       "table": "simple",
+       "where": []}]' \
+    '["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s"}}]']],
+  [[000: change conditions
+001: table simple6: inserted row: name=row0_s6 weak_ref=[<0>] uuid=<1>
+001: table simple6: updated columns: name weak_ref
+001: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+001: table simple: updated columns: s
+002: {"error":null,"result":[{"count":1}]}
+003: table simple6: name=row0_s6 weak_ref=[] uuid=<1>
+003: table simple6: updated columns: weak_ref
+003: table simple: deleted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+004: {"error":null,"result":[{"uuid":["uuid","<3>"]}]}
+005: table simple6: name=row0_s6 weak_ref=[] uuid=<1>
+005: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
+005: table simple: updated columns: s
+006: done
+]])
+
+dnl This test checks that deleting both the destination and source of the
+dnl reference doesn't remove the reference in the source tracked record.
+OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, weak references, multiple deletes],
+  [['["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s"},
+       "uuid-name": "uuid_row0_s"},
+      {"op": "insert",
+       "table": "simple6",
+       "row": {"name": "row0_s6",
+               "weak_ref": ["set",
+                             [["named-uuid", "uuid_row0_s"]]
+                           ]}}]']],
+  [['condition simple [true]; simple6 [true]' \
+    '["idltest",
+      {"op": "delete",
+       "table": "simple",
+       "where": []},
+      {"op": "delete",
+       "table": "simple6",
+       "where": []}]' \
+    '["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s"}}]']],
+  [[000: change conditions
+001: table simple6: inserted row: name=row0_s6 weak_ref=[<0>] uuid=<1>
+001: table simple6: updated columns: name weak_ref
+001: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+001: table simple: updated columns: s
+002: {"error":null,"result":[{"count":1},{"count":1}]}
+003: table simple6: deleted row: name=row0_s6 weak_ref=[<0>] uuid=<1>
+003: table simple: deleted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+004: {"error":null,"result":[{"uuid":["uuid","<3>"]}]}
+005: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
+005: table simple: updated columns: s
+006: done
+]])
+
+dnl This test checks that deleting both the destination and source of the
+dnl reference doesn't remove the reference the source tracked record.
+OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, strong references, multiple deletes],
+  [['["idltest",
+      {"op": "insert",
+       "table": "simple4",
+       "row": {"name": "row0_s4"},
+       "uuid-name": "uuid_row0_s4"},
+      {"op": "insert",
+       "table": "simple3",
+       "row": {"name": "row0_s3",
+               "uref": ["set",
+                         [["named-uuid", "uuid_row0_s4"]]
+                       ]}}]']],
+  [['condition simple [true]; simple3 [true]; simple4 [true]' \
+    '["idltest",
+      {"op": "delete",
+       "table": "simple3",
+       "where": []},
+      {"op": "delete",
+       "table": "simple4",
+       "where": []}]' \
+    '["idltest",
+      {"op": "insert",
+       "table": "simple",
+       "row": {"s": "row0_s"}}]']],
+  [[000: change conditions
+001: table simple3: inserted row: name=row0_s3 uset=[] uref=[[<0>]] uuid=<1>
+001: table simple3: updated columns: name uref
+001: table simple4: inserted row: name=row0_s4 uuid=<0>
+001: table simple4: updated columns: name
+002: {"error":null,"result":[{"count":1},{"count":1}]}
+003: table simple3: deleted row: name=row0_s3 uset=[] uref=[[<0>]] uuid=<1>
+003: table simple4: deleted row: name=row0_s4 uuid=<0>
+004: {"error":null,"result":[{"uuid":["uuid","<2>"]}]}
+005: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<3> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
+005: table simple: updated columns: s
+006: done
+]])
+
 OVSDB_CHECK_IDL_TRACK([track, simple idl, initially empty, various ops],
   [],
   [['["idltest",
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index f50e88d..72d0163 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -1947,6 +1947,23 @@  print_idl_row_updated_simple3(const struct idltest_simple3 *s3, int step)
 }
 
 static void
+print_idl_row_updated_simple4(const struct idltest_simple4 *s4, int step)
+{
+    struct ds updates = DS_EMPTY_INITIALIZER;
+    for (size_t i = 0; i < IDLTEST_SIMPLE4_N_COLUMNS; i++) {
+        if (idltest_simple4_is_updated(s4, i)) {
+            ds_put_format(&updates, " %s", idltest_simple4_columns[i].name);
+        }
+    }
+    if (updates.length) {
+        print_and_log("%03d: table %s: updated columns:%s",
+                      step, s4->header_.table->class_->name,
+                      ds_cstr(&updates));
+        ds_destroy(&updates);
+    }
+}
+
+static void
 print_idl_row_updated_simple6(const struct idltest_simple6 *s6, int step)
 {
     struct ds updates = DS_EMPTY_INITIALIZER;
@@ -2090,6 +2107,20 @@  print_idl_row_simple3(const struct idltest_simple3 *s3, int step)
 }
 
 static void
+print_idl_row_simple4(const struct idltest_simple4 *s4, int step)
+{
+    struct ds msg = DS_EMPTY_INITIALIZER;
+    ds_put_format(&msg, "name=%s", s4->name);
+
+    char *row_msg = format_idl_row(&s4->header_, step, ds_cstr(&msg));
+    print_and_log("%s", row_msg);
+    ds_destroy(&msg);
+    free(row_msg);
+
+    print_idl_row_updated_simple4(s4, step);
+}
+
+static void
 print_idl_row_simple6(const struct idltest_simple6 *s6, int step)
 {
     struct ds msg = DS_EMPTY_INITIALIZER;
@@ -2126,6 +2157,7 @@  print_idl_row_singleton(const struct idltest_singleton *sng, int step)
 static void
 print_idl(struct ovsdb_idl *idl, int step)
 {
+    const struct idltest_simple6 *s6;
     const struct idltest_simple *s;
     const struct idltest_link1 *l1;
     const struct idltest_link2 *l2;
@@ -2144,6 +2176,10 @@  print_idl(struct ovsdb_idl *idl, int step)
         print_idl_row_link2(l2, step);
         n++;
     }
+    IDLTEST_SIMPLE6_FOR_EACH (s6, idl) {
+        print_idl_row_simple6(s6, step);
+        n++;
+    }
     IDLTEST_SINGLETON_FOR_EACH (sng, idl) {
         print_idl_row_singleton(sng, step);
         n++;
@@ -2156,6 +2192,8 @@  print_idl(struct ovsdb_idl *idl, int step)
 static void
 print_idl_track(struct ovsdb_idl *idl, int step)
 {
+    const struct idltest_simple3 *s3;
+    const struct idltest_simple4 *s4;
     const struct idltest_simple6 *s6;
     const struct idltest_simple *s;
     const struct idltest_link1 *l1;
@@ -2174,6 +2212,14 @@  print_idl_track(struct ovsdb_idl *idl, int step)
         print_idl_row_link2(l2, step);
         n++;
     }
+    IDLTEST_SIMPLE3_FOR_EACH_TRACKED (s3, idl) {
+        print_idl_row_simple3(s3, step);
+        n++;
+    }
+    IDLTEST_SIMPLE4_FOR_EACH_TRACKED (s4, idl) {
+        print_idl_row_simple4(s4, step);
+        n++;
+    }
     IDLTEST_SIMPLE6_FOR_EACH_TRACKED (s6, idl) {
         print_idl_row_simple6(s6, step);
         n++;
@@ -2404,6 +2450,10 @@  find_table_class(const char *name)
         return &idltest_table_link1;
     } else if (!strcmp(name, "link2")) {
         return &idltest_table_link2;
+    } else if (!strcmp(name, "simple3")) {
+        return &idltest_table_simple3;
+    } else if (!strcmp(name, "simple4")) {
+        return &idltest_table_simple4;
     } else if (!strcmp(name, "simple6")) {
         return &idltest_table_simple6;
     }
diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py
index bc7ea4f..b8b8e03 100644
--- a/tests/test-ovsdb.py
+++ b/tests/test-ovsdb.py
@@ -162,6 +162,8 @@  def get_simple_printable_row_string(row, columns):
             if isinstance(value, dict):
                 value = sorted((row_to_uuid(k), row_to_uuid(v))
                                for k, v in value.items())
+            elif isinstance(value, list):
+                value = sorted(row_to_uuid(v) for v in value)
             s += "%s=%s " % (column, value)
     s = s.strip()
     s = re.sub('""|,|u?\'', "", s)
@@ -193,6 +195,11 @@  def get_simple5_table_printable_row(row):
     return get_simple_printable_row_string(row, simple5_columns)
 
 
+def get_simple6_table_printable_row(row):
+    simple6_columns = ["name", "weak_ref"]
+    return get_simple_printable_row_string(row, simple6_columns)
+
+
 def get_link1_table_printable_row(row):
     s = ["i=%s k=" % row.i]
     if hasattr(row, "k") and row.k:
@@ -253,6 +260,13 @@  def print_idl(idl, step):
                       get_simple5_table_printable_row(row))
             n += 1
 
+    if "simple6" in idl.tables:
+        simple6 = idl.tables["simple6"].rows
+        for row in simple6.values():
+            print_row("simple6", row, step,
+                      get_simple6_table_printable_row(row))
+            n += 1
+
     if "link1" in idl.tables:
         l1 = idl.tables["link1"].rows
         for row in l1.values():