diff mbox

[ovs-dev] OVSDB: Fix segfalut during replication.

Message ID 1474403228-56688-1-git-send-email-azhou@ovn.org
State Accepted
Headers show

Commit Message

Andy Zhou Sept. 20, 2016, 8:27 p.m. UTC
The newly added replication logic makes it possible for a monitor to
receive delete and insertion of the same row back to back, which
was not possible before. Add logic (and comment) to handle this
case to avoid follow crash reported by Valgrind:

    #0  0x0000000000453edd in ovsdb_datum_compare_3way
            (a=0x5efbe60, b=0x0, type=0x5e6a848) at lib/ovsdb-data.c:1626
    #1  0x0000000000453ea4 in ovsdb_datum_equals
            (a=0x5efbe60, b=0x0, type=0x5e6a848) at lib/ovsdb-data.c:1616
    #2  0x000000000041b651 in update_monitor_row_data
            (mt=0x5eda4a0, row=0x5efbe00, data=0x0) at ovsdb/monitor.c:310
    #3  0x000000000041ed14 in ovsdb_monitor_changes_update
            (old=0x0, new=0x5efbe00, mt=0x5eda4a0, changes=0x5ef7180)
            at ovsdb/monitor.c:1255
    #4  0x000000000041f12e in ovsdb_monitor_change_cb
            (old=0x0, new=0x5efbe00, changed=0x5efc218, aux_=0xffefff040)
            at ovsdb/monitor.c:1339
    #5  0x000000000042ded9 in ovsdb_txn_for_each_change
            (txn=0x5efbd90, cb=0x41ef50 <ovsdb_monitor_change_cb>,
             aux=0xffefff040) at ovsdb/transaction.c:906
    #6  0x0000000000420155 in ovsdb_monitor_commit
            (replica=0x5eda2c0, txn=0x5efbd90, durable=false)
            at ovsdb/monitor.c:1553
    #7  0x000000000042dc04 in ovsdb_txn_commit_
            (txn=0x5efbd90, durable=false) at ovsdb/transaction.c:868
    #8  0x000000000042ddd4 in ovsdb_txn_commit (txn=0x5efbd90, durable=false)
            at ovsdb/transaction.c:893
    #9  0x0000000000422e0c in process_notification
            (table_updates=0x5efad10, db=0x5e6bd40) at ovsdb/replication.c:575
    #10 0x0000000000420ff3 in replication_run () at ovsdb/replication.c:184
    #11 0x0000000000405cc8 in main_loop
            (jsonrpc=0x5e67770, all_dbs=0xffefff3a0, unixctl=0x5ebd980,
             remotes=0xffefff360, run_process=0x0, exiting=0xffefff3c0,
            is_backup=0xffefff2de) at ovsdb/ovsdb-server.c:198
    #12 0x0000000000406edb in main (argc=1, argv=0xffefff550)
            at ovsdb/ovsdb-server.c:429

Reported-by: Joe Stringer <joe@ovn.org>
Reported-at: http://openvswitch.org/pipermail/dev/2016-September/079315.html
Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-September/079586.html
Co-authored-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 ovsdb/monitor.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

Comments

Joe Stringer Sept. 20, 2016, 11:39 p.m. UTC | #1
On 20 September 2016 at 13:27, Andy Zhou <azhou@ovn.org> wrote:
> The newly added replication logic makes it possible for a monitor to
> receive delete and insertion of the same row back to back, which
> was not possible before. Add logic (and comment) to handle this
> case to avoid follow crash reported by Valgrind:
>
>     #0  0x0000000000453edd in ovsdb_datum_compare_3way
>             (a=0x5efbe60, b=0x0, type=0x5e6a848) at lib/ovsdb-data.c:1626
>     #1  0x0000000000453ea4 in ovsdb_datum_equals
>             (a=0x5efbe60, b=0x0, type=0x5e6a848) at lib/ovsdb-data.c:1616
>     #2  0x000000000041b651 in update_monitor_row_data
>             (mt=0x5eda4a0, row=0x5efbe00, data=0x0) at ovsdb/monitor.c:310
>     #3  0x000000000041ed14 in ovsdb_monitor_changes_update
>             (old=0x0, new=0x5efbe00, mt=0x5eda4a0, changes=0x5ef7180)
>             at ovsdb/monitor.c:1255
>     #4  0x000000000041f12e in ovsdb_monitor_change_cb
>             (old=0x0, new=0x5efbe00, changed=0x5efc218, aux_=0xffefff040)
>             at ovsdb/monitor.c:1339
>     #5  0x000000000042ded9 in ovsdb_txn_for_each_change
>             (txn=0x5efbd90, cb=0x41ef50 <ovsdb_monitor_change_cb>,
>              aux=0xffefff040) at ovsdb/transaction.c:906
>     #6  0x0000000000420155 in ovsdb_monitor_commit
>             (replica=0x5eda2c0, txn=0x5efbd90, durable=false)
>             at ovsdb/monitor.c:1553
>     #7  0x000000000042dc04 in ovsdb_txn_commit_
>             (txn=0x5efbd90, durable=false) at ovsdb/transaction.c:868
>     #8  0x000000000042ddd4 in ovsdb_txn_commit (txn=0x5efbd90, durable=false)
>             at ovsdb/transaction.c:893
>     #9  0x0000000000422e0c in process_notification
>             (table_updates=0x5efad10, db=0x5e6bd40) at ovsdb/replication.c:575
>     #10 0x0000000000420ff3 in replication_run () at ovsdb/replication.c:184
>     #11 0x0000000000405cc8 in main_loop
>             (jsonrpc=0x5e67770, all_dbs=0xffefff3a0, unixctl=0x5ebd980,
>              remotes=0xffefff360, run_process=0x0, exiting=0xffefff3c0,
>             is_backup=0xffefff2de) at ovsdb/ovsdb-server.c:198
>     #12 0x0000000000406edb in main (argc=1, argv=0xffefff550)
>             at ovsdb/ovsdb-server.c:429
>
> Reported-by: Joe Stringer <joe@ovn.org>
> Reported-at: http://openvswitch.org/pipermail/dev/2016-September/079315.html
> Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
> Reported-at: http://openvswitch.org/pipermail/dev/2016-September/079586.html
> Co-authored-by: Joe Stringer <joe@ovn.org>
> Signed-off-by: Andy Zhou <azhou@ovn.org>

Signed-off-by: Joe Stringer <joe@ovn.org>

> ---
>  ovsdb/monitor.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> index a590943..7e6ddcb 100644
> --- a/ovsdb/monitor.c
> +++ b/ovsdb/monitor.c
> @@ -1252,7 +1252,46 @@ ovsdb_monitor_changes_update(const struct ovsdb_row *old,
>          change->new = clone_monitor_row_data(mt, new);
>      } else {
>          if (new) {
> -            update_monitor_row_data(mt, new, change->new);
> +            if (!change->new) {

Usually we do "if (positive condition) { ... } else { ... }", so
readers don't have to double-negate in their mind to figure out what
the "else" condition is.

I'll leave the review of the reasoning below to others.
Andy Zhou Sept. 20, 2016, 11:53 p.m. UTC | #2
On Tue, Sep 20, 2016 at 4:39 PM, Joe Stringer <joe@ovn.org> wrote:

> On 20 September 2016 at 13:27, Andy Zhou <azhou@ovn.org> wrote:
> > The newly added replication logic makes it possible for a monitor to
> > receive delete and insertion of the same row back to back, which
> > was not possible before. Add logic (and comment) to handle this
> > case to avoid follow crash reported by Valgrind:
> >
> >     #0  0x0000000000453edd in ovsdb_datum_compare_3way
> >             (a=0x5efbe60, b=0x0, type=0x5e6a848) at lib/ovsdb-data.c:1626
> >     #1  0x0000000000453ea4 in ovsdb_datum_equals
> >             (a=0x5efbe60, b=0x0, type=0x5e6a848) at lib/ovsdb-data.c:1616
> >     #2  0x000000000041b651 in update_monitor_row_data
> >             (mt=0x5eda4a0, row=0x5efbe00, data=0x0) at
> ovsdb/monitor.c:310
> >     #3  0x000000000041ed14 in ovsdb_monitor_changes_update
> >             (old=0x0, new=0x5efbe00, mt=0x5eda4a0, changes=0x5ef7180)
> >             at ovsdb/monitor.c:1255
> >     #4  0x000000000041f12e in ovsdb_monitor_change_cb
> >             (old=0x0, new=0x5efbe00, changed=0x5efc218, aux_=0xffefff040)
> >             at ovsdb/monitor.c:1339
> >     #5  0x000000000042ded9 in ovsdb_txn_for_each_change
> >             (txn=0x5efbd90, cb=0x41ef50 <ovsdb_monitor_change_cb>,
> >              aux=0xffefff040) at ovsdb/transaction.c:906
> >     #6  0x0000000000420155 in ovsdb_monitor_commit
> >             (replica=0x5eda2c0, txn=0x5efbd90, durable=false)
> >             at ovsdb/monitor.c:1553
> >     #7  0x000000000042dc04 in ovsdb_txn_commit_
> >             (txn=0x5efbd90, durable=false) at ovsdb/transaction.c:868
> >     #8  0x000000000042ddd4 in ovsdb_txn_commit (txn=0x5efbd90,
> durable=false)
> >             at ovsdb/transaction.c:893
> >     #9  0x0000000000422e0c in process_notification
> >             (table_updates=0x5efad10, db=0x5e6bd40) at
> ovsdb/replication.c:575
> >     #10 0x0000000000420ff3 in replication_run () at
> ovsdb/replication.c:184
> >     #11 0x0000000000405cc8 in main_loop
> >             (jsonrpc=0x5e67770, all_dbs=0xffefff3a0, unixctl=0x5ebd980,
> >              remotes=0xffefff360, run_process=0x0, exiting=0xffefff3c0,
> >             is_backup=0xffefff2de) at ovsdb/ovsdb-server.c:198
> >     #12 0x0000000000406edb in main (argc=1, argv=0xffefff550)
> >             at ovsdb/ovsdb-server.c:429
> >
> > Reported-by: Joe Stringer <joe@ovn.org>
> > Reported-at: http://openvswitch.org/pipermail/dev/2016-September/
> 079315.html
> > Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
> > Reported-at: http://openvswitch.org/pipermail/dev/2016-September/
> 079586.html
> > Co-authored-by: Joe Stringer <joe@ovn.org>
> > Signed-off-by: Andy Zhou <azhou@ovn.org>
>
> Signed-off-by: Joe Stringer <joe@ovn.org>
>
> > ---
> >  ovsdb/monitor.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> > index a590943..7e6ddcb 100644
> > --- a/ovsdb/monitor.c
> > +++ b/ovsdb/monitor.c
> > @@ -1252,7 +1252,46 @@ ovsdb_monitor_changes_update(const struct
> ovsdb_row *old,
> >          change->new = clone_monitor_row_data(mt, new);
> >      } else {
> >          if (new) {
> > -            update_monitor_row_data(mt, new, change->new);
> > +            if (!change->new) {
>
> Usually we do "if (positive condition) { ... } else { ... }", so
> readers don't have to double-negate in their mind to figure out what
> the "else" condition is.
>

I was trying to point out the "interesting" case first.
On the other hand, I won't object to flip them around, since the comment
section
is bigger than usual.

>
> I'll leave the review of the reasoning below to others.
Ben Pfaff Sept. 27, 2016, 3:30 a.m. UTC | #3
On Tue, Sep 20, 2016 at 01:27:08PM -0700, Andy Zhou wrote:
> The newly added replication logic makes it possible for a monitor to
> receive delete and insertion of the same row back to back, which
> was not possible before. Add logic (and comment) to handle this
> case to avoid follow crash reported by Valgrind:

On Tue, Sep 20, 2016 at 01:27:08PM -0700, Andy Zhou wrote:
> The newly added replication logic makes it possible for a monitor to
> receive delete and insertion of the same row back to back, which
> was not possible before. Add logic (and comment) to handle this
> case to avoid follow crash reported by Valgrind:

Thanks a lot.

Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff Sept. 27, 2016, 3:31 a.m. UTC | #4
I'm really puzzled about how this is getting to the list.
dev@openvswitch.com bounces for me.  I had to change it to
dev@openvswitch.org to avoid that.  I see that openvswitch.com is
registered to the Linux Foundation though.
Andy Zhou Sept. 27, 2016, 6:04 p.m. UTC | #5
On Mon, Sep 26, 2016 at 8:30 PM, Ben Pfaff <blp@ovn.org> wrote:

> On Tue, Sep 20, 2016 at 01:27:08PM -0700, Andy Zhou wrote:
> > The newly added replication logic makes it possible for a monitor to
> > receive delete and insertion of the same row back to back, which
> > was not possible before. Add logic (and comment) to handle this
> > case to avoid follow crash reported by Valgrind:
>
> On Tue, Sep 20, 2016 at 01:27:08PM -0700, Andy Zhou wrote:
> > The newly added replication logic makes it possible for a monitor to
> > receive delete and insertion of the same row back to back, which
> > was not possible before. Add logic (and comment) to handle this
> > case to avoid follow crash reported by Valgrind:
>
> Thanks a lot.
>
> Acked-by: Ben Pfaff <blp@ovn.org>
>
Thanks for the review. Pushed to master and branch-2.6.
diff mbox

Patch

diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
index a590943..7e6ddcb 100644
--- a/ovsdb/monitor.c
+++ b/ovsdb/monitor.c
@@ -1252,7 +1252,46 @@  ovsdb_monitor_changes_update(const struct ovsdb_row *old,
         change->new = clone_monitor_row_data(mt, new);
     } else {
         if (new) {
-            update_monitor_row_data(mt, new, change->new);
+            if (!change->new) {
+                /* Reinsert the row that was just deleted.
+                 *
+                 * This path won't be hit without replication.  Whenever OVSDB
+                 * server inserts a new row, It always generates a new UUID
+                 * that is different from the row just deleted.
+                 *
+                 * With replication, this path can be hit in a corner
+                 * case when two OVSDB servers are set up to replicate
+                 * each other. Not that is a useful set up, but can
+                 * happen in practice.
+                 *
+                 * An example of how this path can be hit is documented below.
+                 * The details is not as important to the correctness of the
+                 * logic, but added here to convince ourselves that this path
+                 * can be hit.
+                 *
+                 * Imagine two OVSDB servers that replicates from each
+                 * other. For each replication session, there is a
+                 * corresponding monitor at the other end of the replication
+                 * JSONRPC connection.
+                 *
+                 * The events can lead to a back to back deletion and
+                 * insertion operation of the same row for the monitor of
+                 * the first server are:
+                 *
+                 * 1. A row is inserted in the first OVSDB server.
+                 * 2. The row is then replicated to the remote OVSDB server.
+                 * 3. The row is now  deleted by the local OVSDB server. This
+                 *    deletion operation is replicated to the local monitor
+                 *    of the OVSDB server.
+                 * 4. The monitor now receives the same row, as an insertion,
+                 *    from the replication server. Because of
+                 *    replication, the row carries the same UUID as the row
+                 *    just deleted.
+                 */
+                change->new = clone_monitor_row_data(mt, new);
+            } else {
+                update_monitor_row_data(mt, new, change->new);
+            }
         } else {
             free_monitor_row_data(mt, change->new);
             change->new = NULL;