diff mbox

[ovs-dev] What looks like a bug in OVSDB change seqno tracking

Message ID 20160314161129.GH17227@ovn.org
State Not Applicable
Headers show

Commit Message

Ben Pfaff March 14, 2016, 4:11 p.m. UTC
On Tue, Mar 08, 2016 at 11:01:21PM -0600, Ryan Moats wrote:
> 
> 
> 
> Ben Pfaff <blp@ovn.org> wrote on 03/08/2016 10:07:19 PM:
> 
> > From: Ben Pfaff <blp@ovn.org>
> > To: Ryan Moats/Omaha/IBM@IBMUS
> > Cc: dev@openvswitch.org
> > Date: 03/08/2016 10:07 PM
> > Subject: Re: [ovs-dev] What looks like a bug in OVSDB change seqno
> tracking
> >
> > On Tue, Mar 08, 2016 at 08:35:10PM -0600, Ryan Moats wrote:
> > > While working on finishing up the next version of the incremental flow
> > > processing patch, I stubbed
> > > my toe on a situation where the port column of rows in the multicast
> group
> > > table of ovnsb were
> > > being updated without the row's change seqno being updated.
> > >
> > > A quick look at the code base shows that weak references (like the port
> > > column) are updated via
> > > calls to ovsdb_txn_row_modify while row change seqnos are updated via
> calls
> > > to ovsdb_idl_row_*
> > > and the two code paths don't appear to intersect.
> >
> > ovsdb_txn_row_modify() is part of ovsdb-server, and ovsdb_idl_row_*() is
> > part of ovsdb-client, so they don't intersect per se.
> >
> > > While I consider such behavior a bug (because it means change seqnos
> can't
> > > be trusted), I wanted
> > > to ping the list to confirm my cursory analysis before delving in to
> trying
> > > to find a way to address
> > > this.
> >
> > Can you explain how to reproduce the issue?
> >
> 
> This gets a little complex, but I'll give it a try:
> 
> Preparation:
> 
> (1) Turn on change tracking for the entire SB database in
> ovn/controller/ovn-controller.c
> (2) In ovn/controller/physical.c, change SBREC_MULTICAST_GROUP_FOR_EACH to
> SBREC_MULTICAST_GROUP_FOR_EACH_TRACKED, lookup the seqnos for each row and
> dump them to the log
> (3) In ovn/controller/ofctrl.c, change ovn_flow_log to always log messages
> and add a line to log the flow that is passed to ofctrl_add_flow
> 
> [If need be, I can see about extracting out a patch set that will make all
> of the above changes]
> 
> Now, run ovn E2E test case with 3 HVs, 1 VIFs/HV, 1 GW, 1LS (case 2014 in
> my
> setup) - it should pass, but if one looks at the output of
> hv1/ovn-controller.log, one should see the following signatures:
> 
> Initially, the first multicast row will produce:
> 
> 2016-03-09T04:38:17.721Z|00105|physical|INFO|multicast row with 0 0 2
> 2016-03-09T04:38:17.721Z|00106|ofctrl|INFO|considering flow: table_id=33,
> priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1->
> reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7
> 
> After the first tunnel port appears, the signature changes to:
> 
> 2016-03-09T04:38:17.799Z|00150|physical|INFO|multicast row with 0 0 2
> 2016-03-09T04:38:17.799Z|00151|ofctrl|INFO|considering flow: table_id=33,
> priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1->
> reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7
> 2016-03-09T04:38:17.799Z|00152|ofctrl|INFO|considering flow: table_id=32,
> priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1/0xffffff->
> tun_id,set_field:0xffff/0xffffffff->tun_metadata0,move:NXM_NX_REG6[0..14]->
> NXM_NX_TUN_METADATA0[16..30],output:2,resubmit(,33)
> 
> After the second tunnel port appears, the signature changes to:
> 
> 2016-03-09T04:38:20.028Z|00203|physical|INFO|multicast row with 0 0 2
> 2016-03-09T04:38:20.028Z|00204|ofctrl|INFO|considering flow: table_id=33,
> priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1->
> reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7
> 2016-03-09T04:38:20.028Z|00205|ofctrl|INFO|considering flow: table_id=32,
> priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1/0xffffff->
> tun_id,output:3,set_field:0x1/0xffffff->
> tun_id,set_field:0xffff/0xffffffff->tun_metadata0,move:NXM_NX_REG6[0..14]->
> NXM_NX_TUN_METADATA0[16..30],output:2,resubmit(,33)
> 
> Looking at the multicast code, it processes all the values in the
> ports column of the multicast group row.  The above signature says
> to me that the the values in this column have changed, but the
> change seqnos for that row didn't change (specifically the mod_seqno
> was never set), which leads me to the conclusion that these seqnos
> can't be trusted to reveal when rows of a table change...
> 
> I'm hoping for a different interpretation...

As far as I can tell those rows get added once and never change, so that
the change_seqnos are correct.  I suspect that the problem here is that
your instructions don't ever call ovsdb_idl_track_clear().

I'm appending the diff that I used.

--8<--------------------------cut here-------------------------->8--

Comments

Ryan Moats March 16, 2016, 9:51 p.m. UTC | #1
Ben Pfaff <blp@ovn.org> wrote on 03/14/2016 11:11:29 AM:

> From: Ben Pfaff <blp@ovn.org>
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 03/14/2016 11:11 AM
> Subject: Re: [ovs-dev] What looks like a bug in OVSDB change seqno
tracking
>
> On Tue, Mar 08, 2016 at 11:01:21PM -0600, Ryan Moats wrote:
> >
> >
> >
> > Ben Pfaff <blp@ovn.org> wrote on 03/08/2016 10:07:19 PM:
> >
> > > From: Ben Pfaff <blp@ovn.org>
> > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > Cc: dev@openvswitch.org
> > > Date: 03/08/2016 10:07 PM
> > > Subject: Re: [ovs-dev] What looks like a bug in OVSDB change seqno
> > tracking
> > >
> > > On Tue, Mar 08, 2016 at 08:35:10PM -0600, Ryan Moats wrote:
> > > > While working on finishing up the next version of the incremental
flow
> > > > processing patch, I stubbed
> > > > my toe on a situation where the port column of rows in the
multicast
> > group
> > > > table of ovnsb were
> > > > being updated without the row's change seqno being updated.
> > > >
> > > > A quick look at the code base shows that weak references (like the
port
> > > > column) are updated via
> > > > calls to ovsdb_txn_row_modify while row change seqnos are updated
via
> > calls
> > > > to ovsdb_idl_row_*
> > > > and the two code paths don't appear to intersect.
> > >
> > > ovsdb_txn_row_modify() is part of ovsdb-server, and ovsdb_idl_row_*()
is
> > > part of ovsdb-client, so they don't intersect per se.
> > >
> > > > While I consider such behavior a bug (because it means change
seqnos
> > can't
> > > > be trusted), I wanted
> > > > to ping the list to confirm my cursory analysis before delving in
to
> > trying
> > > > to find a way to address
> > > > this.
> > >
> > > Can you explain how to reproduce the issue?
> > >
> >
> > This gets a little complex, but I'll give it a try:
> >
> > Preparation:
> >
> > (1) Turn on change tracking for the entire SB database in
> > ovn/controller/ovn-controller.c
> > (2) In ovn/controller/physical.c, change SBREC_MULTICAST_GROUP_FOR_EACH
to
> > SBREC_MULTICAST_GROUP_FOR_EACH_TRACKED, lookup the seqnos for each row
and
> > dump them to the log
> > (3) In ovn/controller/ofctrl.c, change ovn_flow_log to always log
messages
> > and add a line to log the flow that is passed to ofctrl_add_flow
> >
> > [If need be, I can see about extracting out a patch set that will make
all
> > of the above changes]
> >
> > Now, run ovn E2E test case with 3 HVs, 1 VIFs/HV, 1 GW, 1LS (case 2014
in
> > my
> > setup) - it should pass, but if one looks at the output of
> > hv1/ovn-controller.log, one should see the following signatures:
> >
> > Initially, the first multicast row will produce:
> >
> > 2016-03-09T04:38:17.721Z|00105|physical|INFO|multicast row with 0 0 2
> > 2016-03-09T04:38:17.721Z|00106|ofctrl|INFO|considering flow:
table_id=33,
> > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1->
> > reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7
> >
> > After the first tunnel port appears, the signature changes to:
> >
> > 2016-03-09T04:38:17.799Z|00150|physical|INFO|multicast row with 0 0 2
> > 2016-03-09T04:38:17.799Z|00151|ofctrl|INFO|considering flow:
table_id=33,
> > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1->
> > reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7
> > 2016-03-09T04:38:17.799Z|00152|ofctrl|INFO|considering flow:
table_id=32,
> > priority=100, reg7=0xffff,metadata=0x1,
actions=set_field:0x1/0xffffff->
> > tun_id,set_field:0xffff/0xffffffff->tun_metadata0,move:NXM_NX_REG6
[0..14]->
> > NXM_NX_TUN_METADATA0[16..30],output:2,resubmit(,33)
> >
> > After the second tunnel port appears, the signature changes to:
> >
> > 2016-03-09T04:38:20.028Z|00203|physical|INFO|multicast row with 0 0 2
> > 2016-03-09T04:38:20.028Z|00204|ofctrl|INFO|considering flow:
table_id=33,
> > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1->
> > reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7
> > 2016-03-09T04:38:20.028Z|00205|ofctrl|INFO|considering flow:
table_id=32,
> > priority=100, reg7=0xffff,metadata=0x1,
actions=set_field:0x1/0xffffff->
> > tun_id,output:3,set_field:0x1/0xffffff->
> > tun_id,set_field:0xffff/0xffffffff->tun_metadata0,move:NXM_NX_REG6
[0..14]->
> > NXM_NX_TUN_METADATA0[16..30],output:2,resubmit(,33)
> >
> > Looking at the multicast code, it processes all the values in the
> > ports column of the multicast group row.  The above signature says
> > to me that the the values in this column have changed, but the
> > change seqnos for that row didn't change (specifically the mod_seqno
> > was never set), which leads me to the conclusion that these seqnos
> > can't be trusted to reveal when rows of a table change...
> >
> > I'm hoping for a different interpretation...
>
> As far as I can tell those rows get added once and never change, so that
> the change_seqnos are correct.  I suspect that the problem here is that
> your instructions don't ever call ovsdb_idl_track_clear().
>
> I'm appending the diff that I used.
>

No, that wasn't it (I pulled that statement from your patch and it still
worked the same way).  I've dug into the problem some more once I've run it
down, I'll
put out an updated incremental patch set tomorrow that will include
multicast
groups in what is processed incrementally...

Ryan (regXboi)
diff mbox

Patch

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index 3297fb3..791175f 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -1,4 +1,4 @@ 
-/* Copyright (c) 2015 Nicira, Inc.
+/* Copyright (c) 2015, 2016 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -484,6 +484,8 @@  ofctrl_add_flow(struct hmap *desired_flows,
     f->ofpacts_len = actions->size;
     f->hmap_node.hash = ovn_flow_hash(f);
 
+    ovn_flow_log(f, "considering flow");
+
     if (ovn_flow_lookup(desired_flows, f)) {
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
         if (!VLOG_DROP_INFO(&rl)) {
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 5852e35..bd27ab2 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -259,6 +259,7 @@  main(int argc, char *argv[])
     char *ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
     struct ovsdb_idl_loop ovnsb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
         ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
+    ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
     ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl);
 
     /* Initialize connection tracking zones. */
@@ -338,6 +339,7 @@  main(int argc, char *argv[])
         }
         ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
         ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
+        ovsdb_idl_track_clear(ovnsb_idl_loop.idl);
         poll_block();
         if (should_service_stop()) {
             exiting = true;
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index 657c3e2..35ca727 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -1,4 +1,4 @@ 
-/* Copyright (c) 2015 Nicira, Inc.
+/* Copyright (c) 2015, 2016 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -487,6 +487,15 @@  physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
     const struct sbrec_multicast_group *mc;
     struct ofpbuf remote_ofpacts;
     ofpbuf_init(&remote_ofpacts, 0);
+    SBREC_MULTICAST_GROUP_FOR_EACH_TRACKED (mc, ctx->ovnsb_idl) {
+        struct ds s = DS_EMPTY_INITIALIZER;
+        for (int i = 0; i < OVSDB_IDL_CHANGE_MAX; i++) {
+            ds_put_format(&s, " %u", mc->header_.change_seqno[i]);
+        }
+        VLOG_INFO("multicast row %x, n_ports=%u, with%s", mc->header_.uuid.parts[0], mc->n_ports, ds_cstr(&s));
+        ds_destroy(&s);
+
+    }
     SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) {
         struct sset remote_chassis = SSET_INITIALIZER(&remote_chassis);
         struct match match;