Message ID | 20160314161129.GH17227@ovn.org |
---|---|
State | Not Applicable |
Headers | show |
Ben Pfaff <blp@ovn.org> wrote on 03/14/2016 11:11:29 AM: > From: Ben Pfaff <blp@ovn.org> > To: Ryan Moats/Omaha/IBM@IBMUS > Cc: dev@openvswitch.org > Date: 03/14/2016 11:11 AM > Subject: Re: [ovs-dev] What looks like a bug in OVSDB change seqno tracking > > On Tue, Mar 08, 2016 at 11:01:21PM -0600, Ryan Moats wrote: > > > > > > > > Ben Pfaff <blp@ovn.org> wrote on 03/08/2016 10:07:19 PM: > > > > > From: Ben Pfaff <blp@ovn.org> > > > To: Ryan Moats/Omaha/IBM@IBMUS > > > Cc: dev@openvswitch.org > > > Date: 03/08/2016 10:07 PM > > > Subject: Re: [ovs-dev] What looks like a bug in OVSDB change seqno > > tracking > > > > > > On Tue, Mar 08, 2016 at 08:35:10PM -0600, Ryan Moats wrote: > > > > While working on finishing up the next version of the incremental flow > > > > processing patch, I stubbed > > > > my toe on a situation where the port column of rows in the multicast > > group > > > > table of ovnsb were > > > > being updated without the row's change seqno being updated. > > > > > > > > A quick look at the code base shows that weak references (like the port > > > > column) are updated via > > > > calls to ovsdb_txn_row_modify while row change seqnos are updated via > > calls > > > > to ovsdb_idl_row_* > > > > and the two code paths don't appear to intersect. > > > > > > ovsdb_txn_row_modify() is part of ovsdb-server, and ovsdb_idl_row_*() is > > > part of ovsdb-client, so they don't intersect per se. > > > > > > > While I consider such behavior a bug (because it means change seqnos > > can't > > > > be trusted), I wanted > > > > to ping the list to confirm my cursory analysis before delving in to > > trying > > > > to find a way to address > > > > this. > > > > > > Can you explain how to reproduce the issue? > > > > > > > This gets a little complex, but I'll give it a try: > > > > Preparation: > > > > (1) Turn on change tracking for the entire SB database in > > ovn/controller/ovn-controller.c > > (2) In ovn/controller/physical.c, change SBREC_MULTICAST_GROUP_FOR_EACH to > > SBREC_MULTICAST_GROUP_FOR_EACH_TRACKED, lookup the seqnos for each row and > > dump them to the log > > (3) In ovn/controller/ofctrl.c, change ovn_flow_log to always log messages > > and add a line to log the flow that is passed to ofctrl_add_flow > > > > [If need be, I can see about extracting out a patch set that will make all > > of the above changes] > > > > Now, run ovn E2E test case with 3 HVs, 1 VIFs/HV, 1 GW, 1LS (case 2014 in > > my > > setup) - it should pass, but if one looks at the output of > > hv1/ovn-controller.log, one should see the following signatures: > > > > Initially, the first multicast row will produce: > > > > 2016-03-09T04:38:17.721Z|00105|physical|INFO|multicast row with 0 0 2 > > 2016-03-09T04:38:17.721Z|00106|ofctrl|INFO|considering flow: table_id=33, > > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1-> > > reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7 > > > > After the first tunnel port appears, the signature changes to: > > > > 2016-03-09T04:38:17.799Z|00150|physical|INFO|multicast row with 0 0 2 > > 2016-03-09T04:38:17.799Z|00151|ofctrl|INFO|considering flow: table_id=33, > > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1-> > > reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7 > > 2016-03-09T04:38:17.799Z|00152|ofctrl|INFO|considering flow: table_id=32, > > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1/0xffffff-> > > tun_id,set_field:0xffff/0xffffffff->tun_metadata0,move:NXM_NX_REG6 [0..14]-> > > NXM_NX_TUN_METADATA0[16..30],output:2,resubmit(,33) > > > > After the second tunnel port appears, the signature changes to: > > > > 2016-03-09T04:38:20.028Z|00203|physical|INFO|multicast row with 0 0 2 > > 2016-03-09T04:38:20.028Z|00204|ofctrl|INFO|considering flow: table_id=33, > > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1-> > > reg5,set_field:0x1->reg7,resubmit(,34),set_field:0xffff->reg7 > > 2016-03-09T04:38:20.028Z|00205|ofctrl|INFO|considering flow: table_id=32, > > priority=100, reg7=0xffff,metadata=0x1, actions=set_field:0x1/0xffffff-> > > tun_id,output:3,set_field:0x1/0xffffff-> > > tun_id,set_field:0xffff/0xffffffff->tun_metadata0,move:NXM_NX_REG6 [0..14]-> > > NXM_NX_TUN_METADATA0[16..30],output:2,resubmit(,33) > > > > Looking at the multicast code, it processes all the values in the > > ports column of the multicast group row. The above signature says > > to me that the the values in this column have changed, but the > > change seqnos for that row didn't change (specifically the mod_seqno > > was never set), which leads me to the conclusion that these seqnos > > can't be trusted to reveal when rows of a table change... > > > > I'm hoping for a different interpretation... > > As far as I can tell those rows get added once and never change, so that > the change_seqnos are correct. I suspect that the problem here is that > your instructions don't ever call ovsdb_idl_track_clear(). > > I'm appending the diff that I used. > No, that wasn't it (I pulled that statement from your patch and it still worked the same way). I've dug into the problem some more once I've run it down, I'll put out an updated incremental patch set tomorrow that will include multicast groups in what is processed incrementally... Ryan (regXboi)
diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c index 3297fb3..791175f 100644 --- a/ovn/controller/ofctrl.c +++ b/ovn/controller/ofctrl.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2015 Nicira, Inc. +/* Copyright (c) 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -484,6 +484,8 @@ ofctrl_add_flow(struct hmap *desired_flows, f->ofpacts_len = actions->size; f->hmap_node.hash = ovn_flow_hash(f); + ovn_flow_log(f, "considering flow"); + if (ovn_flow_lookup(desired_flows, f)) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); if (!VLOG_DROP_INFO(&rl)) { diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 5852e35..bd27ab2 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -259,6 +259,7 @@ main(int argc, char *argv[]) char *ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl); struct ovsdb_idl_loop ovnsb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER( ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true)); + ovsdb_idl_track_add_all(ovnsb_idl_loop.idl); ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl); /* Initialize connection tracking zones. */ @@ -338,6 +339,7 @@ main(int argc, char *argv[]) } ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); + ovsdb_idl_track_clear(ovnsb_idl_loop.idl); poll_block(); if (should_service_stop()) { exiting = true; diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c index 657c3e2..35ca727 100644 --- a/ovn/controller/physical.c +++ b/ovn/controller/physical.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2015 Nicira, Inc. +/* Copyright (c) 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -487,6 +487,15 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, const struct sbrec_multicast_group *mc; struct ofpbuf remote_ofpacts; ofpbuf_init(&remote_ofpacts, 0); + SBREC_MULTICAST_GROUP_FOR_EACH_TRACKED (mc, ctx->ovnsb_idl) { + struct ds s = DS_EMPTY_INITIALIZER; + for (int i = 0; i < OVSDB_IDL_CHANGE_MAX; i++) { + ds_put_format(&s, " %u", mc->header_.change_seqno[i]); + } + VLOG_INFO("multicast row %x, n_ports=%u, with%s", mc->header_.uuid.parts[0], mc->n_ports, ds_cstr(&s)); + ds_destroy(&s); + + } SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) { struct sset remote_chassis = SSET_INITIALIZER(&remote_chassis); struct match match;