Message ID | 1449191509-12824-1-git-send-email-joe@ovn.org |
---|---|
State | Accepted, archived |
Headers | show |
On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: > Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we > would use a sequence to see if anything changed in NB database to > compute and notify the SB database, and vice versa. This logic got > dropped with the refactor, causing a testsuite failure in the ovn-sbctl > test. Reintroduce the IDL sequence number checking. > > Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* > functions") > Suggested-by: Numan Siddique <nusiddiq@redhat.com> > Signed-off-by: Joe Stringer <joe@ovn.org> > --- > ovn/northd/ovn-northd.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > I have this problem with the testsuite. This patch fixes it for me. Thanks! Tested-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: > Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we > would use a sequence to see if anything changed in NB database to > compute and notify the SB database, and vice versa. This logic got > dropped with the refactor, causing a testsuite failure in the ovn-sbctl > test. Reintroduce the IDL sequence number checking. > > Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* > functions") > Suggested-by: Numan Siddique <nusiddiq@redhat.com> > Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote: > > On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: >> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we >> would use a sequence to see if anything changed in NB database to >> compute and notify the SB database, and vice versa. This logic got >> dropped with the refactor, causing a testsuite failure in the ovn-sbctl >> test. Reintroduce the IDL sequence number checking. >> >> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* >> functions") >> Suggested-by: Numan Siddique <nusiddiq@redhat.com> >> Signed-off-by: Joe Stringer <joe@ovn.org> > > Acked-by: Ben Pfaff <blp@ovn.org> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by. --Justin
On 3 December 2015 at 23:09, Justin Pettit <jpettit@ovn.org> wrote: > >> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote: >> >> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: >>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we >>> would use a sequence to see if anything changed in NB database to >>> compute and notify the SB database, and vice versa. This logic got >>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl >>> test. Reintroduce the IDL sequence number checking. >>> >>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* >>> functions") >>> Suggested-by: Numan Siddique <nusiddiq@redhat.com> >>> Signed-off-by: Joe Stringer <joe@ovn.org> >> >> Acked-by: Ben Pfaff <blp@ovn.org> > > I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by. Thanks.
On 12/04/2015 12:39 PM, Justin Pettit wrote: >> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote: >> >> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: >>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we >>> would use a sequence to see if anything changed in NB database to >>> compute and notify the SB database, and vice versa. This logic got >>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl >>> test. Reintroduce the IDL sequence number checking. >>> >>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* >>> functions") >>> Suggested-by: Numan Siddique <nusiddiq@redhat.com> >>> Signed-off-by: Joe Stringer <joe@ovn.org> >> Acked-by: Ben Pfaff <blp@ovn.org> > I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by. Thanks Justin. I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting "AT_CHECK([echo hi], [1])" at line 1106 before clean up. I tested this with out the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log Not sure what is the cause of this. May be the issue is somewhere else. Also with this patch, I can see the issue [2] again. I guess its not a major issue. [1] - 2015-12-04T07:16:30.071Z|00001|vlog|INFO|opened log file /home/nsiddique/workspace_cpp/openvswitch/src/ovs_numan/tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log 2015-12-04T07:16:30.076Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.4.90 2015-12-04T07:16:30.151Z|00003|jsonrpc|WARN|unix: send error: Broken pipe 2015-12-04T07:16:30.151Z|00004|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.157Z|00005|jsonrpc|WARN|unix: send error: Broken pipe 2015-12-04T07:16:30.157Z|00006|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.162Z|00007|jsonrpc|WARN|unix: send error: Broken pipe 2015-12-04T07:16:30.162Z|00008|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.176Z|00009|jsonrpc|WARN|unix: send error: Broken pipe 2015-12-04T07:16:30.176Z|00010|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.188Z|00011|jsonrpc|WARN|unix: send error: Broken pipe 2015-12-04T07:16:30.188Z|00012|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.242Z|00013|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.250Z|00014|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.303Z|00015|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:30.336Z|00016|reconnect|WARN|unix: connection dropped (Broken pipe) 2015-12-04T07:16:40.084Z|00017|memory|INFO|6084 kB peak resident set size after 10.0 seconds 2015-12-04T07:16:40.084Z|00018|memory|INFO|cells:528 monitors:1 sessions:1 2015-12-04T07:16:53.106Z|00019|fatal_signal|WARN|terminating with signal 15 (Terminated) [2] - http://openvswitch.org/pipermail/discuss/2015-November/019445.html Thanks Numan
On 3 December 2015 at 23:49, Numan Siddique <nusiddiq@redhat.com> wrote: > On 12/04/2015 12:39 PM, Justin Pettit wrote: >>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote: >>> >>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: >>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we >>>> would use a sequence to see if anything changed in NB database to >>>> compute and notify the SB database, and vice versa. This logic got >>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl >>>> test. Reintroduce the IDL sequence number checking. >>>> >>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* >>>> functions") >>>> Suggested-by: Numan Siddique <nusiddiq@redhat.com> >>>> Signed-off-by: Joe Stringer <joe@ovn.org> >>> Acked-by: Ben Pfaff <blp@ovn.org> >> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by. > Thanks Justin. > > I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case > [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting > "AT_CHECK([echo hi], [1])" at line 1106 before clean up. Thanks for the report. Does that mean that we're filtering the broken pipe errors out in that test case? I'd expect the test case to fail if these logs showed up, without needing someone to modify it. > I tested this with out the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log > Not sure what is the cause of this. May be the issue is somewhere else. It sounds like this patch just makes the bug less likely (perhaps by virtue of sending less transactions). > Also with this patch, I can see the issue [2] again. I guess its not a major issue. Are you able to tell us a bit more about your setup and why you're able to reproduce it? I have only been able to reproduce the broken pipe issue on build systems like travis where the underlying CPU resource is shared with other users, and I suspect they're using nested virtualization which can exacerbate certain types of failures. I had been assuming that this is related to why it fails.
On 12/04/2015 01:47 PM, Joe Stringer wrote: > On 3 December 2015 at 23:49, Numan Siddique <nusiddiq@redhat.com> wrote: >> On 12/04/2015 12:39 PM, Justin Pettit wrote: >>>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote: >>>> >>>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote: >>>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we >>>>> would use a sequence to see if anything changed in NB database to >>>>> compute and notify the SB database, and vice versa. This logic got >>>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl >>>>> test. Reintroduce the IDL sequence number checking. >>>>> >>>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* >>>>> functions") >>>>> Suggested-by: Numan Siddique <nusiddiq@redhat.com> >>>>> Signed-off-by: Joe Stringer <joe@ovn.org> >>>> Acked-by: Ben Pfaff <blp@ovn.org> >>> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by. >> Thanks Justin. >> >> I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case >> [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting >> "AT_CHECK([echo hi], [1])" at line 1106 before clean up. > Thanks for the report. Does that mean that we're filtering the broken > pipe errors out in that test case? I'd expect the test case to fail if > these logs showed up, without needing someone to modify it. I don't think that test case is calling "AT_CHECK([check_logs ...])" to check the logs. Hence it is passing. >> I tested this with out the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log >> Not sure what is the cause of this. May be the issue is somewhere else. > It sounds like this patch just makes the bug less likely (perhaps by > virtue of sending less transactions). > >> Also with this patch, I can see the issue [2] again. I guess its not a major issue. > Are you able to tell us a bit more about your setup and why you're > able to reproduce it? I have a single node devstack setup running on my guest Fedora 22 libvirt VM (hosted on my Fedora 23 laptop). To reproduce the issue, I applied the /"ovn: support ARP response for known IPs" //patch from Han Zhou/. Please see the attached ovn-northd.c file at line 1149. These are the steps which I followed to reproduce the issue. 1. Deploy the devstack with networking-ovn 2. Apply the ARP patch, compile ovn-northd and restart it. 3. When I run the command "sudo ovn-sbctl dump-flows", I don't see any logical flows related to ARP. 4. If i run "neutron port-create private" or any other neutron command which updates the ovn-nb.db, I see the below ARP logical flows reflected when i dump the logical flows. ---- table=3( ls_in_l2_lkup), priority= 150, match=(arp.tpa == 10.0.0.1 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:d6:20:d7; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:d6:20:d7; arp.tpa = arp.spa; arp.spa = 10.0.0.1; outport = inport; inport = ""; /* Allow sending out inport. */ output;) table=3( ls_in_l2_lkup), priority= 150, match=(arp.tpa == 10.0.0.2 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:81:21:dc; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:81:21:dc; arp.tpa = arp.spa; arp.spa = 10.0.0.2; outport = inport; inport = ""; /* Allow sending out inport. */ output;) ---- 5. When I revert the changes and restart ovn-northd, I still see those ARP logical flows until the ovn-nb.db is updated. 6. Sometimes the issue is not seen, in which case I revert the ARP code in ovn-northd.c and try again until I see the issue. Please also see this - http://openvswitch.org/pipermail/discuss/2015-November/019444.html > I have only been able to reproduce the broken pipe issue on build > systems like travis where the underlying CPU resource is shared with > other users, and I suspect they're using nested virtualization which > can exacerbate certain types of failures. I had been assuming that > this is related to why it fails. I am able to reproduce the broken pip issue on my laptop running Fedora 23 by running "make check TESTSUITEFLAGS='1713'" where 1713 is the test case number for "[ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR]" Thanks Numan
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 01f289d1c746..7f3a92e3539d 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -1799,6 +1799,7 @@ int main(int argc, char *argv[]) { extern struct vlog_module VLM_reconnect; + unsigned int ovnnb_seqno, ovnsb_seqno; int res = EXIT_SUCCESS; struct unixctl_server *unixctl; int retval; @@ -1868,6 +1869,9 @@ main(int argc, char *argv[]) add_column_noalert(ovnsb_idl_loop.idl, &sbrec_port_binding_col_mac); ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_port_binding_col_chassis); + ovnnb_seqno = ovsdb_idl_get_seqno(ovnnb_idl_loop.idl); + ovnsb_seqno = ovsdb_idl_get_seqno(ovnsb_idl_loop.idl); + /* Main loop. */ exiting = false; while (!exiting) { @@ -1878,8 +1882,14 @@ main(int argc, char *argv[]) .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), }; - ovnnb_db_run(&ctx); - ovnsb_db_run(&ctx); + if (ovnnb_seqno != ovsdb_idl_get_seqno(ctx.ovnnb_idl)) { + ovnnb_seqno = ovsdb_idl_get_seqno(ctx.ovnnb_idl); + ovnnb_db_run(&ctx); + } + if (ovnsb_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) { + ovnsb_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl); + ovnsb_db_run(&ctx); + } unixctl_server_run(unixctl); unixctl_server_wait(unixctl);
Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we would use a sequence to see if anything changed in NB database to compute and notify the SB database, and vice versa. This logic got dropped with the refactor, causing a testsuite failure in the ovn-sbctl test. Reintroduce the IDL sequence number checking. Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_* functions") Suggested-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Joe Stringer <joe@ovn.org> --- ovn/northd/ovn-northd.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)