diff mbox

[ovs-dev] ovn-northd: Only run idl loop if something changed.

Message ID 1449191509-12824-1-git-send-email-joe@ovn.org
State Accepted, archived
Headers show

Commit Message

Joe Stringer Dec. 4, 2015, 1:11 a.m. UTC
Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
would use a sequence to see if anything changed in NB database to
compute and notify the SB database, and vice versa. This logic got
dropped with the refactor, causing a testsuite failure in the ovn-sbctl
test. Reintroduce the IDL sequence number checking.

Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
functions")
Suggested-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
---
 ovn/northd/ovn-northd.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Thadeu Lima de Souza Cascardo Dec. 4, 2015, 1:41 a.m. UTC | #1
On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
> would use a sequence to see if anything changed in NB database to
> compute and notify the SB database, and vice versa. This logic got
> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
> test. Reintroduce the IDL sequence number checking.
> 
> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
> functions")
> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
> Signed-off-by: Joe Stringer <joe@ovn.org>
> ---
>  ovn/northd/ovn-northd.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 

I have this problem with the testsuite. This patch fixes it for me. Thanks!

Tested-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Ben Pfaff Dec. 4, 2015, 6:55 a.m. UTC | #2
On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
> would use a sequence to see if anything changed in NB database to
> compute and notify the SB database, and vice versa. This logic got
> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
> test. Reintroduce the IDL sequence number checking.
> 
> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
> functions")
> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
> Signed-off-by: Joe Stringer <joe@ovn.org>

Acked-by: Ben Pfaff <blp@ovn.org>
Justin Pettit Dec. 4, 2015, 7:09 a.m. UTC | #3
> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote:
> 
> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>> would use a sequence to see if anything changed in NB database to
>> compute and notify the SB database, and vice versa. This logic got
>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>> test. Reintroduce the IDL sequence number checking.
>> 
>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
>> functions")
>> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
>> Signed-off-by: Joe Stringer <joe@ovn.org>
> 
> Acked-by: Ben Pfaff <blp@ovn.org>

I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by.

--Justin
Joe Stringer Dec. 4, 2015, 7:24 a.m. UTC | #4
On 3 December 2015 at 23:09, Justin Pettit <jpettit@ovn.org> wrote:
>
>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote:
>>
>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>> would use a sequence to see if anything changed in NB database to
>>> compute and notify the SB database, and vice versa. This logic got
>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>> test. Reintroduce the IDL sequence number checking.
>>>
>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
>>> functions")
>>> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
>>> Signed-off-by: Joe Stringer <joe@ovn.org>
>>
>> Acked-by: Ben Pfaff <blp@ovn.org>
>
> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by.

Thanks.
Numan Siddique Dec. 4, 2015, 7:49 a.m. UTC | #5
On 12/04/2015 12:39 PM, Justin Pettit wrote:
>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote:
>>
>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>> would use a sequence to see if anything changed in NB database to
>>> compute and notify the SB database, and vice versa. This logic got
>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>> test. Reintroduce the IDL sequence number checking.
>>>
>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
>>> functions")
>>> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
>>> Signed-off-by: Joe Stringer <joe@ovn.org>
>> Acked-by: Ben Pfaff <blp@ovn.org>
> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by.
Thanks Justin.

I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case
[ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting
"AT_CHECK([echo hi], [1])" at line 1106 before clean up.

I tested this with out the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log
Not sure what is the cause of this. May be the issue is somewhere else.

Also with this patch, I can see the issue [2] again. I guess its not a major issue.

[1] -
2015-12-04T07:16:30.071Z|00001|vlog|INFO|opened log file /home/nsiddique/workspace_cpp/openvswitch/src/ovs_numan/tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log
2015-12-04T07:16:30.076Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.4.90
2015-12-04T07:16:30.151Z|00003|jsonrpc|WARN|unix: send error: Broken pipe
2015-12-04T07:16:30.151Z|00004|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.157Z|00005|jsonrpc|WARN|unix: send error: Broken pipe
2015-12-04T07:16:30.157Z|00006|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.162Z|00007|jsonrpc|WARN|unix: send error: Broken pipe
2015-12-04T07:16:30.162Z|00008|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.176Z|00009|jsonrpc|WARN|unix: send error: Broken pipe
2015-12-04T07:16:30.176Z|00010|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.188Z|00011|jsonrpc|WARN|unix: send error: Broken pipe
2015-12-04T07:16:30.188Z|00012|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.242Z|00013|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.250Z|00014|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.303Z|00015|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:30.336Z|00016|reconnect|WARN|unix: connection dropped (Broken pipe)
2015-12-04T07:16:40.084Z|00017|memory|INFO|6084 kB peak resident set size after 10.0 seconds
2015-12-04T07:16:40.084Z|00018|memory|INFO|cells:528 monitors:1 sessions:1
2015-12-04T07:16:53.106Z|00019|fatal_signal|WARN|terminating with signal 15 (Terminated)


[2] - http://openvswitch.org/pipermail/discuss/2015-November/019445.html

Thanks
Numan
Joe Stringer Dec. 4, 2015, 8:17 a.m. UTC | #6
On 3 December 2015 at 23:49, Numan Siddique <nusiddiq@redhat.com> wrote:
> On 12/04/2015 12:39 PM, Justin Pettit wrote:
>>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote:
>>>
>>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>>> would use a sequence to see if anything changed in NB database to
>>>> compute and notify the SB database, and vice versa. This logic got
>>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>>> test. Reintroduce the IDL sequence number checking.
>>>>
>>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
>>>> functions")
>>>> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
>>>> Signed-off-by: Joe Stringer <joe@ovn.org>
>>> Acked-by: Ben Pfaff <blp@ovn.org>
>> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by.
> Thanks Justin.
>
> I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case
> [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting
> "AT_CHECK([echo hi], [1])" at line 1106 before clean up.

Thanks for the report. Does that mean that we're filtering the broken
pipe errors out in that test case? I'd expect the test case to fail if
these logs showed up, without needing someone to modify it.

> I tested this with out the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log
> Not sure what is the cause of this. May be the issue is somewhere else.

It sounds like this patch just makes the bug less likely (perhaps by
virtue of sending less transactions).

> Also with this patch, I can see the issue [2] again. I guess its not a major issue.

Are you able to tell us a bit more about your setup and why you're
able to reproduce it?

I have only been able to reproduce the broken pipe issue on build
systems like travis where the underlying CPU resource is shared with
other users, and I suspect they're using nested virtualization which
can exacerbate certain types of failures. I had been assuming that
this is related to why it fails.
Numan Siddique Dec. 4, 2015, 2:22 p.m. UTC | #7
On 12/04/2015 01:47 PM, Joe Stringer wrote:
> On 3 December 2015 at 23:49, Numan Siddique <nusiddiq@redhat.com> wrote:
>> On 12/04/2015 12:39 PM, Justin Pettit wrote:
>>>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp@ovn.org> wrote:
>>>>
>>>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>>>> would use a sequence to see if anything changed in NB database to
>>>>> compute and notify the SB database, and vice versa. This logic got
>>>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>>>> test. Reintroduce the IDL sequence number checking.
>>>>>
>>>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
>>>>> functions")
>>>>> Suggested-by: Numan Siddique <nusiddiq@redhat.com>
>>>>> Signed-off-by: Joe Stringer <joe@ovn.org>
>>>> Acked-by: Ben Pfaff <blp@ovn.org>
>>> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by.
>> Thanks Justin.
>>
>> I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case
>> [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting
>> "AT_CHECK([echo hi], [1])" at line 1106 before clean up.
> Thanks for the report. Does that mean that we're filtering the broken
> pipe errors out in that test case? I'd expect the test case to fail if
> these logs showed up, without needing someone to modify it.
I don't think that test case is calling "AT_CHECK([check_logs ...])" to check the logs. Hence it is passing.
>> I tested this with out the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log
>> Not sure what is the cause of this. May be the issue is somewhere else.
> It sounds like this patch just makes the bug less likely (perhaps by
> virtue of sending less transactions).
>
>> Also with this patch, I can see the issue [2] again. I guess its not a major issue.
> Are you able to tell us a bit more about your setup and why you're
> able to reproduce it?
I have a single node devstack setup running on my guest Fedora 22 libvirt VM (hosted on my Fedora 23 laptop).
To reproduce the issue, I applied the /"ovn: support ARP response for known IPs" //patch from Han Zhou/.
Please see the attached ovn-northd.c file at line 1149.

These are the steps which I followed to reproduce the issue.
 1. Deploy the devstack with networking-ovn
 2. Apply the ARP patch, compile ovn-northd and restart it.
 3. When I run the command "sudo ovn-sbctl dump-flows", I don't see any logical flows related to ARP.
 4. If i run "neutron port-create private" or any other neutron command which updates the ovn-nb.db, I see the below ARP logical flows reflected
    when i dump the logical flows.
 ----
      table=3(   ls_in_l2_lkup), priority=  150, match=(arp.tpa == 10.0.0.1 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:d6:20:d7; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:d6:20:d7; arp.tpa = arp.spa; arp.spa = 10.0.0.1; outport = inport; inport = ""; /* Allow sending out inport. */ output;)
  table=3(   ls_in_l2_lkup), priority=  150, match=(arp.tpa == 10.0.0.2 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:81:21:dc; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:81:21:dc; arp.tpa = arp.spa; arp.spa = 10.0.0.2; outport = inport; inport = ""; /* Allow sending out inport. */ output;)

 ----

 5. When I revert the changes and restart ovn-northd, I still see those ARP logical flows until the ovn-nb.db is updated.
 6. Sometimes the issue is not seen, in which case I revert the ARP code in ovn-northd.c and try again until I see the issue.

Please also see this - http://openvswitch.org/pipermail/discuss/2015-November/019444.html


> I have only been able to reproduce the broken pipe issue on build
> systems like travis where the underlying CPU resource is shared with
> other users, and I suspect they're using nested virtualization which
> can exacerbate certain types of failures. I had been assuming that
> this is related to why it fails.
I am able to reproduce the broken pip issue on my laptop running Fedora 23 by running "make check TESTSUITEFLAGS='1713'" where 1713 is the test case number for
"[ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR]"


Thanks
Numan
diff mbox

Patch

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 01f289d1c746..7f3a92e3539d 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1799,6 +1799,7 @@  int
 main(int argc, char *argv[])
 {
     extern struct vlog_module VLM_reconnect;
+    unsigned int ovnnb_seqno, ovnsb_seqno;
     int res = EXIT_SUCCESS;
     struct unixctl_server *unixctl;
     int retval;
@@ -1868,6 +1869,9 @@  main(int argc, char *argv[])
     add_column_noalert(ovnsb_idl_loop.idl, &sbrec_port_binding_col_mac);
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_port_binding_col_chassis);
 
+    ovnnb_seqno = ovsdb_idl_get_seqno(ovnnb_idl_loop.idl);
+    ovnsb_seqno = ovsdb_idl_get_seqno(ovnsb_idl_loop.idl);
+
     /* Main loop. */
     exiting = false;
     while (!exiting) {
@@ -1878,8 +1882,14 @@  main(int argc, char *argv[])
             .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
         };
 
-        ovnnb_db_run(&ctx);
-        ovnsb_db_run(&ctx);
+        if (ovnnb_seqno != ovsdb_idl_get_seqno(ctx.ovnnb_idl)) {
+            ovnnb_seqno = ovsdb_idl_get_seqno(ctx.ovnnb_idl);
+            ovnnb_db_run(&ctx);
+        }
+        if (ovnsb_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
+            ovnsb_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
+            ovnsb_db_run(&ctx);
+        }
 
         unixctl_server_run(unixctl);
         unixctl_server_wait(unixctl);