Message ID | 1495046266-9007-1-git-send-email-azhou@ovn.org |
---|---|
State | Accepted |
Headers | show |
On Wed, May 17, 2017 at 11:37 AM, Andy Zhou <azhou@ovn.org> wrote: > Without the fix, this test currently consistently fail when running > on Travis CI. Connecting to the controller can take more time than > running locally. Because the exact connecting time is variable, the > exact output should not be used for correctness checking. > > Fixes: 85c55772a453(bridge: Fix controller status update) > Signed-off-by: Andy Zhou <azhou@ovn.org> > --- > tests/bridge.at | 23 +++++++++++++++-------- > 1 file changed, 15 insertions(+), 8 deletions(-) Is this the breakage in the TESTSUITE=1 builds that I'm seeing here: https://travis-ci.org/gvrose8192/ovs-experimental - Greg > > diff --git a/tests/bridge.at b/tests/bridge.at > index 58b27d445062..cc7619d3f035 100644 > --- a/tests/bridge.at > +++ b/tests/bridge.at > @@ -49,23 +49,30 @@ OVS_VSWITCHD_START( > > dnl Start ovs-testcontroller > AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore]) > +OVS_WAIT_UNTIL([test -e controller]) > > dnl Add the controller to both bridges, 5 seconds apart. > AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) > +AT_CHECK([ovs-vsctl set-fail-mode br0 secure]) > AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore]) > AT_CHECK([ovs-vsctl set-controller br1 unix:controller]) > +AT_CHECK([ovs-vsctl set-fail-mode br1 secure]) > > -dnl Wait for the controller connection to come up > -for i in `seq 0 7` > +dnl Wait for the controller connectionsi to be up > +for i in `seq 0 19` > do > - AT_CHECK([ovs-appctl time/warp 10], [0], [ignore]) > + if ovs-vsctl --column=is_connected list controller |grep "false"; then > + : > + else > + break > + fi > + ovs-appctl time/warp 1100 > done > > -dnl Make sure the connection status are different > -AT_CHECK([ovs-vsctl --columns=status list controller | sort], [0], [dnl > - > -status : {sec_since_connect="0", state=ACTIVE} > -status : {sec_since_connect="5", state=ACTIVE} > +dnl Make sure the connection status have two records and they are different. > +dnl (The exact output contains timing information that are machine dependent.) > +AT_CHECK([ovs-vsctl --column=status list controller | dnl > + grep "status" | sort -u |wc -l], [0], [2 > ]) > > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > -- > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 17 May 2017 at 12:09, Greg Rose <gvrose8192@gmail.com> wrote: > On Wed, May 17, 2017 at 11:37 AM, Andy Zhou <azhou@ovn.org> wrote: >> Without the fix, this test currently consistently fail when running >> on Travis CI. Connecting to the controller can take more time than >> running locally. Because the exact connecting time is variable, the >> exact output should not be used for correctness checking. >> >> Fixes: 85c55772a453(bridge: Fix controller status update) >> Signed-off-by: Andy Zhou <azhou@ovn.org> >> --- >> tests/bridge.at | 23 +++++++++++++++-------- >> 1 file changed, 15 insertions(+), 8 deletions(-) > > Is this the breakage in the TESTSUITE=1 builds that I'm seeing here: > > https://travis-ci.org/gvrose8192/ovs-experimental Yup.
On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote: > Without the fix, this test currently consistently fail when running > on Travis CI. Connecting to the controller can take more time than > running locally. Because the exact connecting time is variable, the > exact output should not be used for correctness checking. > > Fixes: 85c55772a453(bridge: Fix controller status update) > Signed-off-by: Andy Zhou <azhou@ovn.org> > --- Thanks, this seems to improve the situation. > tests/bridge.at | 23 +++++++++++++++-------- > 1 file changed, 15 insertions(+), 8 deletions(-) > > diff --git a/tests/bridge.at b/tests/bridge.at > index 58b27d445062..cc7619d3f035 100644 > --- a/tests/bridge.at > +++ b/tests/bridge.at > @@ -49,23 +49,30 @@ OVS_VSWITCHD_START( > > dnl Start ovs-testcontroller > AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore]) > +OVS_WAIT_UNTIL([test -e controller]) > > dnl Add the controller to both bridges, 5 seconds apart. > AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) > +AT_CHECK([ovs-vsctl set-fail-mode br0 secure]) > AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore]) > AT_CHECK([ovs-vsctl set-controller br1 unix:controller]) > +AT_CHECK([ovs-vsctl set-fail-mode br1 secure]) > > -dnl Wait for the controller connection to come up > -for i in `seq 0 7` > +dnl Wait for the controller connectionsi to be up > +for i in `seq 0 19` > do > - AT_CHECK([ovs-appctl time/warp 10], [0], [ignore]) > + if ovs-vsctl --column=is_connected list controller |grep "false"; then > + : > + else > + break > + fi > + ovs-appctl time/warp 1100 > done Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl --column=is_connected list controller | grep "false"]) ?
On Wed, May 17, 2017 at 12:33 PM, Joe Stringer <joe@ovn.org> wrote: > On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote: >> Without the fix, this test currently consistently fail when running >> on Travis CI. Connecting to the controller can take more time than >> running locally. Because the exact connecting time is variable, the >> exact output should not be used for correctness checking. >> >> Fixes: 85c55772a453(bridge: Fix controller status update) >> Signed-off-by: Andy Zhou <azhou@ovn.org> >> --- > > Thanks, this seems to improve the situation. > >> tests/bridge.at | 23 +++++++++++++++-------- >> 1 file changed, 15 insertions(+), 8 deletions(-) >> >> diff --git a/tests/bridge.at b/tests/bridge.at >> index 58b27d445062..cc7619d3f035 100644 >> --- a/tests/bridge.at >> +++ b/tests/bridge.at >> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START( >> >> dnl Start ovs-testcontroller >> AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore]) >> +OVS_WAIT_UNTIL([test -e controller]) >> >> dnl Add the controller to both bridges, 5 seconds apart. >> AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) >> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure]) >> AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore]) >> AT_CHECK([ovs-vsctl set-controller br1 unix:controller]) >> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure]) >> >> -dnl Wait for the controller connection to come up >> -for i in `seq 0 7` >> +dnl Wait for the controller connectionsi to be up >> +for i in `seq 0 19` >> do >> - AT_CHECK([ovs-appctl time/warp 10], [0], [ignore]) >> + if ovs-vsctl --column=is_connected list controller |grep "false"; then >> + : >> + else >> + break >> + fi >> + ovs-appctl time/warp 1100 >> done > > Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl > --column=is_connected list controller | grep "false"]) ? I have tested using OVS_WAIT_WHILE() and found some times 10 seconds is not enough. The patch uses ovs-appctl time/warp instead wall clock, so it should have less delay than using sleep.
On 17 May 2017 at 13:16, Andy Zhou <azhou@ovn.org> wrote: > On Wed, May 17, 2017 at 12:33 PM, Joe Stringer <joe@ovn.org> wrote: >> On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote: >>> Without the fix, this test currently consistently fail when running >>> on Travis CI. Connecting to the controller can take more time than >>> running locally. Because the exact connecting time is variable, the >>> exact output should not be used for correctness checking. >>> >>> Fixes: 85c55772a453(bridge: Fix controller status update) >>> Signed-off-by: Andy Zhou <azhou@ovn.org> >>> --- >> >> Thanks, this seems to improve the situation. >> >>> tests/bridge.at | 23 +++++++++++++++-------- >>> 1 file changed, 15 insertions(+), 8 deletions(-) >>> >>> diff --git a/tests/bridge.at b/tests/bridge.at >>> index 58b27d445062..cc7619d3f035 100644 >>> --- a/tests/bridge.at >>> +++ b/tests/bridge.at >>> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START( >>> >>> dnl Start ovs-testcontroller >>> AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore]) >>> +OVS_WAIT_UNTIL([test -e controller]) >>> >>> dnl Add the controller to both bridges, 5 seconds apart. >>> AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) >>> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure]) >>> AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore]) >>> AT_CHECK([ovs-vsctl set-controller br1 unix:controller]) >>> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure]) >>> >>> -dnl Wait for the controller connection to come up >>> -for i in `seq 0 7` >>> +dnl Wait for the controller connectionsi to be up >>> +for i in `seq 0 19` >>> do >>> - AT_CHECK([ovs-appctl time/warp 10], [0], [ignore]) >>> + if ovs-vsctl --column=is_connected list controller |grep "false"; then >>> + : >>> + else >>> + break >>> + fi >>> + ovs-appctl time/warp 1100 >>> done >> >> Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl >> --column=is_connected list controller | grep "false"]) ? > > I have tested using OVS_WAIT_WHILE() and found some times 10 seconds > is not enough. The patch uses ovs-appctl time/warp instead wall clock, so it > should have less delay than using sleep. I see, just noticed this myself on Travis. My main concern is to get this fixed up on master, we can argue about how the test is layed out later. Acked-by: Joe Stringer <joe@ovn.org>
On Wed, May 17, 2017 at 1:43 PM, Joe Stringer <joe@ovn.org> wrote: > On 17 May 2017 at 13:16, Andy Zhou <azhou@ovn.org> wrote: >> On Wed, May 17, 2017 at 12:33 PM, Joe Stringer <joe@ovn.org> wrote: >>> On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote: >>>> Without the fix, this test currently consistently fail when running >>>> on Travis CI. Connecting to the controller can take more time than >>>> running locally. Because the exact connecting time is variable, the >>>> exact output should not be used for correctness checking. >>>> >>>> Fixes: 85c55772a453(bridge: Fix controller status update) >>>> Signed-off-by: Andy Zhou <azhou@ovn.org> >>>> --- >>> >>> Thanks, this seems to improve the situation. >>> >>>> tests/bridge.at | 23 +++++++++++++++-------- >>>> 1 file changed, 15 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/tests/bridge.at b/tests/bridge.at >>>> index 58b27d445062..cc7619d3f035 100644 >>>> --- a/tests/bridge.at >>>> +++ b/tests/bridge.at >>>> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START( >>>> >>>> dnl Start ovs-testcontroller >>>> AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore]) >>>> +OVS_WAIT_UNTIL([test -e controller]) >>>> >>>> dnl Add the controller to both bridges, 5 seconds apart. >>>> AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) >>>> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure]) >>>> AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore]) >>>> AT_CHECK([ovs-vsctl set-controller br1 unix:controller]) >>>> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure]) >>>> >>>> -dnl Wait for the controller connection to come up >>>> -for i in `seq 0 7` >>>> +dnl Wait for the controller connectionsi to be up >>>> +for i in `seq 0 19` >>>> do >>>> - AT_CHECK([ovs-appctl time/warp 10], [0], [ignore]) >>>> + if ovs-vsctl --column=is_connected list controller |grep "false"; then >>>> + : >>>> + else >>>> + break >>>> + fi >>>> + ovs-appctl time/warp 1100 >>>> done >>> >>> Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl >>> --column=is_connected list controller | grep "false"]) ? >> >> I have tested using OVS_WAIT_WHILE() and found some times 10 seconds >> is not enough. The patch uses ovs-appctl time/warp instead wall clock, so it >> should have less delay than using sleep. > > I see, just noticed this myself on Travis. > > My main concern is to get this fixed up on master, we can argue about > how the test is layed out later. May be we can consider having another kind of wait what only advances ovs internal clock? At any rate, I agree that fix the master is a good thing to do. > > Acked-by: Joe Stringer <joe@ovn.org> Thanks for the review, Pushed to master.
diff --git a/tests/bridge.at b/tests/bridge.at index 58b27d445062..cc7619d3f035 100644 --- a/tests/bridge.at +++ b/tests/bridge.at @@ -49,23 +49,30 @@ OVS_VSWITCHD_START( dnl Start ovs-testcontroller AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore]) +OVS_WAIT_UNTIL([test -e controller]) dnl Add the controller to both bridges, 5 seconds apart. AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) +AT_CHECK([ovs-vsctl set-fail-mode br0 secure]) AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore]) AT_CHECK([ovs-vsctl set-controller br1 unix:controller]) +AT_CHECK([ovs-vsctl set-fail-mode br1 secure]) -dnl Wait for the controller connection to come up -for i in `seq 0 7` +dnl Wait for the controller connectionsi to be up +for i in `seq 0 19` do - AT_CHECK([ovs-appctl time/warp 10], [0], [ignore]) + if ovs-vsctl --column=is_connected list controller |grep "false"; then + : + else + break + fi + ovs-appctl time/warp 1100 done -dnl Make sure the connection status are different -AT_CHECK([ovs-vsctl --columns=status list controller | sort], [0], [dnl - -status : {sec_since_connect="0", state=ACTIVE} -status : {sec_since_connect="5", state=ACTIVE} +dnl Make sure the connection status have two records and they are different. +dnl (The exact output contains timing information that are machine dependent.) +AT_CHECK([ovs-vsctl --column=status list controller | dnl + grep "status" | sort -u |wc -l], [0], [2 ]) OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
Without the fix, this test currently consistently fail when running on Travis CI. Connecting to the controller can take more time than running locally. Because the exact connecting time is variable, the exact output should not be used for correctness checking. Fixes: 85c55772a453(bridge: Fix controller status update) Signed-off-by: Andy Zhou <azhou@ovn.org> --- tests/bridge.at | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)