diff mbox

[ovs-dev] tests: Fix timing dependency bridge - multiple bridges share a controller

Message ID 1495046266-9007-1-git-send-email-azhou@ovn.org
State Accepted
Headers show

Commit Message

Andy Zhou May 17, 2017, 6:37 p.m. UTC
Without the fix, this test currently consistently fail when running
on Travis CI. Connecting to the controller can take more time than
running locally. Because the exact connecting time is variable, the
exact output should not be used for correctness checking.

Fixes: 85c55772a453(bridge: Fix controller status update)
Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 tests/bridge.at | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

Comments

Gregory Rose May 17, 2017, 7:09 p.m. UTC | #1
On Wed, May 17, 2017 at 11:37 AM, Andy Zhou <azhou@ovn.org> wrote:
> Without the fix, this test currently consistently fail when running
> on Travis CI. Connecting to the controller can take more time than
> running locally. Because the exact connecting time is variable, the
> exact output should not be used for correctness checking.
>
> Fixes: 85c55772a453(bridge: Fix controller status update)
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---
>  tests/bridge.at | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)

Is this the breakage in the TESTSUITE=1 builds that I'm seeing here:

https://travis-ci.org/gvrose8192/ovs-experimental

- Greg

>
> diff --git a/tests/bridge.at b/tests/bridge.at
> index 58b27d445062..cc7619d3f035 100644
> --- a/tests/bridge.at
> +++ b/tests/bridge.at
> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START(
>
>  dnl Start ovs-testcontroller
>  AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore])
> +OVS_WAIT_UNTIL([test -e controller])
>
>  dnl Add the controller to both bridges, 5 seconds apart.
>  AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure])
>  AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
>  AT_CHECK([ovs-vsctl set-controller br1 unix:controller])
> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure])
>
> -dnl Wait for the controller connection to come up
> -for i in `seq 0 7`
> +dnl Wait for the controller connectionsi to be up
> +for i in `seq 0 19`
>  do
> -    AT_CHECK([ovs-appctl time/warp 10], [0], [ignore])
> +    if ovs-vsctl --column=is_connected list controller |grep "false"; then
> +        :
> +    else
> +        break
> +    fi
> +    ovs-appctl time/warp 1100
>  done
>
> -dnl Make sure the connection status are different
> -AT_CHECK([ovs-vsctl --columns=status list controller | sort], [0], [dnl
> -
> -status              : {sec_since_connect="0", state=ACTIVE}
> -status              : {sec_since_connect="5", state=ACTIVE}
> +dnl Make sure the connection status have two records and they are different.
> +dnl (The exact output contains timing information that are machine dependent.)
> +AT_CHECK([ovs-vsctl --column=status list controller | dnl
> +          grep "status" | sort -u |wc -l], [0], [2
>  ])
>
>  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Joe Stringer May 17, 2017, 7:12 p.m. UTC | #2
On 17 May 2017 at 12:09, Greg Rose <gvrose8192@gmail.com> wrote:
> On Wed, May 17, 2017 at 11:37 AM, Andy Zhou <azhou@ovn.org> wrote:
>> Without the fix, this test currently consistently fail when running
>> on Travis CI. Connecting to the controller can take more time than
>> running locally. Because the exact connecting time is variable, the
>> exact output should not be used for correctness checking.
>>
>> Fixes: 85c55772a453(bridge: Fix controller status update)
>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>> ---
>>  tests/bridge.at | 23 +++++++++++++++--------
>>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> Is this the breakage in the TESTSUITE=1 builds that I'm seeing here:
>
> https://travis-ci.org/gvrose8192/ovs-experimental

Yup.
Joe Stringer May 17, 2017, 7:33 p.m. UTC | #3
On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote:
> Without the fix, this test currently consistently fail when running
> on Travis CI. Connecting to the controller can take more time than
> running locally. Because the exact connecting time is variable, the
> exact output should not be used for correctness checking.
>
> Fixes: 85c55772a453(bridge: Fix controller status update)
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---

Thanks, this seems to improve the situation.

>  tests/bridge.at | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/tests/bridge.at b/tests/bridge.at
> index 58b27d445062..cc7619d3f035 100644
> --- a/tests/bridge.at
> +++ b/tests/bridge.at
> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START(
>
>  dnl Start ovs-testcontroller
>  AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore])
> +OVS_WAIT_UNTIL([test -e controller])
>
>  dnl Add the controller to both bridges, 5 seconds apart.
>  AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure])
>  AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
>  AT_CHECK([ovs-vsctl set-controller br1 unix:controller])
> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure])
>
> -dnl Wait for the controller connection to come up
> -for i in `seq 0 7`
> +dnl Wait for the controller connectionsi to be up
> +for i in `seq 0 19`
>  do
> -    AT_CHECK([ovs-appctl time/warp 10], [0], [ignore])
> +    if ovs-vsctl --column=is_connected list controller |grep "false"; then
> +        :
> +    else
> +        break
> +    fi
> +    ovs-appctl time/warp 1100
>  done

Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl
--column=is_connected list controller | grep "false"]) ?
Andy Zhou May 17, 2017, 8:16 p.m. UTC | #4
On Wed, May 17, 2017 at 12:33 PM, Joe Stringer <joe@ovn.org> wrote:
> On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote:
>> Without the fix, this test currently consistently fail when running
>> on Travis CI. Connecting to the controller can take more time than
>> running locally. Because the exact connecting time is variable, the
>> exact output should not be used for correctness checking.
>>
>> Fixes: 85c55772a453(bridge: Fix controller status update)
>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>> ---
>
> Thanks, this seems to improve the situation.
>
>>  tests/bridge.at | 23 +++++++++++++++--------
>>  1 file changed, 15 insertions(+), 8 deletions(-)
>>
>> diff --git a/tests/bridge.at b/tests/bridge.at
>> index 58b27d445062..cc7619d3f035 100644
>> --- a/tests/bridge.at
>> +++ b/tests/bridge.at
>> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START(
>>
>>  dnl Start ovs-testcontroller
>>  AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore])
>> +OVS_WAIT_UNTIL([test -e controller])
>>
>>  dnl Add the controller to both bridges, 5 seconds apart.
>>  AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
>> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure])
>>  AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
>>  AT_CHECK([ovs-vsctl set-controller br1 unix:controller])
>> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure])
>>
>> -dnl Wait for the controller connection to come up
>> -for i in `seq 0 7`
>> +dnl Wait for the controller connectionsi to be up
>> +for i in `seq 0 19`
>>  do
>> -    AT_CHECK([ovs-appctl time/warp 10], [0], [ignore])
>> +    if ovs-vsctl --column=is_connected list controller |grep "false"; then
>> +        :
>> +    else
>> +        break
>> +    fi
>> +    ovs-appctl time/warp 1100
>>  done
>
> Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl
> --column=is_connected list controller | grep "false"]) ?

I have tested using OVS_WAIT_WHILE() and found some times 10 seconds
is not enough. The patch uses ovs-appctl time/warp instead wall clock, so it
should have less delay than using sleep.
Joe Stringer May 17, 2017, 8:43 p.m. UTC | #5
On 17 May 2017 at 13:16, Andy Zhou <azhou@ovn.org> wrote:
> On Wed, May 17, 2017 at 12:33 PM, Joe Stringer <joe@ovn.org> wrote:
>> On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote:
>>> Without the fix, this test currently consistently fail when running
>>> on Travis CI. Connecting to the controller can take more time than
>>> running locally. Because the exact connecting time is variable, the
>>> exact output should not be used for correctness checking.
>>>
>>> Fixes: 85c55772a453(bridge: Fix controller status update)
>>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>>> ---
>>
>> Thanks, this seems to improve the situation.
>>
>>>  tests/bridge.at | 23 +++++++++++++++--------
>>>  1 file changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/tests/bridge.at b/tests/bridge.at
>>> index 58b27d445062..cc7619d3f035 100644
>>> --- a/tests/bridge.at
>>> +++ b/tests/bridge.at
>>> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START(
>>>
>>>  dnl Start ovs-testcontroller
>>>  AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore])
>>> +OVS_WAIT_UNTIL([test -e controller])
>>>
>>>  dnl Add the controller to both bridges, 5 seconds apart.
>>>  AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
>>> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure])
>>>  AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
>>>  AT_CHECK([ovs-vsctl set-controller br1 unix:controller])
>>> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure])
>>>
>>> -dnl Wait for the controller connection to come up
>>> -for i in `seq 0 7`
>>> +dnl Wait for the controller connectionsi to be up
>>> +for i in `seq 0 19`
>>>  do
>>> -    AT_CHECK([ovs-appctl time/warp 10], [0], [ignore])
>>> +    if ovs-vsctl --column=is_connected list controller |grep "false"; then
>>> +        :
>>> +    else
>>> +        break
>>> +    fi
>>> +    ovs-appctl time/warp 1100
>>>  done
>>
>> Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl
>> --column=is_connected list controller | grep "false"]) ?
>
> I have tested using OVS_WAIT_WHILE() and found some times 10 seconds
> is not enough. The patch uses ovs-appctl time/warp instead wall clock, so it
> should have less delay than using sleep.

I see, just noticed this myself on Travis.

My main concern is to get this fixed up on master, we can argue about
how the test is layed out later.

Acked-by: Joe Stringer <joe@ovn.org>
Andy Zhou May 17, 2017, 8:56 p.m. UTC | #6
On Wed, May 17, 2017 at 1:43 PM, Joe Stringer <joe@ovn.org> wrote:
> On 17 May 2017 at 13:16, Andy Zhou <azhou@ovn.org> wrote:
>> On Wed, May 17, 2017 at 12:33 PM, Joe Stringer <joe@ovn.org> wrote:
>>> On 17 May 2017 at 11:37, Andy Zhou <azhou@ovn.org> wrote:
>>>> Without the fix, this test currently consistently fail when running
>>>> on Travis CI. Connecting to the controller can take more time than
>>>> running locally. Because the exact connecting time is variable, the
>>>> exact output should not be used for correctness checking.
>>>>
>>>> Fixes: 85c55772a453(bridge: Fix controller status update)
>>>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>>>> ---
>>>
>>> Thanks, this seems to improve the situation.
>>>
>>>>  tests/bridge.at | 23 +++++++++++++++--------
>>>>  1 file changed, 15 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/tests/bridge.at b/tests/bridge.at
>>>> index 58b27d445062..cc7619d3f035 100644
>>>> --- a/tests/bridge.at
>>>> +++ b/tests/bridge.at
>>>> @@ -49,23 +49,30 @@ OVS_VSWITCHD_START(
>>>>
>>>>  dnl Start ovs-testcontroller
>>>>  AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore])
>>>> +OVS_WAIT_UNTIL([test -e controller])
>>>>
>>>>  dnl Add the controller to both bridges, 5 seconds apart.
>>>>  AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
>>>> +AT_CHECK([ovs-vsctl set-fail-mode br0 secure])
>>>>  AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
>>>>  AT_CHECK([ovs-vsctl set-controller br1 unix:controller])
>>>> +AT_CHECK([ovs-vsctl set-fail-mode br1 secure])
>>>>
>>>> -dnl Wait for the controller connection to come up
>>>> -for i in `seq 0 7`
>>>> +dnl Wait for the controller connectionsi to be up
>>>> +for i in `seq 0 19`
>>>>  do
>>>> -    AT_CHECK([ovs-appctl time/warp 10], [0], [ignore])
>>>> +    if ovs-vsctl --column=is_connected list controller |grep "false"; then
>>>> +        :
>>>> +    else
>>>> +        break
>>>> +    fi
>>>> +    ovs-appctl time/warp 1100
>>>>  done
>>>
>>> Can we substitute this for something like OVS_WAIT_WHILE([ovs-vsctl
>>> --column=is_connected list controller | grep "false"]) ?
>>
>> I have tested using OVS_WAIT_WHILE() and found some times 10 seconds
>> is not enough. The patch uses ovs-appctl time/warp instead wall clock, so it
>> should have less delay than using sleep.
>
> I see, just noticed this myself on Travis.
>
> My main concern is to get this fixed up on master, we can argue about
> how the test is layed out later.

May be we can consider having another kind of wait what only advances
ovs internal clock? At any rate, I agree that fix the master is a good
thing to do.

>
> Acked-by: Joe Stringer <joe@ovn.org>
Thanks for the review, Pushed to master.
diff mbox

Patch

diff --git a/tests/bridge.at b/tests/bridge.at
index 58b27d445062..cc7619d3f035 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -49,23 +49,30 @@  OVS_VSWITCHD_START(
 
 dnl Start ovs-testcontroller
 AT_CHECK([ovs-testcontroller --detach punix:controller], [0], [ignore])
+OVS_WAIT_UNTIL([test -e controller])
 
 dnl Add the controller to both bridges, 5 seconds apart.
 AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
+AT_CHECK([ovs-vsctl set-fail-mode br0 secure])
 AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
 AT_CHECK([ovs-vsctl set-controller br1 unix:controller])
+AT_CHECK([ovs-vsctl set-fail-mode br1 secure])
 
-dnl Wait for the controller connection to come up
-for i in `seq 0 7`
+dnl Wait for the controller connectionsi to be up
+for i in `seq 0 19`
 do
-    AT_CHECK([ovs-appctl time/warp 10], [0], [ignore])
+    if ovs-vsctl --column=is_connected list controller |grep "false"; then
+        :
+    else
+        break
+    fi
+    ovs-appctl time/warp 1100
 done
 
-dnl Make sure the connection status are different
-AT_CHECK([ovs-vsctl --columns=status list controller | sort], [0], [dnl
-
-status              : {sec_since_connect="0", state=ACTIVE}
-status              : {sec_since_connect="5", state=ACTIVE}
+dnl Make sure the connection status have two records and they are different.
+dnl (The exact output contains timing information that are machine dependent.)
+AT_CHECK([ovs-vsctl --column=status list controller | dnl
+          grep "status" | sort -u |wc -l], [0], [2
 ])
 
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])