diff mbox series

[ovs-dev] ovn.at: Fix flaky tests "VLAN transparency, passthru=true, multiple hosts"

Message ID 20211001132624.3248475-1-xsimonar@redhat.com
State Superseded
Headers show
Series [ovs-dev] ovn.at: Fix flaky tests "VLAN transparency, passthru=true, multiple hosts" | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Xavier Simonart Oct. 1, 2021, 1:26 p.m. UTC
Tests were waiting for ports to be reported up before sending packets.
However, waiting for both ports to be up is not enough to guarantee
that all flows are installed for both ports.
We now wait for last flows (implementing switching to localnet port)
are installed.
Following tests were are fixed:
- VLAN transparency, passthru=true, multiple hosts
- VLAN transparency, passthru=true, multiple hosts, custom ethtype
- VLAN transparency, passthru=true, multiple hosts, flat/untagged

Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
---
 tests/ovn.at | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Mark Gray Oct. 1, 2021, 3:12 p.m. UTC | #1
On 01/10/2021 14:26, Xavier Simonart wrote:
> Tests were waiting for ports to be reported up before sending packets.
> However, waiting for both ports to be up is not enough to guarantee
> that all flows are installed for both ports.
> We now wait for last flows (implementing switching to localnet port)
> are installed.
> Following tests were are fixed:
> - VLAN transparency, passthru=true, multiple hosts
> - VLAN transparency, passthru=true, multiple hosts, custom ethtype
> - VLAN transparency, passthru=true, multiple hosts, flat/untagged
> 
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> ---
Looks good other than the comment I have in:
(ovn.at: Fix flaky test "controller I-P handling with monitoring disabled")

https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html

>  tests/ovn.at | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/ovn.at b/tests/ovn.at
> index fc8f31d06..cc940701f 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -3413,6 +3413,14 @@ for i in 1 2; do
>                                    options:rxq_pcap=vif$i-rx.pcap \
>                                    ofport-request=$i
>      OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
> +    # Waiting for both ports to be up is not enough to guarantee that all flows are installed.
> +    # Wait for flows implementing switching to localnet port.
> +    OVS_WAIT_UNTIL([
> +        test $(as hv-$i ovs-ofctl dump-flows br-int | \
> +        grep "table=38" | \
> +        grep "resubmit(,38)" -c) -eq 1
> +    ])
> +
>  done
>  
>  test_packet() {
> @@ -3478,8 +3486,13 @@ for i in 1 2; do
>                                    options:rxq_pcap=vif$i-rx.pcap \
>                                    ofport-request=$i
>      wait_for_ports_up lsp$i
> -done
> +    OVS_WAIT_UNTIL([
> +        test $(ovs-ofctl dump-flows br-int | \
> +        grep "table=38" | \
> +        grep "resubmit(,38)" -c) -eq 1
> +    ])
>  
> +done
>  # create taps on fabric to check vlan encapsulation there
>  for i in 1 2; do
>      as hv-$i
> @@ -3561,6 +3574,12 @@ for i in 1 2; do
>                                    options:rxq_pcap=vif$i-rx.pcap \
>                                    ofport-request=$i
>      OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
> +    OVS_WAIT_UNTIL([
> +        test $(ovs-ofctl dump-flows br-int | \
> +        grep "table=38" | \
> +        grep "resubmit(,38)" -c) -eq 1
> +    ])
> +
>  done
>  
>  for i in 1 2; do
>
Mark Michelson Feb. 1, 2022, 2:50 p.m. UTC | #2
*blows the dust off*

Sorry this one has sat for so long. See my comments below.

On 10/1/21 11:12, Mark Gray wrote:
> On 01/10/2021 14:26, Xavier Simonart wrote:
>> Tests were waiting for ports to be reported up before sending packets.
>> However, waiting for both ports to be up is not enough to guarantee
>> that all flows are installed for both ports.
>> We now wait for last flows (implementing switching to localnet port)
>> are installed.
>> Following tests were are fixed:
>> - VLAN transparency, passthru=true, multiple hosts
>> - VLAN transparency, passthru=true, multiple hosts, custom ethtype
>> - VLAN transparency, passthru=true, multiple hosts, flat/untagged
>>
>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>> ---
> Looks good other than the comment I have in:
> (ovn.at: Fix flaky test "controller I-P handling with monitoring disabled")
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html
> 
>>   tests/ovn.at | 21 ++++++++++++++++++++-
>>   1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index fc8f31d06..cc940701f 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -3413,6 +3413,14 @@ for i in 1 2; do
>>                                     options:rxq_pcap=vif$i-rx.pcap \
>>                                     ofport-request=$i
>>       OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
>> +    # Waiting for both ports to be up is not enough to guarantee that all flows are installed.
>> +    # Wait for flows implementing switching to localnet port.
>> +    OVS_WAIT_UNTIL([
>> +        test $(as hv-$i ovs-ofctl dump-flows br-int | \
>> +        grep "table=38" | \
>> +        grep "resubmit(,38)" -c) -eq 1
>> +    ])
>> +

The concern Mark Gray raised is valid, since a change in table numbers 
would require updates to this test in order to continue to pass.

We actually have a more reliable method of testing if a port's OF flows 
have been installed. When a port's associated OF flows are installed, we 
set external_ids:ovn-installed=true on the Port_Binding in the 
southbound database.

I think the following should work:

wait_row_count Port_Binding 1 logical_port=lsp$i 
external_ids:ovn-installed=true

This will wait until there is 1 row in the Port_Binding table whose 
logical_port is set to lsp$i and whose external_ids:ovn-installed is set 
to true.

>>   done
>>   
>>   test_packet() {
>> @@ -3478,8 +3486,13 @@ for i in 1 2; do
>>                                     options:rxq_pcap=vif$i-rx.pcap \
>>                                     ofport-request=$i
>>       wait_for_ports_up lsp$i
>> -done
>> +    OVS_WAIT_UNTIL([
>> +        test $(ovs-ofctl dump-flows br-int | \
>> +        grep "table=38" | \
>> +        grep "resubmit(,38)" -c) -eq 1
>> +    ])
>>   
>> +done
>>   # create taps on fabric to check vlan encapsulation there
>>   for i in 1 2; do
>>       as hv-$i
>> @@ -3561,6 +3574,12 @@ for i in 1 2; do
>>                                     options:rxq_pcap=vif$i-rx.pcap \
>>                                     ofport-request=$i
>>       OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
>> +    OVS_WAIT_UNTIL([
>> +        test $(ovs-ofctl dump-flows br-int | \
>> +        grep "table=38" | \
>> +        grep "resubmit(,38)" -c) -eq 1
>> +    ])
>> +
>>   done
>>   
>>   for i in 1 2; do
>>
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Xavier Simonart Feb. 3, 2022, 11:21 a.m. UTC | #3
Hi Mark

Thanks for looking into this.

I agreed that the solution is not optimal as it would require updates if
table numbers are changing, but it was the best I had found at that time ...

For the solution you propose, do you mean looking at ovn-installed in *ovs*
- I do not think that there is an ovn-installed in sbdb ?
There is a "Port up" in sbdb and "ovn-installed" in ovs. But both might be
written quite before all flows are properly set up.

I did raise the question
<https://www.mail-archive.com/ovs-dev@openvswitch.org/msg59901.html> some
time ago about the potential solutions (fix ovn, fix test cases, ...) and
consensus seemed to be to try fixing the test case.
In addition to the case reported upstream, patch port is installed later in
this test case, and flows for the patch port even later.... so more reasons
for the test to fail.

However, I now think that, for this test case, we can do better than
checking table numbers:
- (1) check that the patch port is created in OVS.
- (2) wait for flows containing output:"patch_port_id" in Openflow - as
those flows are installed after the patch port has been created.
Note that checking whether patch port is created (1) is not enough - it is
decreasing the test failure rate, but there are still failures.
What do you think?

Thanks
Xvaier


On Tue, Feb 1, 2022 at 3:51 PM Mark Michelson <mmichels@redhat.com> wrote:

> *blows the dust off*
>
> Sorry this one has sat for so long. See my comments below.
>
> On 10/1/21 11:12, Mark Gray wrote:
> > On 01/10/2021 14:26, Xavier Simonart wrote:
> >> Tests were waiting for ports to be reported up before sending packets.
> >> However, waiting for both ports to be up is not enough to guarantee
> >> that all flows are installed for both ports.
> >> We now wait for last flows (implementing switching to localnet port)
> >> are installed.
> >> Following tests were are fixed:
> >> - VLAN transparency, passthru=true, multiple hosts
> >> - VLAN transparency, passthru=true, multiple hosts, custom ethtype
> >> - VLAN transparency, passthru=true, multiple hosts, flat/untagged
> >>
> >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >> ---
> > Looks good other than the comment I have in:
> > (ovn.at: Fix flaky test "controller I-P handling with monitoring
> disabled")
> >
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html
> >
> >>   tests/ovn.at | 21 ++++++++++++++++++++-
> >>   1 file changed, 20 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index fc8f31d06..cc940701f 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -3413,6 +3413,14 @@ for i in 1 2; do
> >>                                     options:rxq_pcap=vif$i-rx.pcap \
> >>                                     ofport-request=$i
> >>       OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
> >> +    # Waiting for both ports to be up is not enough to guarantee that
> all flows are installed.
> >> +    # Wait for flows implementing switching to localnet port.
> >> +    OVS_WAIT_UNTIL([
> >> +        test $(as hv-$i ovs-ofctl dump-flows br-int | \
> >> +        grep "table=38" | \
> >> +        grep "resubmit(,38)" -c) -eq 1
> >> +    ])
> >> +
>
> The concern Mark Gray raised is valid, since a change in table numbers
> would require updates to this test in order to continue to pass.
>
> We actually have a more reliable method of testing if a port's OF flows
> have been installed. When a port's associated OF flows are installed, we
> set external_ids:ovn-installed=true on the Port_Binding in the
> southbound database.
>
> I think the following should work:
>
> wait_row_count Port_Binding 1 logical_port=lsp$i
> external_ids:ovn-installed=true
>
> This will wait until there is 1 row in the Port_Binding table whose
> logical_port is set to lsp$i and whose external_ids:ovn-installed is set
> to true.
>
> >>   done
> >>
> >>   test_packet() {
> >> @@ -3478,8 +3486,13 @@ for i in 1 2; do
> >>                                     options:rxq_pcap=vif$i-rx.pcap \
> >>                                     ofport-request=$i
> >>       wait_for_ports_up lsp$i
> >> -done
> >> +    OVS_WAIT_UNTIL([
> >> +        test $(ovs-ofctl dump-flows br-int | \
> >> +        grep "table=38" | \
> >> +        grep "resubmit(,38)" -c) -eq 1
> >> +    ])
> >>
> >> +done
> >>   # create taps on fabric to check vlan encapsulation there
> >>   for i in 1 2; do
> >>       as hv-$i
> >> @@ -3561,6 +3574,12 @@ for i in 1 2; do
> >>                                     options:rxq_pcap=vif$i-rx.pcap \
> >>                                     ofport-request=$i
> >>       OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
> >> +    OVS_WAIT_UNTIL([
> >> +        test $(ovs-ofctl dump-flows br-int | \
> >> +        grep "table=38" | \
> >> +        grep "resubmit(,38)" -c) -eq 1
> >> +    ])
> >> +
> >>   done
> >>
> >>   for i in 1 2; do
> >>
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
>
Mark Michelson Feb. 3, 2022, 12:59 p.m. UTC | #4
On 2/3/22 06:21, Xavier Simonart wrote:
> Hi Mark
> 
> Thanks for looking into this.
> 
> I agreed that the solution is not optimal as it would require updates if 
> table numbers are changing, but it was the best I had found at that time ...
> 
> For the solution you propose, do you mean looking at ovn-installed in 
> _ovs_ - I do not think that there is an ovn-installed in sbdb ?

Sorry, you are correct, it's in the OVS Interface table that corresponds 
with the SB port binding. Unfortunately, wait_row_count (and other 
functions/macros in ovn-macros.at) do not work on the OVS database. So 
to get the same functionality, you would need to copy the logic of 
wait_row_count to determine when an OVS Interface has 
external_ids:iface-id set to the logical port name and 
external_ids:ovn-installed set to true...or make the ovn-macros work 
with the OVS database :)

> There is a "Port up" in sbdb and "ovn-installed" in ovs. But both might 
> be written quite before all flows are properly set up.
> 
> I did raise thequestion 
> <https://www.mail-archive.com/ovs-dev@openvswitch.org/msg59901.html> 
> some time ago about the potential solutions (fix ovn, fix test cases, 
> ...) and consensus seemed to be to try fixing the test case.
> In addition to the case reported upstream, patch port is installed later 
> in this test case, and flows for the patch port even later.... so more 
> reasons for the test to fail.
> 
> However, I now think that, for this test case, we can do better than 
> checking table numbers:
> - (1) check that the patch port is created in OVS.
> - (2) wait for flows containing output:"patch_port_id" in Openflow - as 
> those flows are installed after the patch port has been created.
> Note that checking whether patch port is created (1) is not enough - it 
> is decreasing the test failure rate, but there are still failures.
> What do you think?

Yes, I think this can work, too. Checking "output" actions should be 
reliable even in the face of changing table numbers. Feel free to give 
it a try if you think it will be easier or more reliable than using 
ovn-installed.

> 
> Thanks
> Xvaier
> 
> 
> On Tue, Feb 1, 2022 at 3:51 PM Mark Michelson <mmichels@redhat.com 
> <mailto:mmichels@redhat.com>> wrote:
> 
>     *blows the dust off*
> 
>     Sorry this one has sat for so long. See my comments below.
> 
>     On 10/1/21 11:12, Mark Gray wrote:
>      > On 01/10/2021 14:26, Xavier Simonart wrote:
>      >> Tests were waiting for ports to be reported up before sending
>     packets.
>      >> However, waiting for both ports to be up is not enough to guarantee
>      >> that all flows are installed for both ports.
>      >> We now wait for last flows (implementing switching to localnet port)
>      >> are installed.
>      >> Following tests were are fixed:
>      >> - VLAN transparency, passthru=true, multiple hosts
>      >> - VLAN transparency, passthru=true, multiple hosts, custom ethtype
>      >> - VLAN transparency, passthru=true, multiple hosts, flat/untagged
>      >>
>      >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com
>     <mailto:xsimonar@redhat.com>>
>      >> ---
>      > Looks good other than the comment I have in:
>      > (ovn.at <http://ovn.at>: Fix flaky test "controller I-P handling
>     with monitoring disabled")
>      >
>      >
>     https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html
>     <https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html>
>      >
>      >>   tests/ovn.at <http://ovn.at> | 21 ++++++++++++++++++++-
>      >>   1 file changed, 20 insertions(+), 1 deletion(-)
>      >>
>      >> diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at
>     <http://ovn.at>
>      >> index fc8f31d06..cc940701f 100644
>      >> --- a/tests/ovn.at <http://ovn.at>
>      >> +++ b/tests/ovn.at <http://ovn.at>
>      >> @@ -3413,6 +3413,14 @@ for i in 1 2; do
>      >>                                     options:rxq_pcap=vif$i-rx.pcap \
>      >>                                     ofport-request=$i
>      >>       OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
>      >> +    # Waiting for both ports to be up is not enough to
>     guarantee that all flows are installed.
>      >> +    # Wait for flows implementing switching to localnet port.
>      >> +    OVS_WAIT_UNTIL([
>      >> +        test $(as hv-$i ovs-ofctl dump-flows br-int | \
>      >> +        grep "table=38" | \
>      >> +        grep "resubmit(,38)" -c) -eq 1
>      >> +    ])
>      >> +
> 
>     The concern Mark Gray raised is valid, since a change in table numbers
>     would require updates to this test in order to continue to pass.
> 
>     We actually have a more reliable method of testing if a port's OF flows
>     have been installed. When a port's associated OF flows are
>     installed, we
>     set external_ids:ovn-installed=true on the Port_Binding in the
>     southbound database.
> 
>     I think the following should work:
> 
>     wait_row_count Port_Binding 1 logical_port=lsp$i
>     external_ids:ovn-installed=true
> 
>     This will wait until there is 1 row in the Port_Binding table whose
>     logical_port is set to lsp$i and whose external_ids:ovn-installed is
>     set
>     to true.
> 
>      >>   done
>      >>
>      >>   test_packet() {
>      >> @@ -3478,8 +3486,13 @@ for i in 1 2; do
>      >>                                     options:rxq_pcap=vif$i-rx.pcap \
>      >>                                     ofport-request=$i
>      >>       wait_for_ports_up lsp$i
>      >> -done
>      >> +    OVS_WAIT_UNTIL([
>      >> +        test $(ovs-ofctl dump-flows br-int | \
>      >> +        grep "table=38" | \
>      >> +        grep "resubmit(,38)" -c) -eq 1
>      >> +    ])
>      >>
>      >> +done
>      >>   # create taps on fabric to check vlan encapsulation there
>      >>   for i in 1 2; do
>      >>       as hv-$i
>      >> @@ -3561,6 +3574,12 @@ for i in 1 2; do
>      >>                                     options:rxq_pcap=vif$i-rx.pcap \
>      >>                                     ofport-request=$i
>      >>       OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
>      >> +    OVS_WAIT_UNTIL([
>      >> +        test $(ovs-ofctl dump-flows br-int | \
>      >> +        grep "table=38" | \
>      >> +        grep "resubmit(,38)" -c) -eq 1
>      >> +    ])
>      >> +
>      >>   done
>      >>
>      >>   for i in 1 2; do
>      >>
>      >
>      > _______________________________________________
>      > dev mailing list
>      > dev@openvswitch.org <mailto:dev@openvswitch.org>
>      > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>      >
>
diff mbox series

Patch

diff --git a/tests/ovn.at b/tests/ovn.at
index fc8f31d06..cc940701f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -3413,6 +3413,14 @@  for i in 1 2; do
                                   options:rxq_pcap=vif$i-rx.pcap \
                                   ofport-request=$i
     OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
+    # Waiting for both ports to be up is not enough to guarantee that all flows are installed.
+    # Wait for flows implementing switching to localnet port.
+    OVS_WAIT_UNTIL([
+        test $(as hv-$i ovs-ofctl dump-flows br-int | \
+        grep "table=38" | \
+        grep "resubmit(,38)" -c) -eq 1
+    ])
+
 done
 
 test_packet() {
@@ -3478,8 +3486,13 @@  for i in 1 2; do
                                   options:rxq_pcap=vif$i-rx.pcap \
                                   ofport-request=$i
     wait_for_ports_up lsp$i
-done
+    OVS_WAIT_UNTIL([
+        test $(ovs-ofctl dump-flows br-int | \
+        grep "table=38" | \
+        grep "resubmit(,38)" -c) -eq 1
+    ])
 
+done
 # create taps on fabric to check vlan encapsulation there
 for i in 1 2; do
     as hv-$i
@@ -3561,6 +3574,12 @@  for i in 1 2; do
                                   options:rxq_pcap=vif$i-rx.pcap \
                                   ofport-request=$i
     OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup])
+    OVS_WAIT_UNTIL([
+        test $(ovs-ofctl dump-flows br-int | \
+        grep "table=38" | \
+        grep "resubmit(,38)" -c) -eq 1
+    ])
+
 done
 
 for i in 1 2; do