Message ID | 20220321095648.710943-1-xsimonar@redhat.com |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] Fixed multiple flaky tests | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
On 3/21/22 10:56, Xavier Simonart wrote: > - send gratuitous arp on localnet > - send gratuitous arp for nat ips in localnet > - dns lookup : 1 HV, 2 LS, 2 LSPs/LS -- ovn-northd -- dp-groups=yes > - send gratuitous arp for NAT rules on distributed router > - 2 HVs, 1 lport/HV, localport ports > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > --- Hi Xavier, > tests/ovn.at | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/tests/ovn.at b/tests/ovn.at > index 166b5f72e..cf027703f 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at [...] > @@ -10391,7 +10395,7 @@ test_dns6() { > } > > AT_CAPTURE_FILE([ofctl_monitor0.log]) > -as hv1 ovs-ofctl monitor br-int resume --detach --no-chdir \ > +as hv1 ovs-ofctl -t 300 monitor br-int resume --detach --no-chdir \ > --pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log > Is this really needed? We set OVS_CTL_TIMEOUT=30 in atlocal.in. Thanks, Dumitru
Hi Dumitru On Mon, Mar 21, 2022 at 12:02 PM Dumitru Ceara <dceara@redhat.com> wrote: > On 3/21/22 10:56, Xavier Simonart wrote: > > - send gratuitous arp on localnet > > - send gratuitous arp for nat ips in localnet > > - dns lookup : 1 HV, 2 LS, 2 LSPs/LS -- ovn-northd -- dp-groups=yes > > - send gratuitous arp for NAT rules on distributed router > > - 2 HVs, 1 lport/HV, localport ports > > > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- > > Hi Xavier, > > > tests/ovn.at | 20 +++++++++++++------- > > 1 file changed, 13 insertions(+), 7 deletions(-) > > > > diff --git a/tests/ovn.at b/tests/ovn.at > > index 166b5f72e..cf027703f 100644 > > --- a/tests/ovn.at > > +++ b/tests/ovn.at > > [...] > > > @@ -10391,7 +10395,7 @@ test_dns6() { > > } > > > > AT_CAPTURE_FILE([ofctl_monitor0.log]) > > -as hv1 ovs-ofctl monitor br-int resume --detach --no-chdir \ > > +as hv1 ovs-ofctl -t 300 monitor br-int resume --detach --no-chdir \ > > --pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log > > > > Is this really needed? We set OVS_CTL_TIMEOUT=30 in atlocal.in. > > Thanks for looking into this. Yes :-) The issue here is that we start an ovs-ofctl monitor, then run a few times (send packet / check monitor.log). The test is quite long, and on slow systems (e.g. VM running s390 architecture on a x86_64 host), the 30 seconds are not enough for all tests, and hence monitoring stops. Then we check (for 30 seconds) whether monitor.log contains what we expect ... 300 seconds might be a little much... just depends on how slow the system is... Requires at least one minute on my system... Thanks Xavier Thanks, > Dumitru > >
On 3/30/22 06:40, Xavier Simonart wrote: > Hi Dumitru > > On Mon, Mar 21, 2022 at 12:02 PM Dumitru Ceara <dceara@redhat.com> wrote: > >> On 3/21/22 10:56, Xavier Simonart wrote: >>> - send gratuitous arp on localnet >>> - send gratuitous arp for nat ips in localnet >>> - dns lookup : 1 HV, 2 LS, 2 LSPs/LS -- ovn-northd -- dp-groups=yes >>> - send gratuitous arp for NAT rules on distributed router >>> - 2 HVs, 1 lport/HV, localport ports >>> >>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>> --- >> >> Hi Xavier, >> >>> tests/ovn.at | 20 +++++++++++++------- >>> 1 file changed, 13 insertions(+), 7 deletions(-) >>> >>> diff --git a/tests/ovn.at b/tests/ovn.at >>> index 166b5f72e..cf027703f 100644 >>> --- a/tests/ovn.at >>> +++ b/tests/ovn.at >> >> [...] >> >>> @@ -10391,7 +10395,7 @@ test_dns6() { >>> } >>> >>> AT_CAPTURE_FILE([ofctl_monitor0.log]) >>> -as hv1 ovs-ofctl monitor br-int resume --detach --no-chdir \ >>> +as hv1 ovs-ofctl -t 300 monitor br-int resume --detach --no-chdir \ >>> --pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log >>> >> >> Is this really needed? We set OVS_CTL_TIMEOUT=30 in atlocal.in. >> >> Thanks for looking into this. > > Yes :-) > The issue here is that we start an ovs-ofctl monitor, then run a few times > (send packet / check monitor.log). > The test is quite long, and on slow systems (e.g. VM running s390 > architecture on a x86_64 host), the 30 seconds are not enough for all > tests, and hence monitoring stops. Then we check (for 30 seconds) whether > monitor.log contains what we expect ... > 300 seconds might be a little much... just depends on how slow the system > is... Requires at least one minute on my system... Thanks for the clarification on this. Acked-by: Mark Michelson I think this should be committed as-is since it's fixing the flakiness of the tests. With regards to the DNS test in particular, my assumption is that the long timeout is needed because the DNS queries are falling back to system default behavior when OVN is not able to answer the queries successfully. This can take a very long time since the system will wait a certain amount of time for a response and may retry the query a multiple times as well. If there were some way to temporarily override default system behavior it may end up speeding up the test and eliminating the need to set the timeout so large. It's also possible I'm completely wrong in the above paragraph :) > > Thanks > Xavier > > Thanks, >> Dumitru >> >> > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 4/5/22 17:22, Mark Michelson wrote: > On 3/30/22 06:40, Xavier Simonart wrote: >> Hi Dumitru >> >> On Mon, Mar 21, 2022 at 12:02 PM Dumitru Ceara <dceara@redhat.com> wrote: >> >>> On 3/21/22 10:56, Xavier Simonart wrote: >>>> - send gratuitous arp on localnet >>>> - send gratuitous arp for nat ips in localnet >>>> - dns lookup : 1 HV, 2 LS, 2 LSPs/LS -- ovn-northd -- dp-groups=yes >>>> - send gratuitous arp for NAT rules on distributed router >>>> - 2 HVs, 1 lport/HV, localport ports >>>> >>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>>> --- >>> >>> Hi Xavier, >>> >>>> tests/ovn.at | 20 +++++++++++++------- >>>> 1 file changed, 13 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>> index 166b5f72e..cf027703f 100644 >>>> --- a/tests/ovn.at >>>> +++ b/tests/ovn.at >>> >>> [...] >>> >>>> @@ -10391,7 +10395,7 @@ test_dns6() { >>>> } >>>> >>>> AT_CAPTURE_FILE([ofctl_monitor0.log]) >>>> -as hv1 ovs-ofctl monitor br-int resume --detach --no-chdir \ >>>> +as hv1 ovs-ofctl -t 300 monitor br-int resume --detach --no-chdir \ >>>> --pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log >>>> >>> >>> Is this really needed? We set OVS_CTL_TIMEOUT=30 in atlocal.in. >>> >>> Thanks for looking into this. >> >> Yes :-) >> The issue here is that we start an ovs-ofctl monitor, then run a few >> times >> (send packet / check monitor.log). >> The test is quite long, and on slow systems (e.g. VM running s390 >> architecture on a x86_64 host), the 30 seconds are not enough for all >> tests, and hence monitoring stops. Then we check (for 30 seconds) whether >> monitor.log contains what we expect ... >> 300 seconds might be a little much... just depends on how slow the system >> is... Requires at least one minute on my system... > > Thanks for the clarification on this. > > Acked-by: Mark Michelson > > I think this should be committed as-is since it's fixing the flakiness > of the tests. > Sounds good to me too, the CI has been quite unhappy lately. > With regards to the DNS test in particular, my assumption is that the > long timeout is needed because the DNS queries are falling back to > system default behavior when OVN is not able to answer the queries > successfully. This can take a very long time since the system will wait > a certain amount of time for a response and may retry the query a > multiple times as well. If there were some way to temporarily override > default system behavior it may end up speeding up the test and > eliminating the need to set the timeout so large. > > It's also possible I'm completely wrong in the above paragraph :) > >> >> Thanks >> Xavier >> >> Thanks, >>> Dumitru >>> >>> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >
On Tue, Apr 5, 2022 at 12:07 PM Dumitru Ceara <dceara@redhat.com> wrote: > > On 4/5/22 17:22, Mark Michelson wrote: > > On 3/30/22 06:40, Xavier Simonart wrote: > >> Hi Dumitru > >> > >> On Mon, Mar 21, 2022 at 12:02 PM Dumitru Ceara <dceara@redhat.com> wrote: > >> > >>> On 3/21/22 10:56, Xavier Simonart wrote: > >>>> - send gratuitous arp on localnet > >>>> - send gratuitous arp for nat ips in localnet > >>>> - dns lookup : 1 HV, 2 LS, 2 LSPs/LS -- ovn-northd -- dp-groups=yes > >>>> - send gratuitous arp for NAT rules on distributed router > >>>> - 2 HVs, 1 lport/HV, localport ports > >>>> > >>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >>>> --- > >>> > >>> Hi Xavier, > >>> > >>>> tests/ovn.at | 20 +++++++++++++------- > >>>> 1 file changed, 13 insertions(+), 7 deletions(-) > >>>> > >>>> diff --git a/tests/ovn.at b/tests/ovn.at > >>>> index 166b5f72e..cf027703f 100644 > >>>> --- a/tests/ovn.at > >>>> +++ b/tests/ovn.at > >>> > >>> [...] > >>> > >>>> @@ -10391,7 +10395,7 @@ test_dns6() { > >>>> } > >>>> > >>>> AT_CAPTURE_FILE([ofctl_monitor0.log]) > >>>> -as hv1 ovs-ofctl monitor br-int resume --detach --no-chdir \ > >>>> +as hv1 ovs-ofctl -t 300 monitor br-int resume --detach --no-chdir \ > >>>> --pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log > >>>> > >>> > >>> Is this really needed? We set OVS_CTL_TIMEOUT=30 in atlocal.in. > >>> > >>> Thanks for looking into this. > >> > >> Yes :-) > >> The issue here is that we start an ovs-ofctl monitor, then run a few > >> times > >> (send packet / check monitor.log). > >> The test is quite long, and on slow systems (e.g. VM running s390 > >> architecture on a x86_64 host), the 30 seconds are not enough for all > >> tests, and hence monitoring stops. Then we check (for 30 seconds) whether > >> monitor.log contains what we expect ... > >> 300 seconds might be a little much... just depends on how slow the system > >> is... Requires at least one minute on my system... > > > > Thanks for the clarification on this. > > > > Acked-by: Mark Michelson > > > > I think this should be committed as-is since it's fixing the flakiness > > of the tests. > > > > Sounds good to me too, the CI has been quite unhappy lately. > > > With regards to the DNS test in particular, my assumption is that the > > long timeout is needed because the DNS queries are falling back to > > system default behavior when OVN is not able to answer the queries > > successfully. This can take a very long time since the system will wait > > a certain amount of time for a response and may retry the query a > > multiple times as well. If there were some way to temporarily override > > default system behavior it may end up speeding up the test and > > eliminating the need to set the timeout so large. > > > > It's also possible I'm completely wrong in the above paragraph :) > > > >> Thanks, I applied this patch to the main branch. Numan > >> Thanks > >> Xavier > >> > >> Thanks, > >>> Dumitru > >>> > >>> > >> _______________________________________________ > >> dev mailing list > >> dev@openvswitch.org > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/tests/ovn.at b/tests/ovn.at index 166b5f72e..cf027703f 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -69,6 +69,10 @@ m4_define([OVN_CHECK_PACKETS], [ovn_check_packets__ "$1" "$2" "__file__:__line__" AT_CHECK([sort $rcv_text], [0], [expout], [ignore], [dump_diff__ "$1" "$2"])]) +m4_define([OVN_CHECK_PACKETS_UNIQ], + [ovn_check_packets__ "$1" "$2" "__file__:__line__" + AT_CHECK([sort $rcv_text | uniq], [0], [expout])]) + m4_define([OVN_CHECK_PACKETS_REMOVE_BROADCAST], [ovn_check_packets_remove_broadcast__ "$1" "$2" "__file__:__line__" AT_CHECK([sort $rcv_text], [0], [expout], [ignore], [dump_diff__ "$1" "$2"])]) @@ -5889,7 +5893,7 @@ AT_CHECK([ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 externa # Wait for packet to be received. echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80102000000000000c0a80102" > expected -OVN_CHECK_PACKETS([hv/snoopvif-tx.pcap], [expected]) +OVN_CHECK_PACKETS_UNIQ([hv/snoopvif-tx.pcap], [expected]) # Check GARP packet when restart openflow connection. as hv @@ -5902,7 +5906,7 @@ start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl # Wait for packet to be received. echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80102000000000000c0a80102" > expected -OVN_CHECK_PACKETS([hv/snoopvif-tx.pcap], [expected]) +OVN_CHECK_PACKETS_UNIQ([hv/snoopvif-tx.pcap], [expected]) # Delete the localnet ports. AT_CHECK([ovs-vsctl del-port localvif1]) @@ -8636,7 +8640,7 @@ OVS_WAIT_UNTIL([test `wc -c < "hv1/snoopvif-tx.pcap"` -ge 140]) trim_zeros() { sed 's/\(00\)\{1,\}$//' } -$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/snoopvif-tx.pcap | trim_zeros > packets +$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/snoopvif-tx.pcap | trim_zeros |sort | uniq > packets AT_CHECK([sort packets], [0], [dnl fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001 fffffffffffff0000000000108060001080006040001f00000000001c0a80002000000000000c0a80002 @@ -8733,7 +8737,7 @@ OVS_WAIT_UNTIL([test `wc -c < "hv1/snoopvif-tx.pcap"` -ge 50]) trim_zeros() { sed 's/\(00\)\{1,\}$//' } -$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/snoopvif-tx.pcap | trim_zeros > packets +$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/snoopvif-tx.pcap | trim_zeros | sort | uniq > packets expected="fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" echo $expected > expout expected="fffffffffffff0000000000108060001080006040001f00000000001c0a80002000000000000c0a80002" @@ -10391,7 +10395,7 @@ test_dns6() { } AT_CAPTURE_FILE([ofctl_monitor0.log]) -as hv1 ovs-ofctl monitor br-int resume --detach --no-chdir \ +as hv1 ovs-ofctl -t 300 monitor br-int resume --detach --no-chdir \ --pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log set_dns_params vm2 @@ -11607,7 +11611,7 @@ expected="fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000 echo $expected > expout expected="fffffffffffff0000000000108060001080006040001f00000000001c0a80002000000000000c0a80002" echo $expected >> expout -AT_CHECK([sort packets], [0], [expout]) +AT_CHECK([sort packets | uniq], [0], [expout]) sort packets | cat # Temporarily remove nat-addresses option to avoid race conditions @@ -11650,7 +11654,7 @@ trim_zeros() { sed 's/\(00\)\{1,\}$//' } -$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/snoopvif-tx.pcap | trim_zeros > packets +$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/snoopvif-tx.pcap | trim_zeros | sort | uniq > packets garp_1="fffffffffffff0000000000308060001080006040001f00000000003c0a80003000000000000c0a80003" echo $garp_1 > expout garp_2="fffffffffffff0000000000408060001080006040001f00000000004c0a80004000000000000c0a80004" @@ -12494,6 +12498,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int |awk '/table=65/{print substr($8, 1 # remove the localport from br-int and re-create it as hv1 check ovs-vsctl del-port vif01 +check ovn-nbctl --wait=hv sync AT_CHECK([as hv1 ovs-ofctl dump-flows br-int |awk '/table=65/{print substr($8, 16, length($8))}' |sort -n], [0], [dnl 11 ]) @@ -12501,6 +12506,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int |awk '/table=65/{print substr($8, 1 as hv1 check ovs-vsctl add-port br-int vif01 \ -- set Interface vif01 external-ids:iface-id=lp01 +check ovn-nbctl --wait=hv sync AT_CHECK([as hv1 ovs-ofctl dump-flows br-int |awk '/table=65/{print substr($8, 16, length($8))}' |sort -n], [0], [dnl 2 11
- send gratuitous arp on localnet - send gratuitous arp for nat ips in localnet - dns lookup : 1 HV, 2 LS, 2 LSPs/LS -- ovn-northd -- dp-groups=yes - send gratuitous arp for NAT rules on distributed router - 2 HVs, 1 lport/HV, localport ports Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- tests/ovn.at | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-)