diff mbox series

[ovs-dev,1/1] tests: Work around ovn-controller incremental processing bugs.

Message ID 20201124225029.990905-2-blp@ovn.org
State Not Applicable
Series OVN patch and discussion for incremental processing bugs

Commit Message

Ben Pfaff Nov. 24, 2020, 10:50 p.m. UTC
The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
external ips" trigger bugs in the ovn-controller incremental processing
logic.  This works around those bugs.

Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 tests/ovn.at | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Numan Siddique Nov. 25, 2020, 7:43 a.m. UTC | #1
On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
>
> The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> external ips" trigger bugs in the ovn-controller incremental processing
> logic.  This works around those bugs.
>

> Signed-off-by: Ben Pfaff <blp@ovn.org>

Can you please try the test case "ARP replies for SNAT external ips"
with the latest OVN master?

The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
should solve this issue.

I will look into the other test case, "superseding ACLs with
conjunction".

Thanks
Numan

> ---
>  tests/ovn.at | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 9a9b8a50790e..905fcccba500 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -13648,6 +13648,11 @@ ovn-nbctl acl-add ls1 to-lport 3 '(ip4.src==10.0.0.1 || ip4.src==10.0.0.2) && (i
>  ovn-nbctl acl-add ls1 to-lport 3 '(ip4.src==10.0.0.1 || ip4.src==10.0.0.42) && (ip4.dst == 10.0.0.3 || ip4.dst == 10.0.0.4)' allow
>  ovn-nbctl --wait=hv sync
>
> +# There's a bug in ovn-controller that usually makes this test fail
> +# without the following (more often with ovn-northd than ovn-northd-ddlog).
> +check as hv1 ovs-appctl -t ovn-controller recompute
> +sleep 1
> +
>  # Traffic 10.0.0.1, 10.0.0.2 -> 10.0.0.3, 10.0.0.4 should be allowed.
>  for src in `seq 1 2`; do
>      for dst in `seq 3 4`; do
> @@ -22243,6 +22248,14 @@ send_arp_request() {
>      local arp=0001080006040001${eth_src}${spa}${eth_dst}${tpa}
>
>      local request=${eth}${arp}
> +
> +    # There's a bug in ovn-controller incremental processing that
> +    # makes this test fail most of the time without forcing full
> +    # recomputation.
> +    check as hv1 ovs-appctl -t ovn-controller recompute
> +    check as hv2 ovs-appctl -t ovn-controller recompute
> +    sleep 1
> +
>      as hv2 ovs-appctl netdev-dummy/receive hv2-phys1 $request
>  }
>
> --
> 2.26.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Ben Pfaff Nov. 26, 2020, 5:24 a.m. UTC | #2
On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
> >
> > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > external ips" trigger bugs in the ovn-controller incremental processing
> > logic.  This works around those bugs.
> >
> 
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> 
> Can you please try test case - "ARP replies for SNAT external ips"
> with the latest OVN master ?
> 
> The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> should solve this issue.
> 
> I will take a look into the other test case - "superseding ACLs with
> conjunction".

It does solve the issues that this was meant to fix.

The following tests still segfault in ovn-controller:

269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)

with backtraces that look like the following.  If this is because of a
bug I introduced into ovsdb-idl, I think it has to be a subtle one...

#0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550, 
    b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
    at ../controller/binding.c:1982
#1  0x000000000041628e in binding_handle_port_binding_changes (
    b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0, 
    b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
#2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
    node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
#3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>, 
    node=<optimized out>) at ../lib/inc-proc-eng.c:306
#4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
    at ../lib/inc-proc-eng.c:352
#5  engine_run (recompute_allowed=recompute_allowed@entry=true)
    at ../lib/inc-proc-eng.c:377
#6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
    at ../controller/ovn-controller.c:2747

Numan Siddique Nov. 26, 2020, 6 a.m. UTC | #3
On Thu, Nov 26, 2020 at 10:54 AM Ben Pfaff <blp@ovn.org> wrote:
>
> On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> > On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > > external ips" trigger bugs in the ovn-controller incremental processing
> > > logic.  This works around those bugs.
> > >
> >
> > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> >
> > Can you please try test case - "ARP replies for SNAT external ips"
> > with the latest OVN master ?
> >
> > The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> > should solve this issue.
> >
> > I will take a look into the other test case - "superseding ACLs with
> > conjunction".
>
> It does solve the issues that this was meant to fix.
>
> The following tests still segfault in ovn-controller:
>
> 269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
> 301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)
>
> with backtraces that look like the following.  If this is because of a
> bug I introduced into ovsdb-idl, I think it has to be a subtle one...
>
> #0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550,
>     b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
>     at ../controller/binding.c:1982
> #1  0x000000000041628e in binding_handle_port_binding_changes (
>     b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0,
>     b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
> #2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
>     node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
> #3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>,
>     node=<optimized out>) at ../lib/inc-proc-eng.c:306
> #4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
>     at ../lib/inc-proc-eng.c:352
> #5  engine_run (recompute_allowed=recompute_allowed@entry=true)
>     at ../lib/inc-proc-eng.c:377
> #6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
>     at ../controller/ovn-controller.c:2747

With your IDL CS patch series, I'm seeing a 100% failure rate for the
"ovn-controller incremental processing" test case.
I think ovn-controller should not segfault. Thanks for the backtrace;
I will look into it.

Thanks
Numan


Numan Siddique Nov. 26, 2020, 1:05 p.m. UTC | #4
On Thu, Nov 26, 2020 at 11:30 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Thu, Nov 26, 2020 at 10:54 AM Ben Pfaff <blp@ovn.org> wrote:
> >
> > On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> > > On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
> > > >
> > > > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > > > external ips" trigger bugs in the ovn-controller incremental processing
> > > > logic.  This works around those bugs.
> > > >
> > >
> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > >
> > > Can you please try test case - "ARP replies for SNAT external ips"
> > > with the latest OVN master ?
> > >
> > > The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> > > should solve this issue.
> > >
> > > I will take a look into the other test case - "superseding ACLs with
> > > conjunction".
> >
> > It does solve the issues that this was meant to fix.
> >
> > The following tests still segfault in ovn-controller:
> >
> > 269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
> > 301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)
> >
> > with backtraces that look like the following.  If this is because of a
> > bug I introduced into ovsdb-idl, I think it has to be a subtle one...
> >
> > #0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550,
> >     b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
> >     at ../controller/binding.c:1982
> > #1  0x000000000041628e in binding_handle_port_binding_changes (
> >     b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0,
> >     b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
> > #2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
> >     node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
> > #3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>,
> >     node=<optimized out>) at ../lib/inc-proc-eng.c:306
> > #4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
> >     at ../lib/inc-proc-eng.c:352
> > #5  engine_run (recompute_allowed=recompute_allowed@entry=true)
> >     at ../lib/inc-proc-eng.c:377
> > #6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
> >     at ../controller/ovn-controller.c:2747
>
> With your IDL CS patch series, I'm seeing 100% failure for
> "ovn-controller incremental processing" test case.
> I think ovn-controller should not segfault. Thanks for the backtrace.
> I will look into it.
>

Hi Ben,

The crash is seen because binding.c accesses the
port_binding->datapath column.

Since the 'datapath' column of the Port_Binding table has a strong
reference to the Datapath_Binding table, this column should never be
NULL, right?

Since the crash is seen with the tracked data, maybe your IDL CS
patchset needs some handling in the change-tracking code in the IDL?

Thanks
Numan


> Thanks
> Numan
>
>

Ben Pfaff Nov. 26, 2020, 4:30 p.m. UTC | #5
On Thu, Nov 26, 2020 at 06:35:44PM +0530, Numan Siddique wrote:
> On Thu, Nov 26, 2020 at 11:30 AM Numan Siddique <numans@ovn.org> wrote:
> >
> > On Thu, Nov 26, 2020 at 10:54 AM Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> > > > On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
> > > > >
> > > > > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > > > > external ips" trigger bugs in the ovn-controller incremental processing
> > > > > logic.  This works around those bugs.
> > > > >
> > > >
> > > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > >
> > > > Can you please try test case - "ARP replies for SNAT external ips"
> > > > with the latest OVN master ?
> > > >
> > > > The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> > > > should solve this issue.
> > > >
> > > > I will take a look into the other test case - "superseding ACLs with
> > > > conjunction".
> > >
> > > It does solve the issues that this was meant to fix.
> > >
> > > The following tests still segfault in ovn-controller:
> > >
> > > 269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
> > > 301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)
> > >
> > > with backtraces that look like the following.  If this is because of a
> > > bug I introduced into ovsdb-idl, I think it has to be a subtle one...
> > >
> > > #0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550,
> > >     b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
> > >     at ../controller/binding.c:1982
> > > #1  0x000000000041628e in binding_handle_port_binding_changes (
> > >     b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0,
> > >     b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
> > > #2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
> > >     node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
> > > #3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>,
> > >     node=<optimized out>) at ../lib/inc-proc-eng.c:306
> > > #4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
> > >     at ../lib/inc-proc-eng.c:352
> > > #5  engine_run (recompute_allowed=recompute_allowed@entry=true)
> > >     at ../lib/inc-proc-eng.c:377
> > > #6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
> > >     at ../controller/ovn-controller.c:2747
> >
> > With your IDL CS patch series, I'm seeing 100% failure for
> > "ovn-controller incremental processing" test case.
> > I think ovn-controller should not segfault. Thanks for the backtrace.
> > I will look into it.
> >
> 
> Hi Ben,
> 
> The crash is seen because in binding.c, we access port_binding->datapath column.
> 
> Since the 'datapath' column of the Port_Binding table has a strong
> reference to the Datapath_Binding table, this column
> should never be NULL, right?
> 
> Since the crash is seen with the tracked data, maybe your IDL CS
> patchset needs some handling in the tracked code in IDL ?

OK, I will look into it.  Thanks.

Ben Pfaff Dec. 2, 2020, 6:27 a.m. UTC | #6
On Thu, Nov 26, 2020 at 06:35:44PM +0530, Numan Siddique wrote:
> On Thu, Nov 26, 2020 at 11:30 AM Numan Siddique <numans@ovn.org> wrote:
> >
> > On Thu, Nov 26, 2020 at 10:54 AM Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> > > > On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
> > > > >
> > > > > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > > > > external ips" trigger bugs in the ovn-controller incremental processing
> > > > > logic.  This works around those bugs.
> > > > >
> > > >
> > > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > >
> > > > Can you please try test case - "ARP replies for SNAT external ips"
> > > > with the latest OVN master ?
> > > >
> > > > The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> > > > should solve this issue.
> > > >
> > > > I will take a look into the other test case - "superseding ACLs with
> > > > conjunction".
> > >
> > > It does solve the issues that this was meant to fix.
> > >
> > > The following tests still segfault in ovn-controller:
> > >
> > > 269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
> > > 301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)
> > >
> > > with backtraces that look like the following.  If this is because of a
> > > bug I introduced into ovsdb-idl, I think it has to be a subtle one...
> > >
> > > #0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550,
> > >     b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
> > >     at ../controller/binding.c:1982
> > > #1  0x000000000041628e in binding_handle_port_binding_changes (
> > >     b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0,
> > >     b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
> > > #2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
> > >     node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
> > > #3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>,
> > >     node=<optimized out>) at ../lib/inc-proc-eng.c:306
> > > #4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
> > >     at ../lib/inc-proc-eng.c:352
> > > #5  engine_run (recompute_allowed=recompute_allowed@entry=true)
> > >     at ../lib/inc-proc-eng.c:377
> > > #6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
> > >     at ../controller/ovn-controller.c:2747
> >
> > With your IDL CS patch series, I'm seeing 100% failure for
> > "ovn-controller incremental processing" test case.
> > I think ovn-controller should not segfault. Thanks for the backtrace.
> > I will look into it.
> >
> 
> Hi Ben,
> 
> The crash is seen because in binding.c, we access port_binding->datapath column.
> 
> Since the 'datapath' column of the Port_Binding table has a strong
> reference to the Datapath_Binding table, this column
> should never be NULL, right?
> 
> Since the crash is seen with the tracked data, maybe your IDL CS
> patchset needs some handling in the tracked code in IDL ?

My OVS series could change the order in which updates to rows in a
single set of updates were applied to the IDL.  This order wasn't
predictable anyway (it just depended on the ordering of randomly
generated UUIDs), but apparently something in the IDL was sensitive to
it.  There's probably a bug in the IDL related to this.

I posted a v2 of my patchset.  It exactly reproduces the application
order that the IDL previously used.  It's an improvement in another way
since the data structures are simpler and better.  And this workaround
patch can be dropped.

Thanks,

Ben.
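
To illustrate the ordering point above, here is a toy shell model; it is
not the IDL's code. It assumes, purely for illustration, that the rows in
one batch of updates are applied in ascending UUID order. The shortened
UUIDs and the apply_order helper are made up for this sketch.

```shell
#!/bin/sh
# Toy model only. Each input line is "UUID TABLE". Assumption for
# illustration: rows in one batch are applied in ascending UUID order.
# Prints the table names in the order the rows would be applied.
apply_order() {
    LC_ALL=C sort | awk '{ printf "%s%s", sep, $2; sep = " " } END { print "" }'
}

# The same two rows with two different random UUID draws for the
# Port_Binding row. Only the UUID changes, yet the order flips.
printf '%s\n' '1f3a Port_Binding' '9c20 Datapath_Binding' | apply_order
printf '%s\n' 'e771 Port_Binding' '9c20 Datapath_Binding' | apply_order
```

In the first draw the Port_Binding row is applied before the
Datapath_Binding row it refers to; in the second it is applied after.
Code that looks at a row before its referenced row has been applied has
to tolerate that intermediate state.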

Numan Siddique Dec. 3, 2020, 12:53 p.m. UTC | #7
On Wed, Dec 2, 2020 at 11:57 AM Ben Pfaff <blp@ovn.org> wrote:
>
> On Thu, Nov 26, 2020 at 06:35:44PM +0530, Numan Siddique wrote:
> > On Thu, Nov 26, 2020 at 11:30 AM Numan Siddique <numans@ovn.org> wrote:
> > >
> > > On Thu, Nov 26, 2020 at 10:54 AM Ben Pfaff <blp@ovn.org> wrote:
> > > >
> > > > On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> > > > > On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <blp@ovn.org> wrote:
> > > > > >
> > > > > > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > > > > > external ips" trigger bugs in the ovn-controller incremental processing
> > > > > > logic.  This works around those bugs.
> > > > > >
> > > > >
> > > > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > > >
> > > > > Can you please try test case - "ARP replies for SNAT external ips"
> > > > > with the latest OVN master ?
> > > > >
> > > > > The commit https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> > > > > should solve this issue.
> > > > >
> > > > > I will take a look into the other test case - "superseding ACLs with
> > > > > conjunction".
> > > >
> > > > It does solve the issues that this was meant to fix.
> > > >
> > > > The following tests still segfault in ovn-controller:
> > > >
> > > > 269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
> > > > 301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)
> > > >
> > > > with backtraces that look like the following.  If this is because of a
> > > > bug I introduced into ovsdb-idl, I think it has to be a subtle one...
> > > >
> > > > #0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550,
> > > >     b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
> > > >     at ../controller/binding.c:1982
> > > > #1  0x000000000041628e in binding_handle_port_binding_changes (
> > > >     b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0,
> > > >     b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
> > > > #2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
> > > >     node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
> > > > #3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>,
> > > >     node=<optimized out>) at ../lib/inc-proc-eng.c:306
> > > > #4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
> > > >     at ../lib/inc-proc-eng.c:352
> > > > #5  engine_run (recompute_allowed=recompute_allowed@entry=true)
> > > >     at ../lib/inc-proc-eng.c:377
> > > > #6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
> > > >     at ../controller/ovn-controller.c:2747
> > >
> > > With your IDL CS patch series, I'm seeing 100% failure for
> > > "ovn-controller incremental processing" test case.
> > > I think ovn-controller should not segfault. Thanks for the backtrace.
> > > I will look into it.
> > >
> >
> > Hi Ben,
> >
> > The crash is seen because in binding.c, we access port_binding->datapath column.
> >
> > Since the 'datapath' column of the Port_Binding table has a strong
> > reference to the Datapath_Binding table, this column
> > should never be NULL, right?
> >
> > Since the crash is seen with the tracked data, maybe your IDL CS
> > patchset needs some handling in the tracked code in IDL ?
>
> My OVS series could change the order in which updates to rows in a
> single set of updates were applied to the IDL.  This order wasn't
> predictable anyway (it just depended on the ordering of randomly
> generated UUIDs), but apparently something in the IDL was sensitive to
> it.  There's probably a bug in the IDL related to this.
>
> I posted a v2 of my patchset.  It exactly reproduces the application
> order that the IDL previously used.  It's an improvement in another way
> since the data structures are simpler and better.  And this workaround
> patch can be dropped.

Thanks for the update.

Numan

>
> Thanks,
>
> Ben.

Patch

diff --git a/tests/ovn.at b/tests/ovn.at
index 9a9b8a50790e..905fcccba500 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -13648,6 +13648,11 @@  ovn-nbctl acl-add ls1 to-lport 3 '(ip4.src==10.0.0.1 || ip4.src==10.0.0.2) && (i
 ovn-nbctl acl-add ls1 to-lport 3 '(ip4.src==10.0.0.1 || ip4.src==10.0.0.42) && (ip4.dst == 10.0.0.3 || ip4.dst == 10.0.0.4)' allow
 ovn-nbctl --wait=hv sync
 
+# There's a bug in ovn-controller that usually makes this test fail
+# without the following (more often with ovn-northd than ovn-northd-ddlog).
+check as hv1 ovs-appctl -t ovn-controller recompute
+sleep 1
+
 # Traffic 10.0.0.1, 10.0.0.2 -> 10.0.0.3, 10.0.0.4 should be allowed.
 for src in `seq 1 2`; do
     for dst in `seq 3 4`; do
@@ -22243,6 +22248,14 @@  send_arp_request() {
     local arp=0001080006040001${eth_src}${spa}${eth_dst}${tpa}
 
     local request=${eth}${arp}
+
+    # There's a bug in ovn-controller incremental processing that
+    # makes this test fail most of the time without forcing full
+    # recomputation.
+    check as hv1 ovs-appctl -t ovn-controller recompute
+    check as hv2 ovs-appctl -t ovn-controller recompute
+    sleep 1
+
     as hv2 ovs-appctl netdev-dummy/receive hv2-phys1 $request
 }
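
The workaround in the patch forces a recompute and then waits a fixed
`sleep 1`. A fixed sleep can be too short on a loaded machine and is
wasted time on a fast one. As a hedged alternative, not what the patch
does, a test could poll for the condition it actually needs. The OVS test
suite has its own OVS_WAIT_UNTIL macro for this purpose; the `wait_until`
function below is a hypothetical stand-alone sketch of the same idea, and
the condition passed to it is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper, not part of the OVS/OVN test suite.
# Retries COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Usage: wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=0
    while [ "$try" -lt "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 1
}

# Demonstration with trivial commands standing in for a real check.
wait_until 3 0 true && echo "condition met"
wait_until 2 0 false || echo "gave up after 2 tries"
```

In a real test the command would be whatever shows that ovn-controller
has caught up, for example a check that the expected flows are present.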