diff mbox series

[ovs-dev] ovn-controller.c: Fix assertion failure during address set update.

Message ID 20230525160117.274973-1-hzhou@ovn.org
State Accepted
Headers show
Series [ovs-dev] ovn-controller.c: Fix assertion failure during address set update. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Han Zhou May 25, 2023, 4:01 p.m. UTC
When an address set is deleted and a new one is created immediately with
the same name, the OVSDB deletion and creation notifications can come to
ovn-controller within the same message, and the order of the deletion
and addition in the IDL is undefined. In this case, if the deletion
happens to be processed after the addition, it will result in problems.

If the content of the new address set is the same as the old one, the
addition will be handled as no change to an existing address set and
ignored, but later when handling the deletion, the related flows will be
deleted, while we actually expect no change to the flows.

If the content of the new address set is different from the old one, the
addition will be handled as an update to the old address set, and the
delta will be tracked in the I-P engine node data's "updated" member.
Later when handling the deletion, the address set will be deleted from
the engine node data. So at this point the tracked change in "updated"
doesn't have a related address set object in the data, so later when the
lflow node handles the addr_sets changes, the assertion fails as shown
in below example backtrace:

0  0x00007ffbe5fcabc5 in raise () from /lib64/libc.so.6
1  0x00007ffbe5fb38a4 in abort () from /lib64/libc.so.6
2  0x00000000004f579e in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=<optimized out>) at ../lib/util.c:444
3  0x00000000004fd001 in vlog_abort_valist (module_=<optimized out>, message=0x5e6fd0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffed304caa8) at ../lib/vlog.c:1249
4  0x00000000004fd097 in vlog_abort (module=<optimized out>, message=<optimized out>) at ../lib/vlog.c:1263
5  0x00000000004f54e1 in ovs_assert_failure (where=<optimized out>, function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86
6  0x000000000041941d in as_update_can_be_handled (l_ctx_in=0x7ffed304cda0, l_ctx_in=0x7ffed304cda0, as_diff=0x25520f0, as_name=0x2550a00 "as1") at ../controller/lflow.c:766
7  lflow_handle_addr_set_update (as_name=0x2550a00 "as1", as_diff=0x25520f0, l_ctx_in=0x7ffed304cda0, l_ctx_out=0x7ffed304cd50, changed=0x7ffed304cd4f) at ../controller/lflow.c:869
8  0x0000000000438376 in lflow_output_addr_sets_handler (node=0x7ffed3054420, data=<optimized out>) at ../controller/ovn-controller.c:2648
9  0x000000000045708e in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at ../lib/inc-proc-eng.c:417
10 engine_run_node (recompute_allowed=true, node=0x7ffed3054420) at ../lib/inc-proc-eng.c:479
11 engine_run (recompute_allowed=true) at ../lib/inc-proc-eng.c:504
12 0x000000000040939b in main (argc=<optimized out>, argv=<optimized out>) at ../controller/ovn-controller.c:3757

This patch fixes it by handling deletions before additions/updates,
similar to how we handle other changes such as lflow and port-groups.

Fixes: c0fe8dca767f ("ovn-controller: Tracking SB address set updates.")
Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 controller/ovn-controller.c |  6 ++++-
 tests/ovn-controller.at     | 51 +++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 1 deletion(-)

Comments

Ales Musil May 26, 2023, 6:45 a.m. UTC | #1
On Thu, May 25, 2023 at 6:01 PM Han Zhou <hzhou@ovn.org> wrote:

> When an address set is deleted and a new one is created immediately with
> the same name, the OVSDB deletion and creation notifications can come to
> ovn-controller within the same message, and the order of the deletion
> and addition in the IDL is undefined. In this case, if the deletion
> happens to be processed after the addition, it will result in problems.
>
> If the content of the new address set is the same as the old one, the
> addition will be handled as no change to an existing address set and
> ignored, but later when handling the deletion, the related flows will be
> deleted, while we actually expect no change to the flows.
>
> If the content of the new address set is different from the old one, the
> addition will be handled as an update to the old address set, and the
> delta will be tracked in the I-P engine node data's "updated" member.
> Later when handling the deletion, the address set will be deleted from
> the engine node data. So at this point the tracked change in "updated"
> doesn't have a related address set object in the data, so later when the
> lflow node handles the addr_sets changes, the assertion fails as shown
> in below example backtrace:
>
> 0  0x00007ffbe5fcabc5 in raise () from /lib64/libc.so.6
> 1  0x00007ffbe5fb38a4 in abort () from /lib64/libc.so.6
> 2  0x00000000004f579e in ovs_abort_valist (err_no=<optimized out>,
> format=<optimized out>, args=<optimized out>) at ../lib/util.c:444
> 3  0x00000000004fd001 in vlog_abort_valist (module_=<optimized out>,
> message=0x5e6fd0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffed304caa8)
> at ../lib/vlog.c:1249
> 4  0x00000000004fd097 in vlog_abort (module=<optimized out>,
> message=<optimized out>) at ../lib/vlog.c:1263
> 5  0x00000000004f54e1 in ovs_assert_failure (where=<optimized out>,
> function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86
> 6  0x000000000041941d in as_update_can_be_handled
> (l_ctx_in=0x7ffed304cda0, l_ctx_in=0x7ffed304cda0, as_diff=0x25520f0,
> as_name=0x2550a00 "as1") at ../controller/lflow.c:766
> 7  lflow_handle_addr_set_update (as_name=0x2550a00 "as1",
> as_diff=0x25520f0, l_ctx_in=0x7ffed304cda0, l_ctx_out=0x7ffed304cd50,
> changed=0x7ffed304cd4f) at ../controller/lflow.c:869
> 8  0x0000000000438376 in lflow_output_addr_sets_handler
> (node=0x7ffed3054420, data=<optimized out>) at
> ../controller/ovn-controller.c:2648
> 9  0x000000000045708e in engine_compute (recompute_allowed=<optimized
> out>, node=<optimized out>) at ../lib/inc-proc-eng.c:417
> 10 engine_run_node (recompute_allowed=true, node=0x7ffed3054420) at
> ../lib/inc-proc-eng.c:479
> 11 engine_run (recompute_allowed=true) at ../lib/inc-proc-eng.c:504
> 12 0x000000000040939b in main (argc=<optimized out>, argv=<optimized out>)
> at ../controller/ovn-controller.c:3757
>
> This patch fixes it by handling deletions before additions/updates,
> similar to how we handle other changes such as lflow and port-groups.
>
> Fixes: c0fe8dca767f ("ovn-controller: Tracking SB address set updates.")
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>  controller/ovn-controller.c |  6 ++++-
>  tests/ovn-controller.at     | 51 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 1151d3664478..2706dc628d6d 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -2015,7 +2015,11 @@ addr_sets_update(const struct
> sbrec_address_set_table *address_set_table,
>          if (sbrec_address_set_is_deleted(as)) {
>              expr_const_sets_remove(addr_sets, as->name);
>              sset_add(deleted, as->name);
> -        } else {
> +        }
> +    }
> +
> +    SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) {
> +        if (!sbrec_address_set_is_deleted(as)) {
>              struct expr_constant_set *cs_old = shash_find_data(addr_sets,
>                                                                 as->name);
>              if (!cs_old) {
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index 64d6a9336613..a95834e69eca 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -2065,6 +2065,57 @@ AT_CHECK([echo $(($reprocess_count_new -
> $reprocess_count_old))], [0], [2
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>
> +AT_SETUP([ovn-controller - address set del-and-add])
> +
> +ovn_start
> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +check ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=ls1-lp1
> +
> +check ovn-nbctl ls-add ls1
> +
> +check ovn-nbctl lsp-add ls1 ls1-lp1 \
> +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01"
> +
> +wait_for_ports_up
> +ovn-appctl -t ovn-controller vlog/set file:dbg
> +
> +ovn-nbctl create address_set name=as1 addresses=8.8.8.8
> +check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" && ip4.src
> == $as1' drop
> +check ovn-nbctl --wait=hv sync
> +AT_CHECK([ovs-ofctl dump-flows br-int table=44 | grep -c
> "priority=1100"], [0], [1
> +])
> +
> +# pause ovn-northd
> +check as northd ovn-appctl -t ovn-northd pause
> +check as northd-backup ovn-appctl -t ovn-northd pause
> +
> +# Simulate a SB address set "del and add" notification to ovn-controller
> in the
> +# same IDL iteration. The flows programmed by ovn-controller should
> reflect the
> +# newly added address set. In reality it can happen when CMS deletes an
> +# address-set and immediately creates a new address-set with the same name
> +# (with same or different content). The notification of the changes can
> come to
> +# ovn-controller in one shot and the order of the "del" and "add" in the
> IDL is
> +# undefined. This test runs the scenario ten times to make sure different
> +# orders are covered and handled properly.
> +
> +flow_count=$(ovs-ofctl dump-flows br-int table=44 | grep -c
> "priority=1100")
> +for i in $(seq 10); do
> +    # Delete and recreate the SB address set with same name and an extra
> IP.
> +    addrs_=$(fetch_column address_set addresses name=as1)
> +    addrs=${addrs_// /,}
> +    AT_CHECK([ovn-sbctl destroy address_set as1 -- create address_set
> name=as1 addresses=$addrs,1.1.1.$i], [0], [ignore])
> +    OVS_WAIT_UNTIL([test $(as hv1 ovs-ofctl dump-flows br-int table=44 |
> grep -c "priority=1100") = "$(($i + 1))"])
> +done
> +
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +
>  AT_SETUP([ovn-controller - I-P handle lb_hairpin_use_ct_mark change])
>
>  ovn_start --backup-northd=none
> --
> 2.30.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>

Looks good to me, thanks.

Acked-by: Ales Musil <amusil@redhat.com>
Han Zhou May 26, 2023, 9:40 p.m. UTC | #2
On Thu, May 25, 2023 at 11:45 PM Ales Musil <amusil@redhat.com> wrote:
>
>
>
> On Thu, May 25, 2023 at 6:01 PM Han Zhou <hzhou@ovn.org> wrote:
>>
>> When an address set is deleted and a new one is created immediately with
>> the same name, the OVSDB deletion and creation notifications can come to
>> ovn-controller within the same message, and the order of the deletion
>> and addition in the IDL is undefined. In this case, if the deletion
>> happens to be processed after the addition, it will result in problems.
>>
>> If the content of the new address set is the same as the old one, the
>> addition will be handled as no change to an existing address set and
>> ignored, but later when handling the deletion, the related flows will be
>> deleted, while we actually expect no change to the flows.
>>
>> If the content of the new address set is different from the old one, the
>> addition will be handled as an update to the old address set, and the
>> delta will be tracked in the I-P engine node data's "updated" member.
>> Later when handling the deletion, the address set will be deleted from
>> the engine node data. So at this point the tracked change in "updated"
>> doesn't have a related address set object in the data, so later when the
>> lflow node handles the addr_sets changes, the assertion fails as shown
>> in below example backtrace:
>>
>> 0  0x00007ffbe5fcabc5 in raise () from /lib64/libc.so.6
>> 1  0x00007ffbe5fb38a4 in abort () from /lib64/libc.so.6
>> 2  0x00000000004f579e in ovs_abort_valist (err_no=<optimized out>,
format=<optimized out>, args=<optimized out>) at ../lib/util.c:444
>> 3  0x00000000004fd001 in vlog_abort_valist (module_=<optimized out>,
message=0x5e6fd0 "%s: assertion %s failed in %s()",
args=args@entry=0x7ffed304caa8)
at ../lib/vlog.c:1249
>> 4  0x00000000004fd097 in vlog_abort (module=<optimized out>,
message=<optimized out>) at ../lib/vlog.c:1263
>> 5  0x00000000004f54e1 in ovs_assert_failure (where=<optimized out>,
function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86
>> 6  0x000000000041941d in as_update_can_be_handled
(l_ctx_in=0x7ffed304cda0, l_ctx_in=0x7ffed304cda0, as_diff=0x25520f0,
as_name=0x2550a00 "as1") at ../controller/lflow.c:766
>> 7  lflow_handle_addr_set_update (as_name=0x2550a00 "as1",
as_diff=0x25520f0, l_ctx_in=0x7ffed304cda0, l_ctx_out=0x7ffed304cd50,
changed=0x7ffed304cd4f) at ../controller/lflow.c:869
>> 8  0x0000000000438376 in lflow_output_addr_sets_handler
(node=0x7ffed3054420, data=<optimized out>) at
../controller/ovn-controller.c:2648
>> 9  0x000000000045708e in engine_compute (recompute_allowed=<optimized
out>, node=<optimized out>) at ../lib/inc-proc-eng.c:417
>> 10 engine_run_node (recompute_allowed=true, node=0x7ffed3054420) at
../lib/inc-proc-eng.c:479
>> 11 engine_run (recompute_allowed=true) at ../lib/inc-proc-eng.c:504
>> 12 0x000000000040939b in main (argc=<optimized out>, argv=<optimized
out>) at ../controller/ovn-controller.c:3757
>>
>> This patch fixes it by handling deletions before additions/updates,
>> similar to how we handle other changes such as lflow and port-groups.
>>
>> Fixes: c0fe8dca767f ("ovn-controller: Tracking SB address set updates.")
>> Signed-off-by: Han Zhou <hzhou@ovn.org>
>> ---
>>  controller/ovn-controller.c |  6 ++++-
>>  tests/ovn-controller.at     | 51 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 56 insertions(+), 1 deletion(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 1151d3664478..2706dc628d6d 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -2015,7 +2015,11 @@ addr_sets_update(const struct
sbrec_address_set_table *address_set_table,
>>          if (sbrec_address_set_is_deleted(as)) {
>>              expr_const_sets_remove(addr_sets, as->name);
>>              sset_add(deleted, as->name);
>> -        } else {
>> +        }
>> +    }
>> +
>> +    SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) {
>> +        if (!sbrec_address_set_is_deleted(as)) {
>>              struct expr_constant_set *cs_old =
shash_find_data(addr_sets,
>>
as->name);
>>              if (!cs_old) {
>> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
>> index 64d6a9336613..a95834e69eca 100644
>> --- a/tests/ovn-controller.at
>> +++ b/tests/ovn-controller.at
>> @@ -2065,6 +2065,57 @@ AT_CHECK([echo $(($reprocess_count_new -
$reprocess_count_old))], [0], [2
>>  OVN_CLEANUP([hv1])
>>  AT_CLEANUP
>>
>> +AT_SETUP([ovn-controller - address set del-and-add])
>> +
>> +ovn_start
>> +
>> +net_add n1
>> +sim_add hv1
>> +as hv1
>> +check ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=ls1-lp1
>> +
>> +check ovn-nbctl ls-add ls1
>> +
>> +check ovn-nbctl lsp-add ls1 ls1-lp1 \
>> +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01"
>> +
>> +wait_for_ports_up
>> +ovn-appctl -t ovn-controller vlog/set file:dbg
>> +
>> +ovn-nbctl create address_set name=as1 addresses=8.8.8.8
>> +check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" &&
ip4.src == $as1' drop
>> +check ovn-nbctl --wait=hv sync
>> +AT_CHECK([ovs-ofctl dump-flows br-int table=44 | grep -c
"priority=1100"], [0], [1
>> +])
>> +
>> +# pause ovn-northd
>> +check as northd ovn-appctl -t ovn-northd pause
>> +check as northd-backup ovn-appctl -t ovn-northd pause
>> +
>> +# Simulate a SB address set "del and add" notification to
ovn-controller in the
>> +# same IDL iteration. The flows programmed by ovn-controller should
reflect the
>> +# newly added address set. In reality it can happen when CMS deletes an
>> +# address-set and immediately creates a new address-set with the same
name
>> +# (with same or different content). The notification of the changes can
come to
>> +# ovn-controller in one shot and the order of the "del" and "add" in
the IDL is
>> +# undefined. This test runs the scenario ten times to make sure
different
>> +# orders are covered and handled properly.
>> +
>> +flow_count=$(ovs-ofctl dump-flows br-int table=44 | grep -c
"priority=1100")
>> +for i in $(seq 10); do
>> +    # Delete and recreate the SB address set with same name and an
extra IP.
>> +    addrs_=$(fetch_column address_set addresses name=as1)
>> +    addrs=${addrs_// /,}
>> +    AT_CHECK([ovn-sbctl destroy address_set as1 -- create address_set
name=as1 addresses=$addrs,1.1.1.$i], [0], [ignore])
>> +    OVS_WAIT_UNTIL([test $(as hv1 ovs-ofctl dump-flows br-int table=44
| grep -c "priority=1100") = "$(($i + 1))"])
>> +done
>> +
>> +OVN_CLEANUP([hv1])
>> +AT_CLEANUP
>> +
>>  AT_SETUP([ovn-controller - I-P handle lb_hairpin_use_ct_mark change])
>>
>>  ovn_start --backup-northd=none
>> --
>> 2.30.2
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
>
> Looks good to me, thanks.
>
> Acked-by: Ales Musil <amusil@redhat.com>

Thanks Ales. I pushed to main and backported down to branch-22.03.

Regards,
Han
diff mbox series

Patch

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 1151d3664478..2706dc628d6d 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2015,7 +2015,11 @@  addr_sets_update(const struct sbrec_address_set_table *address_set_table,
         if (sbrec_address_set_is_deleted(as)) {
             expr_const_sets_remove(addr_sets, as->name);
             sset_add(deleted, as->name);
-        } else {
+        }
+    }
+
+    SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) {
+        if (!sbrec_address_set_is_deleted(as)) {
             struct expr_constant_set *cs_old = shash_find_data(addr_sets,
                                                                as->name);
             if (!cs_old) {
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 64d6a9336613..a95834e69eca 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2065,6 +2065,57 @@  AT_CHECK([echo $(($reprocess_count_new - $reprocess_count_old))], [0], [2
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 
+AT_SETUP([ovn-controller - address set del-and-add])
+
+ovn_start
+
+net_add n1
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+check ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=ls1-lp1
+
+check ovn-nbctl ls-add ls1
+
+check ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01"
+
+wait_for_ports_up
+ovn-appctl -t ovn-controller vlog/set file:dbg
+
+ovn-nbctl create address_set name=as1 addresses=8.8.8.8
+check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" && ip4.src == $as1' drop
+check ovn-nbctl --wait=hv sync
+AT_CHECK([ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100"], [0], [1
+])
+
+# pause ovn-northd
+check as northd ovn-appctl -t ovn-northd pause
+check as northd-backup ovn-appctl -t ovn-northd pause
+
+# Simulate a SB address set "del and add" notification to ovn-controller in the
+# same IDL iteration. The flows programmed by ovn-controller should reflect the
+# newly added address set. In reality it can happen when CMS deletes an
+# address-set and immediately creates a new address-set with the same name
+# (with same or different content). The notification of the changes can come to
+# ovn-controller in one shot and the order of the "del" and "add" in the IDL is
+# undefined. This test runs the scenario ten times to make sure different
+# orders are covered and handled properly.
+
+flow_count=$(ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100")
+for i in $(seq 10); do
+    # Delete and recreate the SB address set with same name and an extra IP.
+    addrs_=$(fetch_column address_set addresses name=as1)
+    addrs=${addrs_// /,}
+    AT_CHECK([ovn-sbctl destroy address_set as1 -- create address_set name=as1 addresses=$addrs,1.1.1.$i], [0], [ignore])
+    OVS_WAIT_UNTIL([test $(as hv1 ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100") = "$(($i + 1))"])
+done
+
+OVN_CLEANUP([hv1])
+AT_CLEANUP
+
 AT_SETUP([ovn-controller - I-P handle lb_hairpin_use_ct_mark change])
 
 ovn_start --backup-northd=none