Message ID | 20230525160117.274973-1-hzhou@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] ovn-controller.c: Fix assertion failure during address set update. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
On Thu, May 25, 2023 at 6:01 PM Han Zhou <hzhou@ovn.org> wrote: > When an address set is deleted and a new one is created immediately with > the same name, the OVSDB deletion and creation notifications can come to > ovn-controller within the same message, and the order of the deletion > and addition in the IDL is undefined. In this case, if the deletion > happens to be processed after the addition, it will result in problems. > > If the content of the new address set is the same as the old one, the > addition will be handled as no change to an existing address set and > ignored, but later when handling the deletion, the related flows will be > deleted, while we actually expect no change to the flows. > > If the content of the new address set is different from the old one, the > addition will be handled as an update to the old address set, and the > delta will be tracked in the I-P engine node data's "updated" member. > Later when handling the deletion, the address set will be deleted from > the engine node data. So at this point the tracked change in "updated" > doesn't have a related address set object in the data, so later when the > lflow node handles the addr_sets changes, the assertion fails as shown > in below example backtrace: > > 0 0x00007ffbe5fcabc5 in raise () from /lib64/libc.so.6 > 1 0x00007ffbe5fb38a4 in abort () from /lib64/libc.so.6 > 2 0x00000000004f579e in ovs_abort_valist (err_no=<optimized out>, > format=<optimized out>, args=<optimized out>) at ../lib/util.c:444 > 3 0x00000000004fd001 in vlog_abort_valist (module_=<optimized out>, > message=0x5e6fd0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffed304caa8) > at ../lib/vlog.c:1249 > 4 0x00000000004fd097 in vlog_abort (module=<optimized out>, > message=<optimized out>) at ../lib/vlog.c:1263 > 5 0x00000000004f54e1 in ovs_assert_failure (where=<optimized out>, > function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86 > 6 0x000000000041941d in as_update_can_be_handled > (l_ctx_in=0x7ffed304cda0, l_ctx_in=0x7ffed304cda0, as_diff=0x25520f0, > as_name=0x2550a00 "as1") at ../controller/lflow.c:766 > 7 lflow_handle_addr_set_update (as_name=0x2550a00 "as1", > as_diff=0x25520f0, l_ctx_in=0x7ffed304cda0, l_ctx_out=0x7ffed304cd50, > changed=0x7ffed304cd4f) at ../controller/lflow.c:869 > 8 0x0000000000438376 in lflow_output_addr_sets_handler > (node=0x7ffed3054420, data=<optimized out>) at > ../controller/ovn-controller.c:2648 > 9 0x000000000045708e in engine_compute (recompute_allowed=<optimized > out>, node=<optimized out>) at ../lib/inc-proc-eng.c:417 > 10 engine_run_node (recompute_allowed=true, node=0x7ffed3054420) at > ../lib/inc-proc-eng.c:479 > 11 engine_run (recompute_allowed=true) at ../lib/inc-proc-eng.c:504 > 12 0x000000000040939b in main (argc=<optimized out>, argv=<optimized out>) > at ../controller/ovn-controller.c:3757 > > This patch fixes it by handling deletions before additions/updates, > similar to how we handle other changes such as lflow and port-groups. > > Fixes: c0fe8dca767f ("ovn-controller: Tracking SB address set updates.") > Signed-off-by: Han Zhou <hzhou@ovn.org> > --- > controller/ovn-controller.c | 6 ++++- > tests/ovn-controller.at | 51 +++++++++++++++++++++++++++++++++++++ > 2 files changed, 56 insertions(+), 1 deletion(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 1151d3664478..2706dc628d6d 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -2015,7 +2015,11 @@ addr_sets_update(const struct > sbrec_address_set_table *address_set_table, > if (sbrec_address_set_is_deleted(as)) { > expr_const_sets_remove(addr_sets, as->name); > sset_add(deleted, as->name); > - } else { > + } > + } > + > + SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) { > + if (!sbrec_address_set_is_deleted(as)) { > struct expr_constant_set *cs_old = shash_find_data(addr_sets, > as->name); > if (!cs_old) { > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > index 64d6a9336613..a95834e69eca 100644 > --- a/tests/ovn-controller.at > +++ b/tests/ovn-controller.at > @@ -2065,6 +2065,57 @@ AT_CHECK([echo $(($reprocess_count_new - > $reprocess_count_old))], [0], [2 > OVN_CLEANUP([hv1]) > AT_CLEANUP > > +AT_SETUP([ovn-controller - address set del-and-add]) > + > +ovn_start > + > +net_add n1 > +sim_add hv1 > +as hv1 > +check ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.1 > +check ovs-vsctl -- add-port br-int hv1-vif1 -- \ > + set interface hv1-vif1 external-ids:iface-id=ls1-lp1 > + > +check ovn-nbctl ls-add ls1 > + > +check ovn-nbctl lsp-add ls1 ls1-lp1 \ > +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01" > + > +wait_for_ports_up > +ovn-appctl -t ovn-controller vlog/set file:dbg > + > +ovn-nbctl create address_set name=as1 addresses=8.8.8.8 > +check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" && ip4.src > == $as1' drop > +check ovn-nbctl --wait=hv sync > +AT_CHECK([ovs-ofctl dump-flows br-int table=44 | grep -c > "priority=1100"], [0], [1 > +]) > + > +# pause ovn-northd > +check as northd ovn-appctl -t ovn-northd pause > +check as northd-backup ovn-appctl -t ovn-northd pause > + > +# Simulate a SB address set "del and add" notification to ovn-controller > in the > +# same IDL iteration. The flows programmed by ovn-controller should > reflect the > +# newly added address set. In reality it can happen when CMS deletes an > +# address-set and immediately creates a new address-set with the same name > +# (with same or different content). The notification of the changes can > come to > +# ovn-controller in one shot and the order of the "del" and "add" in the > IDL is > +# undefined. This test runs the scenario ten times to make sure different > +# orders are covered and handled properly. > + > +flow_count=$(ovs-ofctl dump-flows br-int table=44 | grep -c > "priority=1100") > +for i in $(seq 10); do > + # Delete and recreate the SB address set with same name and an extra > IP. > + addrs_=$(fetch_column address_set addresses name=as1) > + addrs=${addrs_// /,} > + AT_CHECK([ovn-sbctl destroy address_set as1 -- create address_set > name=as1 addresses=$addrs,1.1.1.$i], [0], [ignore]) > + OVS_WAIT_UNTIL([test $(as hv1 ovs-ofctl dump-flows br-int table=44 | > grep -c "priority=1100") = "$(($i + 1))"]) > +done > + > +OVN_CLEANUP([hv1]) > +AT_CLEANUP > + > AT_SETUP([ovn-controller - I-P handle lb_hairpin_use_ct_mark change]) > > ovn_start --backup-northd=none > -- > 2.30.2 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Looks good to me, thanks. Acked-by: Ales Musil <amusil@redhat.com>
On Thu, May 25, 2023 at 11:45 PM Ales Musil <amusil@redhat.com> wrote: > > > > On Thu, May 25, 2023 at 6:01 PM Han Zhou <hzhou@ovn.org> wrote: >> >> When an address set is deleted and a new one is created immediately with >> the same name, the OVSDB deletion and creation notifications can come to >> ovn-controller within the same message, and the order of the deletion >> and addition in the IDL is undefined. In this case, if the deletion >> happens to be processed after the addition, it will result in problems. >> >> If the content of the new address set is the same as the old one, the >> addition will be handled as no change to an existing address set and >> ignored, but later when handling the deletion, the related flows will be >> deleted, while we actually expect no change to the flows. >> >> If the content of the new address set is different from the old one, the >> addition will be handled as an update to the old address set, and the >> delta will be tracked in the I-P engine node data's "updated" member. >> Later when handling the deletion, the address set will be deleted from >> the engine node data. So at this point the tracked change in "updated" >> doesn't have a related address set object in the data, so later when the >> lflow node handles the addr_sets changes, the assertion fails as shown >> in below example backtrace: >> >> 0 0x00007ffbe5fcabc5 in raise () from /lib64/libc.so.6 >> 1 0x00007ffbe5fb38a4 in abort () from /lib64/libc.so.6 >> 2 0x00000000004f579e in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=<optimized out>) at ../lib/util.c:444 >> 3 0x00000000004fd001 in vlog_abort_valist (module_=<optimized out>, message=0x5e6fd0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffed304caa8) at ../lib/vlog.c:1249 >> 4 0x00000000004fd097 in vlog_abort (module=<optimized out>, message=<optimized out>) at ../lib/vlog.c:1263 >> 5 0x00000000004f54e1 in ovs_assert_failure (where=<optimized out>, function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86 >> 6 0x000000000041941d in as_update_can_be_handled (l_ctx_in=0x7ffed304cda0, l_ctx_in=0x7ffed304cda0, as_diff=0x25520f0, as_name=0x2550a00 "as1") at ../controller/lflow.c:766 >> 7 lflow_handle_addr_set_update (as_name=0x2550a00 "as1", as_diff=0x25520f0, l_ctx_in=0x7ffed304cda0, l_ctx_out=0x7ffed304cd50, changed=0x7ffed304cd4f) at ../controller/lflow.c:869 >> 8 0x0000000000438376 in lflow_output_addr_sets_handler (node=0x7ffed3054420, data=<optimized out>) at ../controller/ovn-controller.c:2648 >> 9 0x000000000045708e in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at ../lib/inc-proc-eng.c:417 >> 10 engine_run_node (recompute_allowed=true, node=0x7ffed3054420) at ../lib/inc-proc-eng.c:479 >> 11 engine_run (recompute_allowed=true) at ../lib/inc-proc-eng.c:504 >> 12 0x000000000040939b in main (argc=<optimized out>, argv=<optimized out>) at ../controller/ovn-controller.c:3757 >> >> This patch fixes it by handling deletions before additions/updates, >> similar to how we handle other changes such as lflow and port-groups. >> >> Fixes: c0fe8dca767f ("ovn-controller: Tracking SB address set updates.") >> Signed-off-by: Han Zhou <hzhou@ovn.org> >> --- >> controller/ovn-controller.c | 6 ++++- >> tests/ovn-controller.at | 51 +++++++++++++++++++++++++++++++++++++ >> 2 files changed, 56 insertions(+), 1 deletion(-) >> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 1151d3664478..2706dc628d6d 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -2015,7 +2015,11 @@ addr_sets_update(const struct sbrec_address_set_table *address_set_table, >> if (sbrec_address_set_is_deleted(as)) { >> expr_const_sets_remove(addr_sets, as->name); >> sset_add(deleted, as->name); >> - } else { >> + } >> + } >> + >> + SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) { >> + if (!sbrec_address_set_is_deleted(as)) { >> struct expr_constant_set *cs_old = shash_find_data(addr_sets, >> as->name); >> if (!cs_old) { >> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at >> index 64d6a9336613..a95834e69eca 100644 >> --- a/tests/ovn-controller.at >> +++ b/tests/ovn-controller.at >> @@ -2065,6 +2065,57 @@ AT_CHECK([echo $(($reprocess_count_new - $reprocess_count_old))], [0], [2 >> OVN_CLEANUP([hv1]) >> AT_CLEANUP >> >> +AT_SETUP([ovn-controller - address set del-and-add]) >> + >> +ovn_start >> + >> +net_add n1 >> +sim_add hv1 >> +as hv1 >> +check ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.1 >> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \ >> + set interface hv1-vif1 external-ids:iface-id=ls1-lp1 >> + >> +check ovn-nbctl ls-add ls1 >> + >> +check ovn-nbctl lsp-add ls1 ls1-lp1 \ >> +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01" >> + >> +wait_for_ports_up >> +ovn-appctl -t ovn-controller vlog/set file:dbg >> + >> +ovn-nbctl create address_set name=as1 addresses=8.8.8.8 >> +check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" && ip4.src == $as1' drop >> +check ovn-nbctl --wait=hv sync >> +AT_CHECK([ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100"], [0], [1 >> +]) >> + >> +# pause ovn-northd >> +check as northd ovn-appctl -t ovn-northd pause >> +check as northd-backup ovn-appctl -t ovn-northd pause >> + >> +# Simulate a SB address set "del and add" notification to ovn-controller in the >> +# same IDL iteration. The flows programmed by ovn-controller should reflect the >> +# newly added address set. In reality it can happen when CMS deletes an >> +# address-set and immediately creates a new address-set with the same name >> +# (with same or different content). The notification of the changes can come to >> +# ovn-controller in one shot and the order of the "del" and "add" in the IDL is >> +# undefined. This test runs the scenario ten times to make sure different >> +# orders are covered and handled properly. >> + >> +flow_count=$(ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100") >> +for i in $(seq 10); do >> + # Delete and recreate the SB address set with same name and an extra IP. >> + addrs_=$(fetch_column address_set addresses name=as1) >> + addrs=${addrs_// /,} >> + AT_CHECK([ovn-sbctl destroy address_set as1 -- create address_set name=as1 addresses=$addrs,1.1.1.$i], [0], [ignore]) >> + OVS_WAIT_UNTIL([test $(as hv1 ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100") = "$(($i + 1))"]) >> +done >> + >> +OVN_CLEANUP([hv1]) >> +AT_CLEANUP >> + >> AT_SETUP([ovn-controller - I-P handle lb_hairpin_use_ct_mark change]) >> >> ovn_start --backup-northd=none >> -- >> 2.30.2 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > Looks good to me, thanks. > > Acked-by: Ales Musil <amusil@redhat.com> Thanks Ales. I pushed to main and backported down to branch-22.03. Regards, Han
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 1151d3664478..2706dc628d6d 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -2015,7 +2015,11 @@ addr_sets_update(const struct sbrec_address_set_table *address_set_table, if (sbrec_address_set_is_deleted(as)) { expr_const_sets_remove(addr_sets, as->name); sset_add(deleted, as->name); - } else { + } + } + + SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) { + if (!sbrec_address_set_is_deleted(as)) { struct expr_constant_set *cs_old = shash_find_data(addr_sets, as->name); if (!cs_old) { diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index 64d6a9336613..a95834e69eca 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -2065,6 +2065,57 @@ AT_CHECK([echo $(($reprocess_count_new - $reprocess_count_old))], [0], [2 OVN_CLEANUP([hv1]) AT_CLEANUP +AT_SETUP([ovn-controller - address set del-and-add]) + +ovn_start + +net_add n1 +sim_add hv1 +as hv1 +check ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.1 +check ovs-vsctl -- add-port br-int hv1-vif1 -- \ + set interface hv1-vif1 external-ids:iface-id=ls1-lp1 + +check ovn-nbctl ls-add ls1 + +check ovn-nbctl lsp-add ls1 ls1-lp1 \ +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01" + +wait_for_ports_up +ovn-appctl -t ovn-controller vlog/set file:dbg + +ovn-nbctl create address_set name=as1 addresses=8.8.8.8 +check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" && ip4.src == $as1' drop +check ovn-nbctl --wait=hv sync +AT_CHECK([ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100"], [0], [1 +]) + +# pause ovn-northd +check as northd ovn-appctl -t ovn-northd pause +check as northd-backup ovn-appctl -t ovn-northd pause + +# Simulate a SB address set "del and add" notification to ovn-controller in the +# same IDL iteration. The flows programmed by ovn-controller should reflect the +# newly added address set. In reality it can happen when CMS deletes an +# address-set and immediately creates a new address-set with the same name +# (with same or different content). The notification of the changes can come to +# ovn-controller in one shot and the order of the "del" and "add" in the IDL is +# undefined. This test runs the scenario ten times to make sure different +# orders are covered and handled properly. + +flow_count=$(ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100") +for i in $(seq 10); do + # Delete and recreate the SB address set with same name and an extra IP. + addrs_=$(fetch_column address_set addresses name=as1) + addrs=${addrs_// /,} + AT_CHECK([ovn-sbctl destroy address_set as1 -- create address_set name=as1 addresses=$addrs,1.1.1.$i], [0], [ignore]) + OVS_WAIT_UNTIL([test $(as hv1 ovs-ofctl dump-flows br-int table=44 | grep -c "priority=1100") = "$(($i + 1))"]) +done + +OVN_CLEANUP([hv1]) +AT_CLEANUP + AT_SETUP([ovn-controller - I-P handle lb_hairpin_use_ct_mark change]) ovn_start --backup-northd=none
When an address set is deleted and a new one is created immediately with the same name, the OVSDB deletion and creation notifications can come to ovn-controller within the same message, and the order of the deletion and addition in the IDL is undefined. In this case, if the deletion happens to be processed after the addition, it will result in problems. If the content of the new address set is the same as the old one, the addition will be handled as no change to an existing address set and ignored, but later when handling the deletion, the related flows will be deleted, while we actually expect no change to the flows. If the content of the new address set is different from the old one, the addition will be handled as an update to the old address set, and the delta will be tracked in the I-P engine node data's "updated" member. Later when handling the deletion, the address set will be deleted from the engine node data. So at this point the tracked change in "updated" doesn't have a related address set object in the data, so later when the lflow node handles the addr_sets changes, the assertion fails as shown in below example backtrace: 0 0x00007ffbe5fcabc5 in raise () from /lib64/libc.so.6 1 0x00007ffbe5fb38a4 in abort () from /lib64/libc.so.6 2 0x00000000004f579e in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=<optimized out>) at ../lib/util.c:444 3 0x00000000004fd001 in vlog_abort_valist (module_=<optimized out>, message=0x5e6fd0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffed304caa8) at ../lib/vlog.c:1249 4 0x00000000004fd097 in vlog_abort (module=<optimized out>, message=<optimized out>) at ../lib/vlog.c:1263 5 0x00000000004f54e1 in ovs_assert_failure (where=<optimized out>, function=<optimized out>, condition=<optimized out>) at ../lib/util.c:86 6 0x000000000041941d in as_update_can_be_handled (l_ctx_in=0x7ffed304cda0, l_ctx_in=0x7ffed304cda0, as_diff=0x25520f0, as_name=0x2550a00 "as1") at ../controller/lflow.c:766 7 lflow_handle_addr_set_update (as_name=0x2550a00 "as1", as_diff=0x25520f0, l_ctx_in=0x7ffed304cda0, l_ctx_out=0x7ffed304cd50, changed=0x7ffed304cd4f) at ../controller/lflow.c:869 8 0x0000000000438376 in lflow_output_addr_sets_handler (node=0x7ffed3054420, data=<optimized out>) at ../controller/ovn-controller.c:2648 9 0x000000000045708e in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at ../lib/inc-proc-eng.c:417 10 engine_run_node (recompute_allowed=true, node=0x7ffed3054420) at ../lib/inc-proc-eng.c:479 11 engine_run (recompute_allowed=true) at ../lib/inc-proc-eng.c:504 12 0x000000000040939b in main (argc=<optimized out>, argv=<optimized out>) at ../controller/ovn-controller.c:3757 This patch fixes it by handling deletions before additions/updates, similar to how we handle other changes such as lflow and port-groups. Fixes: c0fe8dca767f ("ovn-controller: Tracking SB address set updates.") Signed-off-by: Han Zhou <hzhou@ovn.org> --- controller/ovn-controller.c | 6 ++++- tests/ovn-controller.at | 51 +++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+), 1 deletion(-)