| Message ID | 20260520102522.1108893-1-dceara@redhat.com |
|---|---|
| State | Accepted |
| Headers | show |
| Series | [ovs-dev] physical: Re-evaluate CR ports when localnet port changes. | expand |
| Context | Check | Description |
|---|---|---|
| ovsrobot/apply-robot | success | apply and check: success |
| ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
| ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
| ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
| ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On 5/20/26 12:25 PM, Dumitru Ceara via dev wrote: > physical_handle_flows_for_lport() handles the forward > dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but > not the reverse: when a localnet port is added or > updated, chassisredirect ports on peer router datapaths > whose bridged redirect flows depend on that localnet > port (via put_remote_port_redirect_bridged() -> > get_localnet_port()) are not re-evaluated. > > This causes the CR bridged redirect flow in > OFTABLE_LOCAL_OUTPUT to be permanently missing when the > localnet port binding is processed incrementally without > a full recompute. > > Fix this by iterating peer router datapaths when a > localnet port is processed and re-evaluating any CR port > bindings found there. > > Add a test that deterministically exposes the bug by > pre-creating the OVS patch port so that non_vif_data > does not change when the localnet port binding arrives, > forcing pflow_output to use the incremental path. > > Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.") > Assisted-by: Claude Opus 4.6, Claude Code > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > --- > NOTE: > - on 25.03 this patch would need some minor changes: > https://github.com/dceara/ovn/commit/54089ca > - that would allow us to also backport b408eedf6d9d > ("ovn-controller: Skip type-update check for new port bindings.") > to 25.03 > - more context here: > https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html > --- Known CI failure, we need an OVS submodule bump to pick up the OVS fixes for this. --- /dev/null 2026-05-20 10:47:47.504261528 +0000 +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/154/stdout 2026-05-20 10:57:31.808115116 +0000 @@ -0,0 +1,2 @@ +2026-05-20T10:57:31.587Z|00483|ofproto_dpif_rid|ERR|recirc_id 2 left allocated when ofproto (br-int) is destructed +2026-05-20T10:57:31.587Z|00484|ofproto_dpif_rid|ERR|recirc_id 4 left allocated when ofproto (br-int) is destructed related-ports-diff: 154. ovn.at:10203: 154. ARP/ND from localnet -- proxy reply on resident chassis only -- parallelization=yes -- ovn_monitor_all=yes (ovn.at:10203): FAILED (ovn.at:10203) Recheck-request: github-robot-_Build_and_Test
Hi Dumitru, thanks for the patch. Acked-by: Mark Michelson <mmichels@redhat.com> With regards to the CI failure, I posted patches for bumping the OVS submodule: https://patchwork.ozlabs.org/project/ovn/patch/20260520185920.2175267-1-mmichels@redhat.com/ https://patchwork.ozlabs.org/project/ovn/patch/20260520185935.2175319-1-mmichels@redhat.com/ https://patchwork.ozlabs.org/project/ovn/patch/20260520185947.2175355-1-mmichels@redhat.com/ https://patchwork.ozlabs.org/project/ovn/patch/20260520190001.2175393-1-mmichels@redhat.com/ Hopefully those will help with CI stability. On Wed, May 20, 2026 at 9:36 AM Dumitru Ceara via dev <ovs-dev@openvswitch.org> wrote: > > On 5/20/26 12:25 PM, Dumitru Ceara via dev wrote: > > physical_handle_flows_for_lport() handles the forward > > dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but > > not the reverse: when a localnet port is added or > > updated, chassisredirect ports on peer router datapaths > > whose bridged redirect flows depend on that localnet > > port (via put_remote_port_redirect_bridged() -> > > get_localnet_port()) are not re-evaluated. > > > > This causes the CR bridged redirect flow in > > OFTABLE_LOCAL_OUTPUT to be permanently missing when the > > localnet port binding is processed incrementally without > > a full recompute. > > > > Fix this by iterating peer router datapaths when a > > localnet port is processed and re-evaluating any CR port > > bindings found there. > > > > Add a test that deterministically exposes the bug by > > pre-creating the OVS patch port so that non_vif_data > > does not change when the localnet port binding arrives, > > forcing pflow_output to use the incremental path. > > > > Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.") > > Assisted-by: Claude Opus 4.6, Claude Code > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > > --- > > NOTE: > > - on 25.03 this patch would need some minor changes: > > https://github.com/dceara/ovn/commit/54089ca > > - that would allow us to also backport b408eedf6d9d > > ("ovn-controller: Skip type-update check for new port bindings.") > > to 25.03 > > - more context here: > > https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html > > --- > > Known CI failure, we need an OVS submodule bump to pick up the OVS fixes > for this. > > --- /dev/null 2026-05-20 10:47:47.504261528 +0000 > +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/154/stdout > 2026-05-20 10:57:31.808115116 +0000 > @@ -0,0 +1,2 @@ > +2026-05-20T10:57:31.587Z|00483|ofproto_dpif_rid|ERR|recirc_id 2 left > allocated when ofproto (br-int) is destructed > +2026-05-20T10:57:31.587Z|00484|ofproto_dpif_rid|ERR|recirc_id 4 left > allocated when ofproto (br-int) is destructed > related-ports-diff: > 154. ovn.at:10203: 154. ARP/ND from localnet -- proxy reply on resident > chassis only -- parallelization=yes -- ovn_monitor_all=yes > (ovn.at:10203): FAILED (ovn.at:10203) > > Recheck-request: github-robot-_Build_and_Test > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 5/20/26 9:31 PM, Mark Michelson wrote: > Hi Dumitru, thanks for the patch. > Hi Mark, Thanks for the review! > Acked-by: Mark Michelson <mmichels@redhat.com> > Applied to main and all stable branches down to 25.03. I didn't go further because on 24.09 and 24.03 we'd need other patches backported and that's not straightforward. We can revisit that in the future if needed I guess. I did have to make a small change to the test though as it was sometimes unstable in CI. I hope that's OK. diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index d4d666b699..fd63399f17 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -1096,14 +1096,10 @@ check ovn-nbctl --wait=hv sync dnl Re-add the localnet port. The patch port already exists, dnl so non_vif_data should not change, and pflow_output should dnl be handled incrementally. -check as hv2 ovn-appctl -t ovn-controller inc-engine/clear-stats check ovn-nbctl --wait=hv \ -- lsp-add-localnet-port ls-underlay ln-underlay phys \ -- set logical_switch_port ln-underlay tag_request=1000 -dnl Verify pflow_output was NOT recomputed. -check_controller_engine_stats hv2 pflow_output norecompute compute - dnl Verify the CR bridged redirect flow is back. OVS_WAIT_UNTIL([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep -q "reg15=0x${cr_key},metadata=0x${router_dp_key}"]) > With regards to the CI failure, I posted patches for bumping the OVS submodule: > > https://patchwork.ozlabs.org/project/ovn/patch/20260520185920.2175267-1-mmichels@redhat.com/ > https://patchwork.ozlabs.org/project/ovn/patch/20260520185935.2175319-1-mmichels@redhat.com/ > https://patchwork.ozlabs.org/project/ovn/patch/20260520185947.2175355-1-mmichels@redhat.com/ > https://patchwork.ozlabs.org/project/ovn/patch/20260520190001.2175393-1-mmichels@redhat.com/ > > Hopefully those will help with CI stability. > Unfortunately, as discussed on the 25.09 bump patch, these introduce other issues that need to be fixed in ovs: https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432516.html > On Wed, May 20, 2026 at 9:36 AM Dumitru Ceara via dev > <ovs-dev@openvswitch.org> wrote: >> >> On 5/20/26 12:25 PM, Dumitru Ceara via dev wrote: >>> physical_handle_flows_for_lport() handles the forward >>> dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but >>> not the reverse: when a localnet port is added or >>> updated, chassisredirect ports on peer router datapaths >>> whose bridged redirect flows depend on that localnet >>> port (via put_remote_port_redirect_bridged() -> >>> get_localnet_port()) are not re-evaluated. >>> >>> This causes the CR bridged redirect flow in >>> OFTABLE_LOCAL_OUTPUT to be permanently missing when the >>> localnet port binding is processed incrementally without >>> a full recompute. >>> >>> Fix this by iterating peer router datapaths when a >>> localnet port is processed and re-evaluating any CR port >>> bindings found there. >>> >>> Add a test that deterministically exposes the bug by >>> pre-creating the OVS patch port so that non_vif_data >>> does not change when the localnet port binding arrives, >>> forcing pflow_output to use the incremental path. >>> >>> Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.") >>> Assisted-by: Claude Opus 4.6, Claude Code >>> Signed-off-by: Dumitru Ceara <dceara@redhat.com> >>> --- >>> NOTE: >>> - on 25.03 this patch would need some minor changes: >>> https://github.com/dceara/ovn/commit/54089ca >>> - that would allow us to also backport b408eedf6d9d >>> ("ovn-controller: Skip type-update check for new port bindings.") >>> to 25.03 >>> - more context here: >>> https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html >>> --- >> >> Known CI failure, we need an OVS submodule bump to pick up the OVS fixes >> for this. >> >> --- /dev/null 2026-05-20 10:47:47.504261528 +0000 >> +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/154/stdout >> 2026-05-20 10:57:31.808115116 +0000 >> @@ -0,0 +1,2 @@ >> +2026-05-20T10:57:31.587Z|00483|ofproto_dpif_rid|ERR|recirc_id 2 left >> allocated when ofproto (br-int) is destructed >> +2026-05-20T10:57:31.587Z|00484|ofproto_dpif_rid|ERR|recirc_id 4 left >> allocated when ofproto (br-int) is destructed >> related-ports-diff: >> 154. ovn.at:10203: 154. ARP/ND from localnet -- proxy reply on resident >> chassis only -- parallelization=yes -- ovn_monitor_all=yes >> (ovn.at:10203): FAILED (ovn.at:10203) >> >> Recheck-request: github-robot-_Build_and_Test >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > Regards, Dumitru
diff --git a/controller/physical.c b/controller/physical.c index 185b798b93..42b42e948b 100644 --- a/controller/physical.c +++ b/controller/physical.c @@ -3799,6 +3799,24 @@ physical_handle_flows_for_lport(const struct sbrec_port_binding *pb, } } + /* Chassisredirect ports on peer router datapaths may have bridged + * redirect flows that depend on this localnet port + * (put_remote_port_redirect_bridged() calls get_localnet_port()). + * Re-evaluate those CR ports. */ + if (type == LP_LOCALNET && !removed && ldp) { + const struct peer_ports *pp; + VECTOR_FOR_EACH_PTR (&ldp->peer_ports, pp) { + const struct sbrec_port_binding *cr_pb = + lport_get_cr_port(p_ctx->sbrec_port_binding_by_name, + pp->remote, NULL); + if (cr_pb) { + ofctrl_remove_flows(flow_table, &cr_pb->header_.uuid); + physical_eval_port_binding(p_ctx, cr_pb, LP_CHASSISREDIRECT, + flow_table); + } + } + } + if (sbrec_port_binding_is_updated( pb, SBREC_PORT_BINDING_COL_ADDITIONAL_CHASSIS) || removed) { physical_multichassis_reprocess(pb, p_ctx, flow_table); diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index f9f64d691c..d4d666b699 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -1019,6 +1019,98 @@ OVN_CLEANUP([hv1]) AT_CLEANUP ]) +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn-controller - localnet port change and chassisredirect bridged redirect]) +AT_KEYWORDS([ovn-localnet-cr-bridged]) + +ovn_start + +net_add n1 + +sim_add hv1 +as hv1 +check ovs-vsctl add-br br-phys +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +ovn_attach n1 br-phys 192.168.0.1 + +sim_add hv2 +as hv2 +check ovs-vsctl add-br br-phys +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +ovn_attach n1 br-phys 192.168.0.2 + +dnl Create the full topology with localnet ports, router, +dnl gateway chassis, and bridged redirect. Bind VIFs so the +dnl datapaths are local. +check ovn-nbctl \ + -- ls-add ls1 \ + -- lsp-add-localnet-port ls1 ln1 phys \ + -- set logical_switch_port ln1 tag_request=101 \ + -- lsp-add ls1 lp1 \ + -- lsp-set-addresses lp1 "00:00:00:00:00:01 192.168.1.10" \ + -- lsp-add ls1 lp2 \ + -- lsp-set-addresses lp2 "00:00:00:00:00:02 192.168.1.11" \ + -- lsp-add-router-port ls1 ls1-to-router router-to-ls1 \ + -- ls-add ls-underlay \ + -- lsp-add-localnet-port ls-underlay ln-underlay phys \ + -- set logical_switch_port ln-underlay tag_request=1000 \ + -- lsp-add-router-port ls-underlay underlay-to-router router-to-underlay \ + -- lr-add lr1 \ + -- lrp-add lr1 router-to-ls1 00:00:01:01:02:03 192.168.1.1/24 \ + -- lrp-add lr1 router-to-underlay 00:00:01:01:02:07 172.31.0.1/24 \ + -- lrp-set-gateway-chassis router-to-underlay hv1 \ + -- lrp-set-redirect-type router-to-underlay bridged + +check as hv1 ovs-vsctl add-port br-int vif0 \ + -- set Interface vif0 external_ids:iface-id=lp1 +check as hv2 ovs-vsctl add-port br-int vif1 \ + -- set Interface vif1 external_ids:iface-id=lp2 + +wait_for_ports_up +check ovn-nbctl --wait=hv sync + +dnl Verify initial state: hv2 has the CR bridged redirect flow. +router_dp_key=$(printf "%x" $(fetch_column datapath tunnel_key external_ids:name=lr1)) +cr_key=$(printf "%x" $(fetch_column port_binding tunnel_key logical_port=cr-router-to-underlay)) +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep -q "reg15=0x${cr_key},metadata=0x${router_dp_key}"]) + +dnl Delete the localnet port on ls-underlay. +check ovn-nbctl --wait=hv lsp-del ln-underlay +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep "reg15=0x${cr_key},metadata=0x${router_dp_key}"], [1]) + +dnl Pre-create the patch port that the new localnet port +dnl would need. This way, when the localnet port binding arrives, +dnl patch_run() sees the patch port already exists and non_vif_data +dnl does NOT change, forcing pflow_output to handle the +dnl change incrementally instead of recomputing. +check as hv2 ovs-vsctl \ + -- add-port br-int patch-br-int-to-ln-underlay \ + -- set Interface patch-br-int-to-ln-underlay \ + type=patch options:peer=patch-ln-underlay-to-br-int +check as hv2 ovs-vsctl \ + -- add-port br-phys patch-ln-underlay-to-br-int \ + -- set Interface patch-ln-underlay-to-br-int \ + type=patch options:peer=patch-br-int-to-ln-underlay +check ovn-nbctl --wait=hv sync + +dnl Re-add the localnet port. The patch port already exists, +dnl so non_vif_data should not change, and pflow_output should +dnl be handled incrementally. +check as hv2 ovn-appctl -t ovn-controller inc-engine/clear-stats +check ovn-nbctl --wait=hv \ + -- lsp-add-localnet-port ls-underlay ln-underlay phys \ + -- set logical_switch_port ln-underlay tag_request=1000 + +dnl Verify pflow_output was NOT recomputed. +check_controller_engine_stats hv2 pflow_output norecompute compute + +dnl Verify the CR bridged redirect flow is back. +OVS_WAIT_UNTIL([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep -q "reg15=0x${cr_key},metadata=0x${router_dp_key}"]) + +OVN_CLEANUP([hv1], [hv2]) +AT_CLEANUP +]) + AT_SETUP([ovn-controller - I-P for address set update: no conjunction]) AT_KEYWORDS([as-i-p])
physical_handle_flows_for_lport() handles the forward dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but not the reverse: when a localnet port is added or updated, chassisredirect ports on peer router datapaths whose bridged redirect flows depend on that localnet port (via put_remote_port_redirect_bridged() -> get_localnet_port()) are not re-evaluated. This causes the CR bridged redirect flow in OFTABLE_LOCAL_OUTPUT to be permanently missing when the localnet port binding is processed incrementally without a full recompute. Fix this by iterating peer router datapaths when a localnet port is processed and re-evaluating any CR port bindings found there. Add a test that deterministically exposes the bug by pre-creating the OVS patch port so that non_vif_data does not change when the localnet port binding arrives, forcing pflow_output to use the incremental path. Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.") Assisted-by: Claude Opus 4.6, Claude Code Signed-off-by: Dumitru Ceara <dceara@redhat.com> --- NOTE: - on 25.03 this patch would need some minor changes: https://github.com/dceara/ovn/commit/54089ca - that would allow us to also backport b408eedf6d9d ("ovn-controller: Skip type-update check for new port bindings.") to 25.03 - more context here: https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html --- controller/physical.c | 18 ++++++++ tests/ovn-controller.at | 92 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 110 insertions(+)