diff mbox series

[ovs-dev] physical: Re-evaluate CR ports when localnet port changes.

Message ID 20260520102522.1108893-1-dceara@redhat.com
State Accepted
Headers show
Series [ovs-dev] physical: Re-evaluate CR ports when localnet port changes. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Dumitru Ceara May 20, 2026, 10:25 a.m. UTC
physical_handle_flows_for_lport() handles the forward
dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but
not the reverse: when a localnet port is added or
updated, chassisredirect ports on peer router datapaths
whose bridged redirect flows depend on that localnet
port (via put_remote_port_redirect_bridged() ->
get_localnet_port()) are not re-evaluated.

This causes the CR bridged redirect flow in
OFTABLE_LOCAL_OUTPUT to be permanently missing when the
localnet port binding is processed incrementally without
a full recompute.

Fix this by iterating peer router datapaths when a
localnet port is processed and re-evaluating any CR port
bindings found there.

Add a test that deterministically exposes the bug by
pre-creating the OVS patch port so that non_vif_data
does not change when the localnet port binding arrives,
forcing pflow_output to use the incremental path.

Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.")
Assisted-by: Claude Opus 4.6, Claude Code
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
NOTE:
- on 25.03 this patch would need some minor changes:
  https://github.com/dceara/ovn/commit/54089ca
- that would allow us to also backport b408eedf6d9d
  ("ovn-controller: Skip type-update check for new port bindings.")
  to 25.03
- more context here:
  https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html
---
 controller/physical.c   | 18 ++++++++
 tests/ovn-controller.at | 92 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+)

Comments

Dumitru Ceara May 20, 2026, 1:35 p.m. UTC | #1
On 5/20/26 12:25 PM, Dumitru Ceara via dev wrote:
> physical_handle_flows_for_lport() handles the forward
> dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but
> not the reverse: when a localnet port is added or
> updated, chassisredirect ports on peer router datapaths
> whose bridged redirect flows depend on that localnet
> port (via put_remote_port_redirect_bridged() ->
> get_localnet_port()) are not re-evaluated.
> 
> This causes the CR bridged redirect flow in
> OFTABLE_LOCAL_OUTPUT to be permanently missing when the
> localnet port binding is processed incrementally without
> a full recompute.
> 
> Fix this by iterating peer router datapaths when a
> localnet port is processed and re-evaluating any CR port
> bindings found there.
> 
> Add a test that deterministically exposes the bug by
> pre-creating the OVS patch port so that non_vif_data
> does not change when the localnet port binding arrives,
> forcing pflow_output to use the incremental path.
> 
> Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.")
> Assisted-by: Claude Opus 4.6, Claude Code
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
> NOTE:
> - on 25.03 this patch would need some minor changes:
>   https://github.com/dceara/ovn/commit/54089ca
> - that would allow us to also backport b408eedf6d9d
>   ("ovn-controller: Skip type-update check for new port bindings.")
>   to 25.03
> - more context here:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html
> ---

Known CI failure, we need an OVS submodule bump to pick up the OVS fixes
for this.

--- /dev/null	2026-05-20 10:47:47.504261528 +0000
+++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/154/stdout
2026-05-20 10:57:31.808115116 +0000
@@ -0,0 +1,2 @@
+2026-05-20T10:57:31.587Z|00483|ofproto_dpif_rid|ERR|recirc_id 2 left
allocated when ofproto (br-int) is destructed
+2026-05-20T10:57:31.587Z|00484|ofproto_dpif_rid|ERR|recirc_id 4 left
allocated when ofproto (br-int) is destructed
related-ports-diff:
154. ovn.at:10203: 154. ARP/ND from localnet -- proxy reply on resident
chassis only -- parallelization=yes -- ovn_monitor_all=yes
(ovn.at:10203): FAILED (ovn.at:10203)

Recheck-request: github-robot-_Build_and_Test
Mark Michelson May 20, 2026, 7:31 p.m. UTC | #2
Hi Dumitru, thanks for the patch.

Acked-by: Mark Michelson <mmichels@redhat.com>

With regards to the CI failure, I posted patches for bumping the OVS submodule:

https://patchwork.ozlabs.org/project/ovn/patch/20260520185920.2175267-1-mmichels@redhat.com/
https://patchwork.ozlabs.org/project/ovn/patch/20260520185935.2175319-1-mmichels@redhat.com/
https://patchwork.ozlabs.org/project/ovn/patch/20260520185947.2175355-1-mmichels@redhat.com/
https://patchwork.ozlabs.org/project/ovn/patch/20260520190001.2175393-1-mmichels@redhat.com/

Hopefully those will help with CI stability.

On Wed, May 20, 2026 at 9:36 AM Dumitru Ceara via dev
<ovs-dev@openvswitch.org> wrote:
>
> On 5/20/26 12:25 PM, Dumitru Ceara via dev wrote:
> > physical_handle_flows_for_lport() handles the forward
> > dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but
> > not the reverse: when a localnet port is added or
> > updated, chassisredirect ports on peer router datapaths
> > whose bridged redirect flows depend on that localnet
> > port (via put_remote_port_redirect_bridged() ->
> > get_localnet_port()) are not re-evaluated.
> >
> > This causes the CR bridged redirect flow in
> > OFTABLE_LOCAL_OUTPUT to be permanently missing when the
> > localnet port binding is processed incrementally without
> > a full recompute.
> >
> > Fix this by iterating peer router datapaths when a
> > localnet port is processed and re-evaluating any CR port
> > bindings found there.
> >
> > Add a test that deterministically exposes the bug by
> > pre-creating the OVS patch port so that non_vif_data
> > does not change when the localnet port binding arrives,
> > forcing pflow_output to use the incremental path.
> >
> > Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.")
> > Assisted-by: Claude Opus 4.6, Claude Code
> > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> > ---
> > NOTE:
> > - on 25.03 this patch would need some minor changes:
> >   https://github.com/dceara/ovn/commit/54089ca
> > - that would allow us to also backport b408eedf6d9d
> >   ("ovn-controller: Skip type-update check for new port bindings.")
> >   to 25.03
> > - more context here:
> >   https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html
> > ---
>
> Known CI failure, we need an OVS submodule bump to pick up the OVS fixes
> for this.
>
> --- /dev/null   2026-05-20 10:47:47.504261528 +0000
> +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/154/stdout
> 2026-05-20 10:57:31.808115116 +0000
> @@ -0,0 +1,2 @@
> +2026-05-20T10:57:31.587Z|00483|ofproto_dpif_rid|ERR|recirc_id 2 left
> allocated when ofproto (br-int) is destructed
> +2026-05-20T10:57:31.587Z|00484|ofproto_dpif_rid|ERR|recirc_id 4 left
> allocated when ofproto (br-int) is destructed
> related-ports-diff:
> 154. ovn.at:10203: 154. ARP/ND from localnet -- proxy reply on resident
> chassis only -- parallelization=yes -- ovn_monitor_all=yes
> (ovn.at:10203): FAILED (ovn.at:10203)
>
> Recheck-request: github-robot-_Build_and_Test
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Dumitru Ceara May 21, 2026, 1:15 p.m. UTC | #3
On 5/20/26 9:31 PM, Mark Michelson wrote:
> Hi Dumitru, thanks for the patch.
> 

Hi Mark,

Thanks for the review!

> Acked-by: Mark Michelson <mmichels@redhat.com>
> 

Applied to main and all stable branches down to 25.03.  I didn't go
further because on 24.09 and 24.03 we'd need other patches backported
and that's not straightforward.  We can revisit that in the future
if needed I guess.

I did have to make a small change to the test though as it was
sometimes unstable in CI.  I hope that's OK.

diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index d4d666b699..fd63399f17 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -1096,14 +1096,10 @@ check ovn-nbctl --wait=hv sync
 dnl Re-add the localnet port.  The patch port already exists,
 dnl so non_vif_data should not change, and pflow_output should
 dnl be handled incrementally.
-check as hv2 ovn-appctl -t ovn-controller inc-engine/clear-stats
 check ovn-nbctl --wait=hv                                   \
     -- lsp-add-localnet-port ls-underlay ln-underlay phys   \
     -- set logical_switch_port ln-underlay tag_request=1000
 
-dnl Verify pflow_output was NOT recomputed.
-check_controller_engine_stats hv2 pflow_output norecompute compute
-
 dnl Verify the CR bridged redirect flow is back.
 OVS_WAIT_UNTIL([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep -q "reg15=0x${cr_key},metadata=0x${router_dp_key}"])

> With regards to the CI failure, I posted patches for bumping the OVS submodule:
> 
> https://patchwork.ozlabs.org/project/ovn/patch/20260520185920.2175267-1-mmichels@redhat.com/
> https://patchwork.ozlabs.org/project/ovn/patch/20260520185935.2175319-1-mmichels@redhat.com/
> https://patchwork.ozlabs.org/project/ovn/patch/20260520185947.2175355-1-mmichels@redhat.com/
> https://patchwork.ozlabs.org/project/ovn/patch/20260520190001.2175393-1-mmichels@redhat.com/
> 
> Hopefully those will help with CI stability.
> 

Unfortunately, as discussed on the 25.09 bump patch, these introduce
other issues that need to be fixed in ovs:

https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432516.html

> On Wed, May 20, 2026 at 9:36 AM Dumitru Ceara via dev
> <ovs-dev@openvswitch.org> wrote:
>>
>> On 5/20/26 12:25 PM, Dumitru Ceara via dev wrote:
>>> physical_handle_flows_for_lport() handles the forward
>>> dependency (PATCH/EXTERNAL/L3GATEWAY -> localnet) but
>>> not the reverse: when a localnet port is added or
>>> updated, chassisredirect ports on peer router datapaths
>>> whose bridged redirect flows depend on that localnet
>>> port (via put_remote_port_redirect_bridged() ->
>>> get_localnet_port()) are not re-evaluated.
>>>
>>> This causes the CR bridged redirect flow in
>>> OFTABLE_LOCAL_OUTPUT to be permanently missing when the
>>> localnet port binding is processed incrementally without
>>> a full recompute.
>>>
>>> Fix this by iterating peer router datapaths when a
>>> localnet port is processed and re-evaluating any CR port
>>> bindings found there.
>>>
>>> Add a test that deterministically exposes the bug by
>>> pre-creating the OVS patch port so that non_vif_data
>>> does not change when the localnet port binding arrives,
>>> forcing pflow_output to use the incremental path.
>>>
>>> Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output engine.")
>>> Assisted-by: Claude Opus 4.6, Claude Code
>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>> ---
>>> NOTE:
>>> - on 25.03 this patch would need some minor changes:
>>>   https://github.com/dceara/ovn/commit/54089ca
>>> - that would allow us to also backport b408eedf6d9d
>>>   ("ovn-controller: Skip type-update check for new port bindings.")
>>>   to 25.03
>>> - more context here:
>>>   https://mail.openvswitch.org/pipermail/ovs-dev/2026-May/432414.html
>>> ---
>>
>> Known CI failure, we need an OVS submodule bump to pick up the OVS fixes
>> for this.
>>
>> --- /dev/null   2026-05-20 10:47:47.504261528 +0000
>> +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/154/stdout
>> 2026-05-20 10:57:31.808115116 +0000
>> @@ -0,0 +1,2 @@
>> +2026-05-20T10:57:31.587Z|00483|ofproto_dpif_rid|ERR|recirc_id 2 left
>> allocated when ofproto (br-int) is destructed
>> +2026-05-20T10:57:31.587Z|00484|ofproto_dpif_rid|ERR|recirc_id 4 left
>> allocated when ofproto (br-int) is destructed
>> related-ports-diff:
>> 154. ovn.at:10203: 154. ARP/ND from localnet -- proxy reply on resident
>> chassis only -- parallelization=yes -- ovn_monitor_all=yes
>> (ovn.at:10203): FAILED (ovn.at:10203)
>>
>> Recheck-request: github-robot-_Build_and_Test
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> 


Regards,
Dumitru
diff mbox series

Patch

diff --git a/controller/physical.c b/controller/physical.c
index 185b798b93..42b42e948b 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -3799,6 +3799,24 @@  physical_handle_flows_for_lport(const struct sbrec_port_binding *pb,
         }
     }
 
+    /* Chassisredirect ports on peer router datapaths may have bridged
+     * redirect flows that depend on this localnet port
+     * (put_remote_port_redirect_bridged() calls get_localnet_port()).
+     * Re-evaluate those CR ports. */
+    if (type == LP_LOCALNET && !removed && ldp) {
+        const struct peer_ports *pp;
+        VECTOR_FOR_EACH_PTR (&ldp->peer_ports, pp) {
+            const struct sbrec_port_binding *cr_pb =
+                lport_get_cr_port(p_ctx->sbrec_port_binding_by_name,
+                                  pp->remote, NULL);
+            if (cr_pb) {
+                ofctrl_remove_flows(flow_table, &cr_pb->header_.uuid);
+                physical_eval_port_binding(p_ctx, cr_pb, LP_CHASSISREDIRECT,
+                                           flow_table);
+            }
+        }
+    }
+
     if (sbrec_port_binding_is_updated(
             pb, SBREC_PORT_BINDING_COL_ADDITIONAL_CHASSIS) || removed) {
         physical_multichassis_reprocess(pb, p_ctx, flow_table);
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index f9f64d691c..d4d666b699 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -1019,6 +1019,98 @@  OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-controller - localnet port change and chassisredirect bridged redirect])
+AT_KEYWORDS([ovn-localnet-cr-bridged])
+
+ovn_start
+
+net_add n1
+
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+sim_add hv2
+as hv2
+check ovs-vsctl add-br br-phys
+check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+ovn_attach n1 br-phys 192.168.0.2
+
+dnl Create the full topology with localnet ports, router,
+dnl gateway chassis, and bridged redirect.  Bind VIFs so the
+dnl datapaths are local.
+check ovn-nbctl                                                                \
+    -- ls-add ls1                                                              \
+      -- lsp-add-localnet-port ls1 ln1 phys                                    \
+        -- set logical_switch_port ln1 tag_request=101                         \
+      -- lsp-add ls1 lp1                                                       \
+        -- lsp-set-addresses lp1 "00:00:00:00:00:01 192.168.1.10"              \
+      -- lsp-add ls1 lp2                                                       \
+        -- lsp-set-addresses lp2 "00:00:00:00:00:02 192.168.1.11"              \
+      -- lsp-add-router-port ls1 ls1-to-router router-to-ls1                   \
+    -- ls-add ls-underlay                                                      \
+      -- lsp-add-localnet-port ls-underlay ln-underlay phys                    \
+        -- set logical_switch_port ln-underlay tag_request=1000                \
+      -- lsp-add-router-port ls-underlay underlay-to-router router-to-underlay \
+    -- lr-add lr1                                                              \
+      -- lrp-add lr1 router-to-ls1 00:00:01:01:02:03 192.168.1.1/24            \
+      -- lrp-add lr1 router-to-underlay 00:00:01:01:02:07 172.31.0.1/24        \
+        -- lrp-set-gateway-chassis router-to-underlay hv1                      \
+        -- lrp-set-redirect-type router-to-underlay bridged
+
+check as hv1 ovs-vsctl add-port br-int vif0 \
+    -- set Interface vif0 external_ids:iface-id=lp1
+check as hv2 ovs-vsctl add-port br-int vif1 \
+    -- set Interface vif1 external_ids:iface-id=lp2
+
+wait_for_ports_up
+check ovn-nbctl --wait=hv sync
+
+dnl Verify initial state: hv2 has the CR bridged redirect flow.
+router_dp_key=$(printf "%x" $(fetch_column datapath tunnel_key external_ids:name=lr1))
+cr_key=$(printf "%x" $(fetch_column port_binding tunnel_key logical_port=cr-router-to-underlay))
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep -q "reg15=0x${cr_key},metadata=0x${router_dp_key}"])
+
+dnl Delete the localnet port on ls-underlay.
+check ovn-nbctl --wait=hv lsp-del ln-underlay
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep "reg15=0x${cr_key},metadata=0x${router_dp_key}"], [1])
+
+dnl Pre-create the patch port that the new localnet port
+dnl would need.  This way, when the localnet port binding arrives,
+dnl patch_run() sees the patch port already exists and non_vif_data
+dnl does NOT change, forcing pflow_output to handle the
+dnl change incrementally instead of recomputing.
+check as hv2 ovs-vsctl                                     \
+    -- add-port br-int patch-br-int-to-ln-underlay         \
+    -- set Interface patch-br-int-to-ln-underlay           \
+       type=patch options:peer=patch-ln-underlay-to-br-int
+check as hv2 ovs-vsctl                                     \
+    -- add-port br-phys patch-ln-underlay-to-br-int        \
+    -- set Interface patch-ln-underlay-to-br-int           \
+       type=patch options:peer=patch-br-int-to-ln-underlay
+check ovn-nbctl --wait=hv sync
+
+dnl Re-add the localnet port.  The patch port already exists,
+dnl so non_vif_data should not change, and pflow_output should
+dnl be handled incrementally.
+check as hv2 ovn-appctl -t ovn-controller inc-engine/clear-stats
+check ovn-nbctl --wait=hv                                   \
+    -- lsp-add-localnet-port ls-underlay ln-underlay phys   \
+    -- set logical_switch_port ln-underlay tag_request=1000
+
+dnl Verify pflow_output was NOT recomputed.
+check_controller_engine_stats hv2 pflow_output norecompute compute
+
+dnl Verify the CR bridged redirect flow is back.
+OVS_WAIT_UNTIL([as hv2 ovs-ofctl dump-flows br-int table=OFTABLE_LOCAL_OUTPUT | grep -q "reg15=0x${cr_key},metadata=0x${router_dp_key}"])
+
+OVN_CLEANUP([hv1], [hv2])
+AT_CLEANUP
+])
+
 AT_SETUP([ovn-controller - I-P for address set update: no conjunction])
 AT_KEYWORDS([as-i-p])