diff mbox series

[ovs-dev,v1] ofproto-dpif-xlate: fix zone set from non frozen metadata field

Message ID 20210215095041.74228-1-hepeng.0320@bytedance.com
State Changes Requested
Headers show
Series [ovs-dev,v1] ofproto-dpif-xlate: fix zone set from non frozen metadata field | expand

Commit Message

Peng He Feb. 15, 2021, 9:50 a.m. UTC
CT zone could be set from a field that is not included in frozen
metedata. Consider the belowing cases which is normally used in
OpenStack security group rules:

priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2

The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT module,
the second one set zone from the first megaflow and commit to CT.

The current implementation will generate a megaflow which does not use
ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
set by an Imm not a field.

Consider a situation that one changes the zone id (for example to 15)
in the first rule however still keep the second rule unchanged. During
this change, there is traffic hitting the two generated megaflows, the
revaldiator would revalidate all megaflows, however, the revalidator will
not change the second megaflow, because zone=5 is recorded in the
megaflow, so the xlate will still translate the commit action into zone=5,
and the new traffic will still commit to CT as zone=5, not zone=15,
resulting in taffic drops and other issues.

Just like OVS set-field action, if the field X is set by Y, we should also
mask Y as a match in the generated megaflow. An exception is that, if the zone
is set by the field that is included in the frozen state and this upcall is
a resume of a thawed xlate, the masking can be skipped, as if one changes
the previous rule, the generated recirc_id will be changed, and all megaflows
with the old recirc id will be invalid later, i.e. the revalidator will
not reuse the old megaflows at all.

Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 include/openvswitch/meta-flow.h |  1 +
 lib/meta-flow.c                 | 13 +++++++++++
 ofproto/ofproto-dpif-xlate.c    | 15 +++++++++++++
 tests/system-traffic.at         | 39 +++++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+)

Comments

0-day Robot Feb. 15, 2021, 9:59 a.m. UTC | #1
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com>
Lines checked: 162, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Mark Gray Feb. 17, 2021, 10:13 a.m. UTC | #2
I'm not too familiar with this code but I have some comments.

On 15/02/2021 09:50, Peng He wrote:
> CT zone could be set from a field that is not included in frozen
> metedata. Consider the belowing cases which is normally used in

Nits:

s/metedata/metadata
s/belowing cases which is/cases below which are

> OpenStack security group rules:
> 
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> 
> The zone is set from the first rule's ct action. These two rules will
> generate two megaflows: the first one uses zone=5 to query the CT module,
> the second one set zone from the first megaflow and commit to CT.
> 
> The current implementation will generate a megaflow which does not use
> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> set by an Imm not a field.
> 
> Consider a situation that one changes the zone id (for example to 15)
> in the first rule however still keep the second rule unchanged. During
> this change, there is traffic hitting the two generated megaflows, the
> revaldiator would revalidate all megaflows, however, the revalidator will
> not change the second megaflow, because zone=5 is recorded in the
> megaflow, so the xlate will still translate the commit action into zone=5,
> and the new traffic will still commit to CT as zone=5, not zone=15,
> resulting in taffic drops and other issues.
> 
> Just like OVS set-field action, if the field X is set by Y, we should also
> mask Y as a match in the generated megaflow. An exception is that, if the zone
> is set by the field that is included in the frozen state and this upcall is
> a resume of a thawed xlate, the masking can be skipped, as if one changes
> the previous rule, the generated recirc_id will be changed, and all megaflows
> with the old recirc id will be invalid later, i.e. the revalidator will
> not reuse the old megaflows at all.
> 
> Fixes: 07659514c3 ("Add support for connection tracking.")
> Reported-by: Sai Su <susai.ss@bytedance.com>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  include/openvswitch/meta-flow.h |  1 +
>  lib/meta-flow.c                 | 13 +++++++++++
>  ofproto/ofproto-dpif-xlate.c    | 15 +++++++++++++
>  tests/system-traffic.at         | 39 +++++++++++++++++++++++++++++++++
>  4 files changed, 68 insertions(+)
> 
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 95e52e358..045dce8f5 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>                                const union mf_value *mask,
>                                struct flow *);
>  bool mf_is_tun_metadata(const struct mf_field *);
> +bool mf_is_frozen_metadata(const struct mf_field *);
>  bool mf_is_pipeline_field(const struct mf_field *);
>  bool mf_is_set(const struct mf_field *, const struct flow *);
>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> index c808d205d..e03cd8d0c 100644
> --- a/lib/meta-flow.c
> +++ b/lib/meta-flow.c
> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>  }
>  
> +bool
> +mf_is_frozen_metadata(const struct mf_field *mf)
> +{
> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> +        return true;
> +    }
> +
> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> +        return true;
> +    }
> +    return false;
> +}
> +
>  bool
>  mf_is_pipeline_field(const struct mf_field *mf)
>  {
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 7108c8a30..5d1f029fd 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>                             &ctx->xin->flow, ctx->wc, zone);
>          }
>      }
> +
> +    if (ofc->zone_src.field) {
> +        if (ctx->xin->frozen_state) {
> +            /* If the upcall is a resume of a recirculation, we only need to
> +             * unwildcard the fields that are not in the frozen_metadata, as
> +             * when the rules update, OVS will generate a new recirc_id,
> +             * which will invalidate the megaflow with old the recirc_id.
> +             */
> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
Is this the only field we should check and un-wildcard here. This seems
like it would be applicable across other fields.
> +            }
> +        } else {
> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +        }
> +    }

Add a new line after the bracket

>      nl_msg_put_u16(ctx->odp_actions, OVS_CT_ATTR_ZONE, zone);
>      put_ct_mark(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
>      put_ct_label(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index fb5b9a36d..bee50e530 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1927,6 +1927,45 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([conntrack - zones from other field])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=10,icmp,action=normal
> +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +OVS_START_L7([at_ns1], [http])
> +
> +dnl HTTP requests from p0->p1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | strip_used | grep "+trk" ], [0], [dnl
> +ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
> +])

This test fails for me. It looks like the order of the output from
dpctl/dump-flows is different and the recirc_id is different. Could
easily be an issue on my side but I am not sure what it is.

+++ /home/magray/ovs/tests/system-kmod-testsuite.dir/at-groups/41/stdout
       2021-02-17 04:57:36.171766685 -0500
@@ -1,2 +1,2 @@
-ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
+recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_zone(0x5),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1

It seems this test does not test the situation that you mention in the
commit message: "Consider a situation .. "? Can you add a check for that?

> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +
>  AT_SETUP([conntrack - multiple bridges])
>  CHECK_CONNTRACK()
>  OVS_TRAFFIC_VSWITCHD_START(
>
Peng He Feb. 17, 2021, 10:40 a.m. UTC | #3
Hi,

Thanks for the review.

Mark Gray <mark.d.gray@redhat.com> 于2021年2月17日周三 下午6:13写道:
>
> I'm not too familiar with this code but I have some comments.
>
> On 15/02/2021 09:50, Peng He wrote:
> > CT zone could be set from a field that is not included in frozen
> > metedata. Consider the belowing cases which is normally used in
>
> Nits:
>
> s/metedata/metadata
> s/belowing cases which is/cases below which are

sorry for the typo, will fix it in the next version

>
> > OpenStack security group rules:
> >
> > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >
> > The zone is set from the first rule's ct action. These two rules will
> > generate two megaflows: the first one uses zone=5 to query the CT module,
> > the second one set zone from the first megaflow and commit to CT.
> >
> > The current implementation will generate a megaflow which does not use
> > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > set by an Imm not a field.
> >
> > Consider a situation that one changes the zone id (for example to 15)
> > in the first rule however still keep the second rule unchanged. During
> > this change, there is traffic hitting the two generated megaflows, the
> > revaldiator would revalidate all megaflows, however, the revalidator will
> > not change the second megaflow, because zone=5 is recorded in the
> > megaflow, so the xlate will still translate the commit action into zone=5,
> > and the new traffic will still commit to CT as zone=5, not zone=15,
> > resulting in taffic drops and other issues.
> >
> > Just like OVS set-field action, if the field X is set by Y, we should also
> > mask Y as a match in the generated megaflow. An exception is that, if the zone
> > is set by the field that is included in the frozen state and this upcall is
> > a resume of a thawed xlate, the masking can be skipped, as if one changes
> > the previous rule, the generated recirc_id will be changed, and all megaflows
> > with the old recirc id will be invalid later, i.e. the revalidator will
> > not reuse the old megaflows at all.
> >
> > Fixes: 07659514c3 ("Add support for connection tracking.")
> > Reported-by: Sai Su <susai.ss@bytedance.com>
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
> >  include/openvswitch/meta-flow.h |  1 +
> >  lib/meta-flow.c                 | 13 +++++++++++
> >  ofproto/ofproto-dpif-xlate.c    | 15 +++++++++++++
> >  tests/system-traffic.at         | 39 +++++++++++++++++++++++++++++++++
> >  4 files changed, 68 insertions(+)
> >
> > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > index 95e52e358..045dce8f5 100644
> > --- a/include/openvswitch/meta-flow.h
> > +++ b/include/openvswitch/meta-flow.h
> > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >                                const union mf_value *mask,
> >                                struct flow *);
> >  bool mf_is_tun_metadata(const struct mf_field *);
> > +bool mf_is_frozen_metadata(const struct mf_field *);
> >  bool mf_is_pipeline_field(const struct mf_field *);
> >  bool mf_is_set(const struct mf_field *, const struct flow *);
> >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > index c808d205d..e03cd8d0c 100644
> > --- a/lib/meta-flow.c
> > +++ b/lib/meta-flow.c
> > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >  }
> >
> > +bool
> > +mf_is_frozen_metadata(const struct mf_field *mf)
> > +{
> > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > +        return true;
> > +    }
> > +
> > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  bool
> >  mf_is_pipeline_field(const struct mf_field *mf)
> >  {
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 7108c8a30..5d1f029fd 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >                             &ctx->xin->flow, ctx->wc, zone);
> >          }
> >      }
> > +
> > +    if (ofc->zone_src.field) {
> > +        if (ctx->xin->frozen_state) {
> > +            /* If the upcall is a resume of a recirculation, we only need to
> > +             * unwildcard the fields that are not in the frozen_metadata, as
> > +             * when the rules update, OVS will generate a new recirc_id,
> > +             * which will invalidate the megaflow with old the recirc_id.
> > +             */
> > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> Is this the only field we should check and un-wildcard here. This seems
> like it would be applicable across other fields.

if you mean that we should strictly limit the un-wildcarded fields to
the subfield
of the zone_src, not all the subfields of it, then yes.
I think here we only extended to the field pointed by zone_src, not
across other fields.

> > +            }
> > +        } else {
> > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +        }
> > +    }
>
> Add a new line after the bracket
>
> >      nl_msg_put_u16(ctx->odp_actions, OVS_CT_ATTR_ZONE, zone);
> >      put_ct_mark(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
> >      put_ct_label(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index fb5b9a36d..bee50e530 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -1927,6 +1927,45 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > +AT_SETUP([conntrack - zones from other field])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=10,icmp,action=normal
> > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +OVS_START_L7([at_ns1], [http])
> > +
> > +dnl HTTP requests from p0->p1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | strip_used | grep "+trk" ], [0], [dnl
> > +ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
> > +])
>
> This test fails for me. It looks like the order of the output from
> dpctl/dump-flows is different and the recirc_id is different. Could
> easily be an issue on my side but I am not sure what it is.
>
> +++ /home/magray/ovs/tests/system-kmod-testsuite.dir/at-groups/41/stdout
>        2021-02-17 04:57:36.171766685 -0500
> @@ -1,2 +1,2 @@
> -ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
> +recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_zone(0x5),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
>

yes, the problem is the recirc_id, looks like we need a similar
function like stirp_used, (strip_recirc_id)
The case is to make sure that the ct_zone(0x5) should show up in the match.

> It seems this test does not test the situation that you mention in the
> commit message: "Consider a situation .. "? Can you add a check for that?

The commit describes how we discover the bug, but essentially this is
just following the convention that when you try to load Y to X, (both
Y and X are variable, not a Imm)
you should un-wildcard Y in xlate process. I would like to add the
check in the test suites, but I do not know if the
current test framework provides enough tools for that. Will try to add
the check in v2.



>
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +
> >  AT_SETUP([conntrack - multiple bridges])
> >  CHECK_CONNTRACK()
> >  OVS_TRAFFIC_VSWITCHD_START(
> >
>
Mark Gray Feb. 17, 2021, 10:43 a.m. UTC | #4
On 17/02/2021 10:40, 贺鹏 wrote:
> Hi,
> 
> Thanks for the review.
> 
> Mark Gray <mark.d.gray@redhat.com> 于2021年2月17日周三 下午6:13写道:
>>
>> I'm not too familiar with this code but I have some comments.
>>
>> On 15/02/2021 09:50, Peng He wrote:
>>> CT zone could be set from a field that is not included in frozen
>>> metedata. Consider the belowing cases which is normally used in
>>
>> Nits:
>>
>> s/metedata/metadata
>> s/belowing cases which is/cases below which are
> 
> sorry for the typo, will fix it in the next version
> 
>>
>>> OpenStack security group rules:
>>>
>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
>>>
>>> The zone is set from the first rule's ct action. These two rules will
>>> generate two megaflows: the first one uses zone=5 to query the CT module,
>>> the second one set zone from the first megaflow and commit to CT.
>>>
>>> The current implementation will generate a megaflow which does not use
>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
>>> set by an Imm not a field.
>>>
>>> Consider a situation that one changes the zone id (for example to 15)
>>> in the first rule however still keep the second rule unchanged. During
>>> this change, there is traffic hitting the two generated megaflows, the
>>> revaldiator would revalidate all megaflows, however, the revalidator will
>>> not change the second megaflow, because zone=5 is recorded in the
>>> megaflow, so the xlate will still translate the commit action into zone=5,
>>> and the new traffic will still commit to CT as zone=5, not zone=15,
>>> resulting in taffic drops and other issues.
>>>
>>> Just like OVS set-field action, if the field X is set by Y, we should also
>>> mask Y as a match in the generated megaflow. An exception is that, if the zone
>>> is set by the field that is included in the frozen state and this upcall is
>>> a resume of a thawed xlate, the masking can be skipped, as if one changes
>>> the previous rule, the generated recirc_id will be changed, and all megaflows
>>> with the old recirc id will be invalid later, i.e. the revalidator will
>>> not reuse the old megaflows at all.
>>>
>>> Fixes: 07659514c3 ("Add support for connection tracking.")
>>> Reported-by: Sai Su <susai.ss@bytedance.com>
>>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>>> ---
>>>  include/openvswitch/meta-flow.h |  1 +
>>>  lib/meta-flow.c                 | 13 +++++++++++
>>>  ofproto/ofproto-dpif-xlate.c    | 15 +++++++++++++
>>>  tests/system-traffic.at         | 39 +++++++++++++++++++++++++++++++++
>>>  4 files changed, 68 insertions(+)
>>>
>>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
>>> index 95e52e358..045dce8f5 100644
>>> --- a/include/openvswitch/meta-flow.h
>>> +++ b/include/openvswitch/meta-flow.h
>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>>>                                const union mf_value *mask,
>>>                                struct flow *);
>>>  bool mf_is_tun_metadata(const struct mf_field *);
>>> +bool mf_is_frozen_metadata(const struct mf_field *);
>>>  bool mf_is_pipeline_field(const struct mf_field *);
>>>  bool mf_is_set(const struct mf_field *, const struct flow *);
>>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
>>> index c808d205d..e03cd8d0c 100644
>>> --- a/lib/meta-flow.c
>>> +++ b/lib/meta-flow.c
>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>>>  }
>>>
>>> +bool
>>> +mf_is_frozen_metadata(const struct mf_field *mf)
>>> +{
>>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
>>> +        return true;
>>> +    }
>>> +
>>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
>>> +        return true;
>>> +    }
>>> +    return false;
>>> +}
>>> +
>>>  bool
>>>  mf_is_pipeline_field(const struct mf_field *mf)
>>>  {
>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>> index 7108c8a30..5d1f029fd 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>>>                             &ctx->xin->flow, ctx->wc, zone);
>>>          }
>>>      }
>>> +
>>> +    if (ofc->zone_src.field) {
>>> +        if (ctx->xin->frozen_state) {
>>> +            /* If the upcall is a resume of a recirculation, we only need to
>>> +             * unwildcard the fields that are not in the frozen_metadata, as
>>> +             * when the rules update, OVS will generate a new recirc_id,
>>> +             * which will invalidate the megaflow with old the recirc_id.
>>> +             */
>>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
>>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
>> Is this the only field we should check and un-wildcard here. This seems
>> like it would be applicable across other fields.
> 
> if you mean that we should strictly limit the un-wildcarded fields to
> the subfield
> of the zone_src, not all the subfields of it, then yes.
> I think here we only extended to the field pointed by zone_src, not
> across other fields.
> 
>>> +            }
>>> +        } else {
>>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
>>> +        }
>>> +    }
>>
>> Add a new line after the bracket
>>
>>>      nl_msg_put_u16(ctx->odp_actions, OVS_CT_ATTR_ZONE, zone);
>>>      put_ct_mark(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
>>>      put_ct_label(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>> index fb5b9a36d..bee50e530 100644
>>> --- a/tests/system-traffic.at
>>> +++ b/tests/system-traffic.at
>>> @@ -1927,6 +1927,45 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>  AT_CLEANUP
>>>
>>> +AT_SETUP([conntrack - zones from other field])
>>> +CHECK_CONNTRACK()
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +
>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>> +
>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
>>> +
>>> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
>>> +AT_DATA([flows.txt], [dnl
>>> +priority=1,action=drop
>>> +priority=10,arp,action=normal
>>> +priority=10,icmp,action=normal
>>> +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
>>> +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
>>> +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
>>> +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
>>> +])
>>> +
>>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>>> +
>>> +OVS_START_L7([at_ns1], [http])
>>> +
>>> +dnl HTTP requests from p0->p1 should work fine.
>>> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
>>> +
>>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
>>> +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | strip_used | grep "+trk" ], [0], [dnl
>>> +ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
>>> +])
>>
>> This test fails for me. It looks like the order of the output from
>> dpctl/dump-flows is different and the recirc_id is different. Could
>> easily be an issue on my side but I am not sure what it is.
>>
>> +++ /home/magray/ovs/tests/system-kmod-testsuite.dir/at-groups/41/stdout
>>        2021-02-17 04:57:36.171766685 -0500
>> @@ -1,2 +1,2 @@
>> -ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
>> packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
>> +recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_zone(0x5),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>> packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
>>
> 
> yes, the problem is the recirc_id, looks like we need a similar
> function like stirp_used, (strip_recirc_id)
> The case is to make sure that the ct_zone(0x5) should show up in the match.

There are a few other changes as well (eth(), order of in_port)
> 
>> It seems this test does not test the situation that you mention in the
>> commit message: "Consider a situation .. "? Can you add a check for that?
> 
> The commit describes how we discover the bug, but essentially this is
> just following the convention that when you try to load Y to X, (both
> Y and X are variable, not a Imm)
> you should un-wildcard Y in xlate process. I would like to add the
> check in the test suites, but I do not know if the
> current test framework provides enough tools for that. Will try to add
> the check in v2.
> 
> 
> 
>>
>>> +
>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> +
>>> +
>>>  AT_SETUP([conntrack - multiple bridges])
>>>  CHECK_CONNTRACK()
>>>  OVS_TRAFFIC_VSWITCHD_START(
>>>
>>
> 
>
Peng He Feb. 18, 2021, 9:44 a.m. UTC | #5
Hi, I've already sent a v2 patch.

Mark Gray <mark.d.gray@redhat.com> 于2021年2月17日周三 下午6:44写道:
>
> On 17/02/2021 10:40, 贺鹏 wrote:
> > Hi,
> >
> > Thanks for the review.
> >
> > Mark Gray <mark.d.gray@redhat.com> 于2021年2月17日周三 下午6:13写道:
> >>
> >> I'm not too familiar with this code but I have some comments.
> >>
> >> On 15/02/2021 09:50, Peng He wrote:
> >>> CT zone could be set from a field that is not included in frozen
> >>> metedata. Consider the belowing cases which is normally used in
> >>
> >> Nits:
> >>
> >> s/metedata/metadata
> >> s/belowing cases which is/cases below which are
> >
> > sorry for the typo, will fix it in the next version
> >
> >>
> >>> OpenStack security group rules:
> >>>
> >>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> >>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >>>
> >>> The zone is set from the first rule's ct action. These two rules will
> >>> generate two megaflows: the first one uses zone=5 to query the CT module,
> >>> the second one set zone from the first megaflow and commit to CT.
> >>>
> >>> The current implementation will generate a megaflow which does not use
> >>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> >>> set by an Imm not a field.
> >>>
> >>> Consider a situation that one changes the zone id (for example to 15)
> >>> in the first rule however still keep the second rule unchanged. During
> >>> this change, there is traffic hitting the two generated megaflows, the
> >>> revaldiator would revalidate all megaflows, however, the revalidator will
> >>> not change the second megaflow, because zone=5 is recorded in the
> >>> megaflow, so the xlate will still translate the commit action into zone=5,
> >>> and the new traffic will still commit to CT as zone=5, not zone=15,
> >>> resulting in taffic drops and other issues.
> >>>
> >>> Just like OVS set-field action, if the field X is set by Y, we should also
> >>> mask Y as a match in the generated megaflow. An exception is that, if the zone
> >>> is set by the field that is included in the frozen state and this upcall is
> >>> a resume of a thawed xlate, the masking can be skipped, as if one changes
> >>> the previous rule, the generated recirc_id will be changed, and all megaflows
> >>> with the old recirc id will be invalid later, i.e. the revalidator will
> >>> not reuse the old megaflows at all.
> >>>
> >>> Fixes: 07659514c3 ("Add support for connection tracking.")
> >>> Reported-by: Sai Su <susai.ss@bytedance.com>
> >>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> >>> ---
> >>>  include/openvswitch/meta-flow.h |  1 +
> >>>  lib/meta-flow.c                 | 13 +++++++++++
> >>>  ofproto/ofproto-dpif-xlate.c    | 15 +++++++++++++
> >>>  tests/system-traffic.at         | 39 +++++++++++++++++++++++++++++++++
> >>>  4 files changed, 68 insertions(+)
> >>>
> >>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> >>> index 95e52e358..045dce8f5 100644
> >>> --- a/include/openvswitch/meta-flow.h
> >>> +++ b/include/openvswitch/meta-flow.h
> >>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >>>                                const union mf_value *mask,
> >>>                                struct flow *);
> >>>  bool mf_is_tun_metadata(const struct mf_field *);
> >>> +bool mf_is_frozen_metadata(const struct mf_field *);
> >>>  bool mf_is_pipeline_field(const struct mf_field *);
> >>>  bool mf_is_set(const struct mf_field *, const struct flow *);
> >>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> >>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> >>> index c808d205d..e03cd8d0c 100644
> >>> --- a/lib/meta-flow.c
> >>> +++ b/lib/meta-flow.c
> >>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >>>  }
> >>>
> >>> +bool
> >>> +mf_is_frozen_metadata(const struct mf_field *mf)
> >>> +{
> >>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> >>> +        return true;
> >>> +    }
> >>> +    return false;
> >>> +}
> >>> +
> >>>  bool
> >>>  mf_is_pipeline_field(const struct mf_field *mf)
> >>>  {
> >>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>> index 7108c8a30..5d1f029fd 100644
> >>> --- a/ofproto/ofproto-dpif-xlate.c
> >>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>> @@ -6212,6 +6212,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >>>                             &ctx->xin->flow, ctx->wc, zone);
> >>>          }
> >>>      }
> >>> +
> >>> +    if (ofc->zone_src.field) {
> >>> +        if (ctx->xin->frozen_state) {
> >>> +            /* If the upcall is a resume of a recirculation, we only need to
> >>> +             * unwildcard the fields that are not in the frozen_metadata, as
> >>> +             * when the rules update, OVS will generate a new recirc_id,
> >>> +             * which will invalidate the megaflow with old the recirc_id.
> >>> +             */
> >>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> >>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> >> Is this the only field we should check and un-wildcard here. This seems
> >> like it would be applicable across other fields.
> >
> > if you mean that we should strictly limit the un-wildcarded fields to
> > the subfield
> > of the zone_src, not all the subfields of it, then yes.
> > I think here we only extended to the field pointed by zone_src, not
> > across other fields.
> >
> >>> +            }
> >>> +        } else {
> >>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>> +        }
> >>> +    }
> >>
> >> Add a new line after the bracket
> >>
> >>>      nl_msg_put_u16(ctx->odp_actions, OVS_CT_ATTR_ZONE, zone);
> >>>      put_ct_mark(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
> >>>      put_ct_label(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
> >>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> >>> index fb5b9a36d..bee50e530 100644
> >>> --- a/tests/system-traffic.at
> >>> +++ b/tests/system-traffic.at
> >>> @@ -1927,6 +1927,45 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> >>>  OVS_TRAFFIC_VSWITCHD_STOP
> >>>  AT_CLEANUP
> >>>
> >>> +AT_SETUP([conntrack - zones from other field])
> >>> +CHECK_CONNTRACK()
> >>> +OVS_TRAFFIC_VSWITCHD_START()
> >>> +
> >>> +ADD_NAMESPACES(at_ns0, at_ns1)
> >>> +
> >>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> >>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> >>> +
> >>> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> >>> +AT_DATA([flows.txt], [dnl
> >>> +priority=1,action=drop
> >>> +priority=10,arp,action=normal
> >>> +priority=10,icmp,action=normal
> >>> +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> >>> +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >>> +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> >>> +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> >>> +])
> >>> +
> >>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> >>> +
> >>> +OVS_START_L7([at_ns1], [http])
> >>> +
> >>> +dnl HTTP requests from p0->p1 should work fine.
> >>> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> >>> +
> >>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> >>> +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> >>> +])
> >>> +
> >>> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | strip_used | grep "+trk" ], [0], [dnl
> >>> +ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
> >>> +])
> >>
> >> This test fails for me. It looks like the order of the output from
> >> dpctl/dump-flows is different and the recirc_id is different. Could
> >> easily be an issue on my side but I am not sure what it is.
> >>
> >> +++ /home/magray/ovs/tests/system-kmod-testsuite.dir/at-groups/41/stdout
> >>        2021-02-17 04:57:36.171766685 -0500
> >> @@ -1,2 +1,2 @@
> >> -ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> >> packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
> >> +recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_zone(0x5),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> >> packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
> >>
> >
> > yes, the problem is the recirc_id, looks like we need a similar
> > function like stirp_used, (strip_recirc_id)
> > The case is to make sure that the ct_zone(0x5) should show up in the match.
>
> There are a few other changes as well (eth(), order of in_port)
> >
> >> It seems this test does not test the situation that you mention in the
> >> commit message: "Consider a situation .. "? Can you add a check for that?
> >
> > The commit describes how we discover the bug, but essentially this is
> > just following the convention that when you try to load Y to X, (both
> > Y and X are variable, not a Imm)
> > you should un-wildcard Y in xlate process. I would like to add the
> > check in the test suites, but I do not know if the
> > current test framework provides enough tools for that. Will try to add
> > the check in v2.
> >
> >
> >
> >>
> >>> +
> >>> +OVS_TRAFFIC_VSWITCHD_STOP
> >>> +AT_CLEANUP
> >>> +
> >>> +
> >>>  AT_SETUP([conntrack - multiple bridges])
> >>>  CHECK_CONNTRACK()
> >>>  OVS_TRAFFIC_VSWITCHD_START(
> >>>
> >>
> >
> >
>
diff mbox series

Patch

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 95e52e358..045dce8f5 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -2305,6 +2305,7 @@  void mf_set_flow_value_masked(const struct mf_field *,
                               const union mf_value *mask,
                               struct flow *);
 bool mf_is_tun_metadata(const struct mf_field *);
+bool mf_is_frozen_metadata(const struct mf_field *);
 bool mf_is_pipeline_field(const struct mf_field *);
 bool mf_is_set(const struct mf_field *, const struct flow *);
 void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index c808d205d..e03cd8d0c 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -1788,6 +1788,19 @@  mf_is_tun_metadata(const struct mf_field *mf)
            mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
 }
 
+bool
+mf_is_frozen_metadata(const struct mf_field *mf)
+{
+    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
+        return true;
+    }
+
+    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
+        return true;
+    }
+    return false;
+}
+
 bool
 mf_is_pipeline_field(const struct mf_field *mf)
 {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7108c8a30..5d1f029fd 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -6212,6 +6212,21 @@  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
                            &ctx->xin->flow, ctx->wc, zone);
         }
     }
+
+    if (ofc->zone_src.field) {
+        if (ctx->xin->frozen_state) {
+            /* If the upcall is a resume of a recirculation, we only need to
+             * unwildcard the fields that are not in the frozen_metadata, as
+             * when the rules update, OVS will generate a new recirc_id,
+             * which will invalidate the megaflow with old the recirc_id.
+             */
+            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
+                mf_mask_field(ofc->zone_src.field, ctx->wc);
+            }
+        } else {
+            mf_mask_field(ofc->zone_src.field, ctx->wc);
+        }
+    }
     nl_msg_put_u16(ctx->odp_actions, OVS_CT_ATTR_ZONE, zone);
     put_ct_mark(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
     put_ct_label(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index fb5b9a36d..bee50e530 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1927,6 +1927,45 @@  tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - zones from other field])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=10,icmp,action=normal
+priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
+priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
+priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
+priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_START_L7([at_ns1], [http])
+
+dnl HTTP requests from p0->p1 should work fine.
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | strip_used | grep "+trk" ], [0], [dnl
+ct_state(+trk),ct_zone(0x5),recirc_id(0x2),in_port(ovs-p0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5, bytes:465, used:0.0s, flags:FP., actions:ct(commit,zone=5),ovs-p1
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
 AT_SETUP([conntrack - multiple bridges])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START(