diff mbox series

[ovs-dev,v2] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields

Message ID 20210218094347.19903-1-hepeng.0320@bytedance.com
State Changes Requested
Headers show
Series [ovs-dev,v2] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields | expand

Commit Message

Peng He Feb. 18, 2021, 9:43 a.m. UTC
CT zone could be set from a field that is not included in frozen
metadata. Consider the example rules which are typically seen in
OpenStack security group rules:

priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2

The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT module,
the second one sets the zone-id from the first megaflow and commit to CT.

The current implementation will generate a megaflow that does not use
ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
set by an Imm not a field.

Consider a situation that one changes the zone id (for example to 15)
in the first rule, however, still keep the second rule unchanged. During
this change, there is traffic hitting the two generated megaflows, the
revaldiator would revalidate all megaflows, however, the revalidator will
not change the second megaflow, because zone=5 is recorded in the
megaflow, so the xlate will still translate the commit action into zone=5,
and the new traffic will still commit to CT as zone=5, not zone=15,
resulting in taffic drops and other issues.

Just like OVS set-field convention, if a field X is set by Y
(Y is a variable not an Imm), we should also mask Y as a match
in the generated megaflow. An exception is that if the zone-id is
set by the field that is included in the frozen state (i.e. regs) and this
upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
as the recirc_id is a hash of the values in these fields, and it will change
following the changes of these fields. When the recirc_id changes,
all megaflows with the old recirc id will be invalid later.

Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 include/openvswitch/meta-flow.h |  1 +
 lib/meta-flow.c                 | 13 ++++++++++
 ofproto/ofproto-dpif-xlate.c    | 12 ++++++++++
 tests/system-traffic.at         | 42 +++++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+)

Comments

0-day Robot Feb. 18, 2021, 9:58 a.m. UTC | #1
Bleep bloop.  Greetings 贺鹏, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com>
Lines checked: 163, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Mark Gray Feb. 19, 2021, 1:07 p.m. UTC | #2
On 18/02/2021 09:43, Peng He wrote:

One small comment below. By the way, good catch .. i found it difficult
enough to review it, let alone find it!

> CT zone could be set from a field that is not included in frozen
> metadata. Consider the example rules which are typically seen in
> OpenStack security group rules:
> 
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> 
> The zone is set from the first rule's ct action. These two rules will
> generate two megaflows: the first one uses zone=5 to query the CT module,
> the second one sets the zone-id from the first megaflow and commit to CT.
> 
> The current implementation will generate a megaflow that does not use
> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> set by an Imm not a field.
> 
> Consider a situation that one changes the zone id (for example to 15)
> in the first rule, however, still keep the second rule unchanged. During
> this change, there is traffic hitting the two generated megaflows, the
> revaldiator would revalidate all megaflows, however, the revalidator will
> not change the second megaflow, because zone=5 is recorded in the
> megaflow, so the xlate will still translate the commit action into zone=5,
> and the new traffic will still commit to CT as zone=5, not zone=15,
> resulting in taffic drops and other issues.
> 
> Just like OVS set-field convention, if a field X is set by Y
> (Y is a variable not an Imm), we should also mask Y as a match
> in the generated megaflow. An exception is that if the zone-id is
> set by the field that is included in the frozen state (i.e. regs) and this
> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> as the recirc_id is a hash of the values in these fields, and it will change
> following the changes of these fields. When the recirc_id changes,
> all megaflows with the old recirc id will be invalid later.
> 
> Fixes: 07659514c3 ("Add support for connection tracking.")
> Reported-by: Sai Su <susai.ss@bytedance.com>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  include/openvswitch/meta-flow.h |  1 +
>  lib/meta-flow.c                 | 13 ++++++++++
>  ofproto/ofproto-dpif-xlate.c    | 12 ++++++++++
>  tests/system-traffic.at         | 42 +++++++++++++++++++++++++++++++++
>  4 files changed, 68 insertions(+)
> 
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 95e52e358..045dce8f5 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>                                const union mf_value *mask,
>                                struct flow *);
>  bool mf_is_tun_metadata(const struct mf_field *);
> +bool mf_is_frozen_metadata(const struct mf_field *);
>  bool mf_is_pipeline_field(const struct mf_field *);
>  bool mf_is_set(const struct mf_field *, const struct flow *);
>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> index c808d205d..e03cd8d0c 100644
> --- a/lib/meta-flow.c
> +++ b/lib/meta-flow.c
> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>  }
>  
> +bool
> +mf_is_frozen_metadata(const struct mf_field *mf)
> +{
> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> +        return true;
> +    }
> +
> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> +        return true;
> +    }
> +    return false;
> +}
> +
>  bool
>  mf_is_pipeline_field(const struct mf_field *mf)
>  {
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 7108c8a30..14d00db1e 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>  
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        if (ctx->xin->frozen_state) {
> +            /* If the upcall is a resume of a recirculation, we only need to
> +             * unwildcard the fields that are not in the frozen_metadata, as
> +             * when the rules update, OVS will generate a new recirc_id,
> +             * which will invalidate the megaflow with old the recirc_id.
> +             */
> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            }
> +        } else {
> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +        }
>      } else {
>          zone = ofc->zone_imm;
>      }
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index fb5b9a36d..84a9759ea 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1927,6 +1927,48 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([conntrack - zones from other field])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=10,icmp,action=normal
> +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +OVS_START_L7([at_ns1], [http])
> +
> +dnl HTTP requests from p0->p1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> +
> +dnl changes zone-id

Could you add a comment here explaining what you are doing and why? I
realise it is in the commit message but not everyone will look there.
> +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])

One thing that I noticed is that the ct_zone(0x5) flow is still in the
fast path. I think this is ok though as it should expire and I cant
think of any unwanted side-effects that this would cause? wdyt?
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +
>  AT_SETUP([conntrack - multiple bridges])
>  CHECK_CONNTRACK()
>  OVS_TRAFFIC_VSWITCHD_START(
>
Peng He Feb. 20, 2021, 2:21 a.m. UTC | #3
Hi, Mark,

thanks for the review!

Mark Gray <mark.d.gray@redhat.com> 于2021年2月19日周五 下午9:07写道:
>
> On 18/02/2021 09:43, Peng He wrote:
>
> One small comment below. By the way, good catch .. i found it difficult
> enough to review it, let alone find it!

Thanks.
This is found by exploring our controller's bug, which changes zone-id
every time it restarts.

>
> > CT zone could be set from a field that is not included in frozen
> > metadata. Consider the example rules which are typically seen in
> > OpenStack security group rules:
> >
> > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >
> > The zone is set from the first rule's ct action. These two rules will
> > generate two megaflows: the first one uses zone=5 to query the CT module,
> > the second one sets the zone-id from the first megaflow and commit to CT.
> >
> > The current implementation will generate a megaflow that does not use
> > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > set by an Imm not a field.
> >
> > Consider a situation that one changes the zone id (for example to 15)
> > in the first rule, however, still keep the second rule unchanged. During
> > this change, there is traffic hitting the two generated megaflows, the
> > revaldiator would revalidate all megaflows, however, the revalidator will
> > not change the second megaflow, because zone=5 is recorded in the
> > megaflow, so the xlate will still translate the commit action into zone=5,
> > and the new traffic will still commit to CT as zone=5, not zone=15,
> > resulting in taffic drops and other issues.
> >
> > Just like OVS set-field convention, if a field X is set by Y
> > (Y is a variable not an Imm), we should also mask Y as a match
> > in the generated megaflow. An exception is that if the zone-id is
> > set by the field that is included in the frozen state (i.e. regs) and this
> > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > as the recirc_id is a hash of the values in these fields, and it will change
> > following the changes of these fields. When the recirc_id changes,
> > all megaflows with the old recirc id will be invalid later.
> >
> > Fixes: 07659514c3 ("Add support for connection tracking.")
> > Reported-by: Sai Su <susai.ss@bytedance.com>
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
> >  include/openvswitch/meta-flow.h |  1 +
> >  lib/meta-flow.c                 | 13 ++++++++++
> >  ofproto/ofproto-dpif-xlate.c    | 12 ++++++++++
> >  tests/system-traffic.at         | 42 +++++++++++++++++++++++++++++++++
> >  4 files changed, 68 insertions(+)
> >
> > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > index 95e52e358..045dce8f5 100644
> > --- a/include/openvswitch/meta-flow.h
> > +++ b/include/openvswitch/meta-flow.h
> > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >                                const union mf_value *mask,
> >                                struct flow *);
> >  bool mf_is_tun_metadata(const struct mf_field *);
> > +bool mf_is_frozen_metadata(const struct mf_field *);
> >  bool mf_is_pipeline_field(const struct mf_field *);
> >  bool mf_is_set(const struct mf_field *, const struct flow *);
> >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > index c808d205d..e03cd8d0c 100644
> > --- a/lib/meta-flow.c
> > +++ b/lib/meta-flow.c
> > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >  }
> >
> > +bool
> > +mf_is_frozen_metadata(const struct mf_field *mf)
> > +{
> > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > +        return true;
> > +    }
> > +
> > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  bool
> >  mf_is_pipeline_field(const struct mf_field *mf)
> >  {
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 7108c8a30..14d00db1e 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >
> >      if (ofc->zone_src.field) {
> >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > +        if (ctx->xin->frozen_state) {
> > +            /* If the upcall is a resume of a recirculation, we only need to
> > +             * unwildcard the fields that are not in the frozen_metadata, as
> > +             * when the rules update, OVS will generate a new recirc_id,
> > +             * which will invalidate the megaflow with old the recirc_id.
> > +             */
> > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +            }
> > +        } else {
> > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +        }
> >      } else {
> >          zone = ofc->zone_imm;
> >      }
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index fb5b9a36d..84a9759ea 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -1927,6 +1927,48 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > +AT_SETUP([conntrack - zones from other field])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=10,icmp,action=normal
> > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +OVS_START_L7([at_ns1], [http])
> > +
> > +dnl HTTP requests from p0->p1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> > +
> > +dnl changes zone-id
>
> Could you add a comment here explaining what you are doing and why? I
> realise it is in the commit message but not everyone will look there.

Sure, will add it in the next version.

> > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
>
> One thing that I noticed is that the ct_zone(0x5) flow is still in the
> fast path. I think this is ok though as it should expire and I cant
> think of any unwanted side-effects that this would cause? wdyt?

I think this will not have side-effects as the ct_zone(0x5) is in a
megaflow that has recirc_id. The rule that
queries CT with a special zone-id has changed into 0xf, however, since
CT_ZONE is not included in the frozen_state metadata,
it will not change the recirc_id's hash value, i.e. the action part of
the megaflow that queries CT (the action is with the form of
"ct(zone=15), recirc(ID)") will
keep using the same recirc_id, so both ct_zone(0x5) and ct_zone(0xf)
will have the same recirc_id(ID) in their matches and only one of them
will get matched,
the old one will expire later since no packets will hit this megaflow,
and all sessions in CT with the old ct_zone will expire too later.


> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +
> >  AT_SETUP([conntrack - multiple bridges])
> >  CHECK_CONNTRACK()
> >  OVS_TRAFFIC_VSWITCHD_START(
> >
>
Peng He Feb. 27, 2021, 9:35 a.m. UTC | #4
Hi,

Sent a new version of the patch.

贺鹏 <xnhp0320@gmail.com> 于2021年2月20日周六 上午10:21写道:
>
> Hi, Mark,
>
> thanks for the review!
>
> Mark Gray <mark.d.gray@redhat.com> 于2021年2月19日周五 下午9:07写道:
> >
> > On 18/02/2021 09:43, Peng He wrote:
> >
> > One small comment below. By the way, good catch .. i found it difficult
> > enough to review it, let alone find it!
>
> Thanks.
> This is found by exploring our controller's bug, which changes zone-id
> every time it restarts.
>
> >
> > > CT zone could be set from a field that is not included in frozen
> > > metadata. Consider the example rules which are typically seen in
> > > OpenStack security group rules:
> > >
> > > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > >
> > > The zone is set from the first rule's ct action. These two rules will
> > > generate two megaflows: the first one uses zone=5 to query the CT module,
> > > the second one sets the zone-id from the first megaflow and commit to CT.
> > >
> > > The current implementation will generate a megaflow that does not use
> > > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > > set by an Imm not a field.
> > >
> > > Consider a situation that one changes the zone id (for example to 15)
> > > in the first rule, however, still keep the second rule unchanged. During
> > > this change, there is traffic hitting the two generated megaflows, the
> > > revaldiator would revalidate all megaflows, however, the revalidator will
> > > not change the second megaflow, because zone=5 is recorded in the
> > > megaflow, so the xlate will still translate the commit action into zone=5,
> > > and the new traffic will still commit to CT as zone=5, not zone=15,
> > > resulting in taffic drops and other issues.
> > >
> > > Just like OVS set-field convention, if a field X is set by Y
> > > (Y is a variable not an Imm), we should also mask Y as a match
> > > in the generated megaflow. An exception is that if the zone-id is
> > > set by the field that is included in the frozen state (i.e. regs) and this
> > > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > > as the recirc_id is a hash of the values in these fields, and it will change
> > > following the changes of these fields. When the recirc_id changes,
> > > all megaflows with the old recirc id will be invalid later.
> > >
> > > Fixes: 07659514c3 ("Add support for connection tracking.")
> > > Reported-by: Sai Su <susai.ss@bytedance.com>
> > > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > > ---
> > >  include/openvswitch/meta-flow.h |  1 +
> > >  lib/meta-flow.c                 | 13 ++++++++++
> > >  ofproto/ofproto-dpif-xlate.c    | 12 ++++++++++
> > >  tests/system-traffic.at         | 42 +++++++++++++++++++++++++++++++++
> > >  4 files changed, 68 insertions(+)
> > >
> > > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > > index 95e52e358..045dce8f5 100644
> > > --- a/include/openvswitch/meta-flow.h
> > > +++ b/include/openvswitch/meta-flow.h
> > > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> > >                                const union mf_value *mask,
> > >                                struct flow *);
> > >  bool mf_is_tun_metadata(const struct mf_field *);
> > > +bool mf_is_frozen_metadata(const struct mf_field *);
> > >  bool mf_is_pipeline_field(const struct mf_field *);
> > >  bool mf_is_set(const struct mf_field *, const struct flow *);
> > >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > > index c808d205d..e03cd8d0c 100644
> > > --- a/lib/meta-flow.c
> > > +++ b/lib/meta-flow.c
> > > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> > >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> > >  }
> > >
> > > +bool
> > > +mf_is_frozen_metadata(const struct mf_field *mf)
> > > +{
> > > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > > +        return true;
> > > +    }
> > > +
> > > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > > +        return true;
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > >  bool
> > >  mf_is_pipeline_field(const struct mf_field *mf)
> > >  {
> > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > > index 7108c8a30..14d00db1e 100644
> > > --- a/ofproto/ofproto-dpif-xlate.c
> > > +++ b/ofproto/ofproto-dpif-xlate.c
> > > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> > >
> > >      if (ofc->zone_src.field) {
> > >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > > +        if (ctx->xin->frozen_state) {
> > > +            /* If the upcall is a resume of a recirculation, we only need to
> > > +             * unwildcard the fields that are not in the frozen_metadata, as
> > > +             * when the rules update, OVS will generate a new recirc_id,
> > > +             * which will invalidate the megaflow with old the recirc_id.
> > > +             */
> > > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > > +            }
> > > +        } else {
> > > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > > +        }
> > >      } else {
> > >          zone = ofc->zone_imm;
> > >      }
> > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > > index fb5b9a36d..84a9759ea 100644
> > > --- a/tests/system-traffic.at
> > > +++ b/tests/system-traffic.at
> > > @@ -1927,6 +1927,48 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> > >  OVS_TRAFFIC_VSWITCHD_STOP
> > >  AT_CLEANUP
> > >
> > > +AT_SETUP([conntrack - zones from other field])
> > > +CHECK_CONNTRACK()
> > > +OVS_TRAFFIC_VSWITCHD_START()
> > > +
> > > +ADD_NAMESPACES(at_ns0, at_ns1)
> > > +
> > > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> > > +
> > > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > > +AT_DATA([flows.txt], [dnl
> > > +priority=1,action=drop
> > > +priority=10,arp,action=normal
> > > +priority=10,icmp,action=normal
> > > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> > > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> > > +])
> > > +
> > > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > > +
> > > +OVS_START_L7([at_ns1], [http])
> > > +
> > > +dnl HTTP requests from p0->p1 should work fine.
> > > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > > +
> > > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> > > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> > > +])
> > > +
> > > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> > > +
> > > +dnl changes zone-id
> >
> > Could you add a comment here explaining what you are doing and why? I
> > realise it is in the commit message but not everyone will look there.
>
> Sure, will add it in the next version.
>
> > > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> > > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
> >
> > One thing that I noticed is that the ct_zone(0x5) flow is still in the
> > fast path. I think this is ok though as it should expire and I cant
> > think of any unwanted side-effects that this would cause? wdyt?
>
> I think this will not have side-effects as the ct_zone(0x5) is in a
> megaflow that has recirc_id. The rule that
> queries CT with a special zone-id has changed into 0xf, however, since
> CT_ZONE is not included in the frozen_state metadata,
> it will not change the recirc_id's hash value, i.e. the action part of
> the megaflow that queries CT (the action is with the form of
> "ct(zone=15), recirc(ID)") will
> keep using the same recirc_id, so both ct_zone(0x5) and ct_zone(0xf)
> will have the same recirc_id(ID) in their matches and only one of them
> will get matched,
> the old one will expire later since no packets will hit this megaflow,
> and all sessions in CT with the old ct_zone will expire too later.
>
>
> > > +
> > > +OVS_TRAFFIC_VSWITCHD_STOP
> > > +AT_CLEANUP
> > > +
> > > +
> > >  AT_SETUP([conntrack - multiple bridges])
> > >  CHECK_CONNTRACK()
> > >  OVS_TRAFFIC_VSWITCHD_START(
> > >
> >
>
>
> --
> hepeng
diff mbox series

Patch

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 95e52e358..045dce8f5 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -2305,6 +2305,7 @@  void mf_set_flow_value_masked(const struct mf_field *,
                               const union mf_value *mask,
                               struct flow *);
 bool mf_is_tun_metadata(const struct mf_field *);
+bool mf_is_frozen_metadata(const struct mf_field *);
 bool mf_is_pipeline_field(const struct mf_field *);
 bool mf_is_set(const struct mf_field *, const struct flow *);
 void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index c808d205d..e03cd8d0c 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -1788,6 +1788,19 @@  mf_is_tun_metadata(const struct mf_field *mf)
            mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
 }
 
+bool
+mf_is_frozen_metadata(const struct mf_field *mf)
+{
+    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
+        return true;
+    }
+
+    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
+        return true;
+    }
+    return false;
+}
+
 bool
 mf_is_pipeline_field(const struct mf_field *mf)
 {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7108c8a30..14d00db1e 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -6195,6 +6195,18 @@  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
 
     if (ofc->zone_src.field) {
         zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
+        if (ctx->xin->frozen_state) {
+            /* If the upcall is a resume of a recirculation, we only need to
+             * unwildcard the fields that are not in the frozen_metadata, as
+             * when the rules update, OVS will generate a new recirc_id,
+             * which will invalidate the megaflow with old the recirc_id.
+             */
+            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
+                mf_mask_field(ofc->zone_src.field, ctx->wc);
+            }
+        } else {
+            mf_mask_field(ofc->zone_src.field, ctx->wc);
+        }
     } else {
         zone = ofc->zone_imm;
     }
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index fb5b9a36d..84a9759ea 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1927,6 +1927,48 @@  tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - zones from other field])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=10,icmp,action=normal
+priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
+priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
+priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
+priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_START_L7([at_ns1], [http])
+
+dnl HTTP requests from p0->p1 should work fine.
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
+
+dnl changes zone-id
+AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
 AT_SETUP([conntrack - multiple bridges])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START(