diff mbox series

[ovs-dev,v3] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields

Message ID 20210227093430.44978-1-hepeng.0320@bytedance.com
State Changes Requested
Headers show
Series [ovs-dev,v3] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields | expand

Commit Message

Peng He Feb. 27, 2021, 9:34 a.m. UTC
CT zone could be set from a field that is not included in frozen
metadata. Consider the example rules which are typically seen in
OpenStack security group rules:

priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2

The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT module,
the second one sets the zone-id from the first megaflow and commit to CT.

The current implementation will generate a megaflow that does not use
ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
set by an Imm not a field.

Consider a situation that one changes the zone id (for example to 15)
in the first rule, however, still keep the second rule unchanged. During
this change, there is traffic hitting the two generated megaflows, the
revaldiator would revalidate all megaflows, however, the revalidator will
not change the second megaflow, because zone=5 is recorded in the
megaflow, so the xlate will still translate the commit action into zone=5,
and the new traffic will still commit to CT as zone=5, not zone=15,
resulting in taffic drops and other issues.

Just like OVS set-field convention, if a field X is set by Y
(Y is a variable not an Imm), we should also mask Y as a match
in the generated megaflow. An exception is that if the zone-id is
set by the field that is included in the frozen state (i.e. regs) and this
upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
as the recirc_id is a hash of the values in these fields, and it will change
following the changes of these fields. When the recirc_id changes,
all megaflows with the old recirc id will be invalid later.

Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 include/openvswitch/meta-flow.h |  1 +
 lib/meta-flow.c                 | 13 ++++++++++
 ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
 tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)

Comments

0-day Robot Feb. 27, 2021, 9:58 a.m. UTC | #1
Bleep bloop.  Greetings 贺鹏, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com>
WARNING: Line has trailing whitespace
#147 FILE: tests/system-traffic.at:1961:
dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of 

Lines checked: 166, Warnings: 2, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Mark Gray March 5, 2021, 11:54 a.m. UTC | #2
On 27/02/2021 09:34, Peng He wrote:
> CT zone could be set from a field that is not included in frozen
> metadata. Consider the example rules which are typically seen in
> OpenStack security group rules:
> 
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> 
> The zone is set from the first rule's ct action. These two rules will
> generate two megaflows: the first one uses zone=5 to query the CT module,
> the second one sets the zone-id from the first megaflow and commit to CT.
> 
> The current implementation will generate a megaflow that does not use
> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> set by an Imm not a field.
> 
> Consider a situation that one changes the zone id (for example to 15)
> in the first rule, however, still keep the second rule unchanged. During
> this change, there is traffic hitting the two generated megaflows, the
> revaldiator would revalidate all megaflows, however, the revalidator will
> not change the second megaflow, because zone=5 is recorded in the
> megaflow, so the xlate will still translate the commit action into zone=5,
> and the new traffic will still commit to CT as zone=5, not zone=15,
> resulting in taffic drops and other issues.
> 
> Just like OVS set-field convention, if a field X is set by Y
> (Y is a variable not an Imm), we should also mask Y as a match
> in the generated megaflow. An exception is that if the zone-id is
> set by the field that is included in the frozen state (i.e. regs) and this
> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> as the recirc_id is a hash of the values in these fields, and it will change
> following the changes of these fields. When the recirc_id changes,
> all megaflows with the old recirc id will be invalid later.

This looks good to me and all the unit-tests pass.

There is some trailing whitespace. You can run
"./utilities/checkpatch.py" when submitting to catch them before the
0-day robot does. Its a bit of a nit and I don't know if this won't get
committed because of it. That's a decision for a maintainer.
> 
> Fixes: 07659514c3 ("Add support for connection tracking.")
> Reported-by: Sai Su <susai.ss@bytedance.com>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  include/openvswitch/meta-flow.h |  1 +
>  lib/meta-flow.c                 | 13 ++++++++++
>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
>  4 files changed, 71 insertions(+)
> 
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 95e52e358..045dce8f5 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>                                const union mf_value *mask,
>                                struct flow *);
>  bool mf_is_tun_metadata(const struct mf_field *);
> +bool mf_is_frozen_metadata(const struct mf_field *);
>  bool mf_is_pipeline_field(const struct mf_field *);
>  bool mf_is_set(const struct mf_field *, const struct flow *);
>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> index c808d205d..e03cd8d0c 100644
> --- a/lib/meta-flow.c
> +++ b/lib/meta-flow.c
> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>  }
>  
> +bool
> +mf_is_frozen_metadata(const struct mf_field *mf)
> +{
> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> +        return true;
> +    }
> +
> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> +        return true;
> +    }
> +    return false;
> +}
> +
>  bool
>  mf_is_pipeline_field(const struct mf_field *mf)
>  {
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 7108c8a30..14d00db1e 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>  
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        if (ctx->xin->frozen_state) {
> +            /* If the upcall is a resume of a recirculation, we only need to
> +             * unwildcard the fields that are not in the frozen_metadata, as
> +             * when the rules update, OVS will generate a new recirc_id,
> +             * which will invalidate the megaflow with old the recirc_id.
> +             */
> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            }
> +        } else {
> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +        }
>      } else {
>          zone = ofc->zone_imm;
>      }
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index fb5b9a36d..8b84642e7 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1927,6 +1927,51 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([conntrack - zones from other field])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=10,icmp,action=normal
> +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +OVS_START_L7([at_ns1], [http])
> +
> +dnl HTTP requests from p0->p1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> +])
> +
> +dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of 
> +dnl "ct_zone(5), ...  actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following,
> +dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)".
> +dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5.
> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> +
> +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +
>  AT_SETUP([conntrack - multiple bridges])
>  CHECK_CONNTRACK()
>  OVS_TRAFFIC_VSWITCHD_START(
>
Peng He March 6, 2021, 2:37 a.m. UTC | #3
Mark Gray <mark.d.gray@redhat.com> 于2021年3月5日周五 下午7:54写道:
>
> On 27/02/2021 09:34, Peng He wrote:
> > CT zone could be set from a field that is not included in frozen
> > metadata. Consider the example rules which are typically seen in
> > OpenStack security group rules:
> >
> > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >
> > The zone is set from the first rule's ct action. These two rules will
> > generate two megaflows: the first one uses zone=5 to query the CT module,
> > the second one sets the zone-id from the first megaflow and commit to CT.
> >
> > The current implementation will generate a megaflow that does not use
> > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > set by an Imm not a field.
> >
> > Consider a situation that one changes the zone id (for example to 15)
> > in the first rule, however, still keep the second rule unchanged. During
> > this change, there is traffic hitting the two generated megaflows, the
> > revaldiator would revalidate all megaflows, however, the revalidator will
> > not change the second megaflow, because zone=5 is recorded in the
> > megaflow, so the xlate will still translate the commit action into zone=5,
> > and the new traffic will still commit to CT as zone=5, not zone=15,
> > resulting in taffic drops and other issues.
> >
> > Just like OVS set-field convention, if a field X is set by Y
> > (Y is a variable not an Imm), we should also mask Y as a match
> > in the generated megaflow. An exception is that if the zone-id is
> > set by the field that is included in the frozen state (i.e. regs) and this
> > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > as the recirc_id is a hash of the values in these fields, and it will change
> > following the changes of these fields. When the recirc_id changes,
> > all megaflows with the old recirc id will be invalid later.
>
> This looks good to me and all the unit-tests pass.
>
> There is some trailing whitespace. You can run
> "./utilities/checkpatch.py" when submitting to catch them before the
> 0-day robot does. Its a bit of a nit and I don't know if this won't get
> committed because of it. That's a decision for a maintainer.

thanks for your kind suggestion
.
better send a new version, but before that maybe there are
some more suggestions from maintainers, and it's better
to wait and then resend.

> >
> > Fixes: 07659514c3 ("Add support for connection tracking.")
> > Reported-by: Sai Su <susai.ss@bytedance.com>
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
> >  include/openvswitch/meta-flow.h |  1 +
> >  lib/meta-flow.c                 | 13 ++++++++++
> >  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
> >  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
> >  4 files changed, 71 insertions(+)
> >
> > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > index 95e52e358..045dce8f5 100644
> > --- a/include/openvswitch/meta-flow.h
> > +++ b/include/openvswitch/meta-flow.h
> > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >                                const union mf_value *mask,
> >                                struct flow *);
> >  bool mf_is_tun_metadata(const struct mf_field *);
> > +bool mf_is_frozen_metadata(const struct mf_field *);
> >  bool mf_is_pipeline_field(const struct mf_field *);
> >  bool mf_is_set(const struct mf_field *, const struct flow *);
> >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > index c808d205d..e03cd8d0c 100644
> > --- a/lib/meta-flow.c
> > +++ b/lib/meta-flow.c
> > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >  }
> >
> > +bool
> > +mf_is_frozen_metadata(const struct mf_field *mf)
> > +{
> > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > +        return true;
> > +    }
> > +
> > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  bool
> >  mf_is_pipeline_field(const struct mf_field *mf)
> >  {
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 7108c8a30..14d00db1e 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >
> >      if (ofc->zone_src.field) {
> >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > +        if (ctx->xin->frozen_state) {
> > +            /* If the upcall is a resume of a recirculation, we only need to
> > +             * unwildcard the fields that are not in the frozen_metadata, as
> > +             * when the rules update, OVS will generate a new recirc_id,
> > +             * which will invalidate the megaflow with old the recirc_id.
> > +             */
> > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +            }
> > +        } else {
> > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +        }
> >      } else {
> >          zone = ofc->zone_imm;
> >      }
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index fb5b9a36d..8b84642e7 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -1927,6 +1927,51 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > +AT_SETUP([conntrack - zones from other field])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=10,icmp,action=normal
> > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +OVS_START_L7([at_ns1], [http])
> > +
> > +dnl HTTP requests from p0->p1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> > +])
> > +
> > +dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of
> > +dnl "ct_zone(5), ...  actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following,
> > +dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)".
> > +dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5.
> > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> > +
> > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +
> >  AT_SETUP([conntrack - multiple bridges])
> >  CHECK_CONNTRACK()
> >  OVS_TRAFFIC_VSWITCHD_START(
> >
>
Peng He March 19, 2021, 8:42 a.m. UTC | #4
Hi,

Ping for this patch.

贺鹏 <xnhp0320@gmail.com> 于2021年3月6日周六 上午10:37写道:
>
> Mark Gray <mark.d.gray@redhat.com> 于2021年3月5日周五 下午7:54写道:
> >
> > On 27/02/2021 09:34, Peng He wrote:
> > > CT zone could be set from a field that is not included in frozen
> > > metadata. Consider the example rules which are typically seen in
> > > OpenStack security group rules:
> > >
> > > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > >
> > > The zone is set from the first rule's ct action. These two rules will
> > > generate two megaflows: the first one uses zone=5 to query the CT module,
> > > the second one sets the zone-id from the first megaflow and commit to CT.
> > >
> > > The current implementation will generate a megaflow that does not use
> > > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > > set by an Imm not a field.
> > >
> > > Consider a situation that one changes the zone id (for example to 15)
> > > in the first rule, however, still keep the second rule unchanged. During
> > > this change, there is traffic hitting the two generated megaflows, the
> > > revaldiator would revalidate all megaflows, however, the revalidator will
> > > not change the second megaflow, because zone=5 is recorded in the
> > > megaflow, so the xlate will still translate the commit action into zone=5,
> > > and the new traffic will still commit to CT as zone=5, not zone=15,
> > > resulting in taffic drops and other issues.
> > >
> > > Just like OVS set-field convention, if a field X is set by Y
> > > (Y is a variable not an Imm), we should also mask Y as a match
> > > in the generated megaflow. An exception is that if the zone-id is
> > > set by the field that is included in the frozen state (i.e. regs) and this
> > > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > > as the recirc_id is a hash of the values in these fields, and it will change
> > > following the changes of these fields. When the recirc_id changes,
> > > all megaflows with the old recirc id will be invalid later.
> >
> > This looks good to me and all the unit-tests pass.
> >
> > There is some trailing whitespace. You can run
> > "./utilities/checkpatch.py" when submitting to catch them before the
> > 0-day robot does. Its a bit of a nit and I don't know if this won't get
> > committed because of it. That's a decision for a maintainer.
>
> thanks for your kind suggestion
> .
> better send a new version, but before that maybe there are
> some more suggestions from maintainers, and it's better
> to wait and then resend.
>
> > >
> > > Fixes: 07659514c3 ("Add support for connection tracking.")
> > > Reported-by: Sai Su <susai.ss@bytedance.com>
> > > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > > ---
> > >  include/openvswitch/meta-flow.h |  1 +
> > >  lib/meta-flow.c                 | 13 ++++++++++
> > >  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
> > >  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
> > >  4 files changed, 71 insertions(+)
> > >
> > > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > > index 95e52e358..045dce8f5 100644
> > > --- a/include/openvswitch/meta-flow.h
> > > +++ b/include/openvswitch/meta-flow.h
> > > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> > >                                const union mf_value *mask,
> > >                                struct flow *);
> > >  bool mf_is_tun_metadata(const struct mf_field *);
> > > +bool mf_is_frozen_metadata(const struct mf_field *);
> > >  bool mf_is_pipeline_field(const struct mf_field *);
> > >  bool mf_is_set(const struct mf_field *, const struct flow *);
> > >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > > index c808d205d..e03cd8d0c 100644
> > > --- a/lib/meta-flow.c
> > > +++ b/lib/meta-flow.c
> > > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> > >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> > >  }
> > >
> > > +bool
> > > +mf_is_frozen_metadata(const struct mf_field *mf)
> > > +{
> > > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > > +        return true;
> > > +    }
> > > +
> > > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > > +        return true;
> > > +    }
> > > +    return false;
> > > +}
> > > +
> > >  bool
> > >  mf_is_pipeline_field(const struct mf_field *mf)
> > >  {
> > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > > index 7108c8a30..14d00db1e 100644
> > > --- a/ofproto/ofproto-dpif-xlate.c
> > > +++ b/ofproto/ofproto-dpif-xlate.c
> > > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> > >
> > >      if (ofc->zone_src.field) {
> > >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > > +        if (ctx->xin->frozen_state) {
> > > +            /* If the upcall is a resume of a recirculation, we only need to
> > > +             * unwildcard the fields that are not in the frozen_metadata, as
> > > +             * when the rules update, OVS will generate a new recirc_id,
> > > +             * which will invalidate the megaflow with old the recirc_id.
> > > +             */
> > > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > > +            }
> > > +        } else {
> > > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > > +        }
> > >      } else {
> > >          zone = ofc->zone_imm;
> > >      }
> > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > > index fb5b9a36d..8b84642e7 100644
> > > --- a/tests/system-traffic.at
> > > +++ b/tests/system-traffic.at
> > > @@ -1927,6 +1927,51 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> > >  OVS_TRAFFIC_VSWITCHD_STOP
> > >  AT_CLEANUP
> > >
> > > +AT_SETUP([conntrack - zones from other field])
> > > +CHECK_CONNTRACK()
> > > +OVS_TRAFFIC_VSWITCHD_START()
> > > +
> > > +ADD_NAMESPACES(at_ns0, at_ns1)
> > > +
> > > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> > > +
> > > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > > +AT_DATA([flows.txt], [dnl
> > > +priority=1,action=drop
> > > +priority=10,arp,action=normal
> > > +priority=10,icmp,action=normal
> > > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> > > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> > > +])
> > > +
> > > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > > +
> > > +OVS_START_L7([at_ns1], [http])
> > > +
> > > +dnl HTTP requests from p0->p1 should work fine.
> > > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > > +
> > > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> > > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> > > +])
> > > +
> > > +dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of
> > > +dnl "ct_zone(5), ...  actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following,
> > > +dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)".
> > > +dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5.
> > > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> > > +
> > > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> > > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
> > > +
> > > +OVS_TRAFFIC_VSWITCHD_STOP
> > > +AT_CLEANUP
> > > +
> > > +
> > >  AT_SETUP([conntrack - multiple bridges])
> > >  CHECK_CONNTRACK()
> > >  OVS_TRAFFIC_VSWITCHD_START(
> > >
> >
>
>
> --
> hepeng
Peng He April 26, 2021, 10:35 a.m. UTC | #5
Hi,

it has been two months now, can anyone review this patch?

贺鹏 <xnhp0320@gmail.com> 于2021年3月19日周五 下午4:42写道:
>
> Hi,
>
> Ping for this patch.
>
> 贺鹏 <xnhp0320@gmail.com> 于2021年3月6日周六 上午10:37写道:
> >
> > Mark Gray <mark.d.gray@redhat.com> 于2021年3月5日周五 下午7:54写道:
> > >
> > > On 27/02/2021 09:34, Peng He wrote:
> > > > CT zone could be set from a field that is not included in frozen
> > > > metadata. Consider the example rules which are typically seen in
> > > > OpenStack security group rules:
> > > >
> > > > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > > > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > > >
> > > > The zone is set from the first rule's ct action. These two rules will
> > > > generate two megaflows: the first one uses zone=5 to query the CT module,
> > > > the second one sets the zone-id from the first megaflow and commit to CT.
> > > >
> > > > The current implementation will generate a megaflow that does not use
> > > > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > > > set by an Imm not a field.
> > > >
> > > > Consider a situation that one changes the zone id (for example to 15)
> > > > in the first rule, however, still keep the second rule unchanged. During
> > > > this change, there is traffic hitting the two generated megaflows, the
> > > > revaldiator would revalidate all megaflows, however, the revalidator will
> > > > not change the second megaflow, because zone=5 is recorded in the
> > > > megaflow, so the xlate will still translate the commit action into zone=5,
> > > > and the new traffic will still commit to CT as zone=5, not zone=15,
> > > > resulting in taffic drops and other issues.
> > > >
> > > > Just like OVS set-field convention, if a field X is set by Y
> > > > (Y is a variable not an Imm), we should also mask Y as a match
> > > > in the generated megaflow. An exception is that if the zone-id is
> > > > set by the field that is included in the frozen state (i.e. regs) and this
> > > > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > > > as the recirc_id is a hash of the values in these fields, and it will change
> > > > following the changes of these fields. When the recirc_id changes,
> > > > all megaflows with the old recirc id will be invalid later.
> > >
> > > This looks good to me and all the unit-tests pass.
> > >
> > > There is some trailing whitespace. You can run
> > > "./utilities/checkpatch.py" when submitting to catch them before the
> > > 0-day robot does. Its a bit of a nit and I don't know if this won't get
> > > committed because of it. That's a decision for a maintainer.
> >
> > thanks for your kind suggestion
> > .
> > better send a new version, but before that maybe there are
> > some more suggestions from maintainers, and it's better
> > to wait and then resend.
> >
> > > >
> > > > Fixes: 07659514c3 ("Add support for connection tracking.")
> > > > Reported-by: Sai Su <susai.ss@bytedance.com>
> > > > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > > > ---
> > > >  include/openvswitch/meta-flow.h |  1 +
> > > >  lib/meta-flow.c                 | 13 ++++++++++
> > > >  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
> > > >  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
> > > >  4 files changed, 71 insertions(+)
> > > >
> > > > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > > > index 95e52e358..045dce8f5 100644
> > > > --- a/include/openvswitch/meta-flow.h
> > > > +++ b/include/openvswitch/meta-flow.h
> > > > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> > > >                                const union mf_value *mask,
> > > >                                struct flow *);
> > > >  bool mf_is_tun_metadata(const struct mf_field *);
> > > > +bool mf_is_frozen_metadata(const struct mf_field *);
> > > >  bool mf_is_pipeline_field(const struct mf_field *);
> > > >  bool mf_is_set(const struct mf_field *, const struct flow *);
> > > >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > > > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > > > index c808d205d..e03cd8d0c 100644
> > > > --- a/lib/meta-flow.c
> > > > +++ b/lib/meta-flow.c
> > > > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> > > >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> > > >  }
> > > >
> > > > +bool
> > > > +mf_is_frozen_metadata(const struct mf_field *mf)
> > > > +{
> > > > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > > > +        return true;
> > > > +    }
> > > > +
> > > > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > > > +        return true;
> > > > +    }
> > > > +    return false;
> > > > +}
> > > > +
> > > >  bool
> > > >  mf_is_pipeline_field(const struct mf_field *mf)
> > > >  {
> > > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > > > index 7108c8a30..14d00db1e 100644
> > > > --- a/ofproto/ofproto-dpif-xlate.c
> > > > +++ b/ofproto/ofproto-dpif-xlate.c
> > > > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> > > >
> > > >      if (ofc->zone_src.field) {
> > > >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > > > +        if (ctx->xin->frozen_state) {
> > > > +            /* If the upcall is a resume of a recirculation, we only need to
> > > > +             * unwildcard the fields that are not in the frozen_metadata, as
> > > > +             * when the rules update, OVS will generate a new recirc_id,
> > > > +             * which will invalidate the megaflow with old the recirc_id.
> > > > +             */
> > > > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > > > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > > > +            }
> > > > +        } else {
> > > > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > > > +        }
> > > >      } else {
> > > >          zone = ofc->zone_imm;
> > > >      }
> > > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > > > index fb5b9a36d..8b84642e7 100644
> > > > --- a/tests/system-traffic.at
> > > > +++ b/tests/system-traffic.at
> > > > @@ -1927,6 +1927,51 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
> > > >  OVS_TRAFFIC_VSWITCHD_STOP
> > > >  AT_CLEANUP
> > > >
> > > > +AT_SETUP([conntrack - zones from other field])
> > > > +CHECK_CONNTRACK()
> > > > +OVS_TRAFFIC_VSWITCHD_START()
> > > > +
> > > > +ADD_NAMESPACES(at_ns0, at_ns1)
> > > > +
> > > > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > > > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> > > > +
> > > > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > > > +AT_DATA([flows.txt], [dnl
> > > > +priority=1,action=drop
> > > > +priority=10,arp,action=normal
> > > > +priority=10,icmp,action=normal
> > > > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > > > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> > > > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
> > > > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
> > > > +])
> > > > +
> > > > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > > > +
> > > > +OVS_START_L7([at_ns1], [http])
> > > > +
> > > > +dnl HTTP requests from p0->p1 should work fine.
> > > > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > > > +
> > > > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> > > > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
> > > > +])
> > > > +
> > > > +dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of
> > > > +dnl "ct_zone(5), ...  actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following,
> > > > +dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)".
> > > > +dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5.
> > > > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
> > > > +
> > > > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
> > > > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
> > > > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
> > > > +
> > > > +OVS_TRAFFIC_VSWITCHD_STOP
> > > > +AT_CLEANUP
> > > > +
> > > > +
> > > >  AT_SETUP([conntrack - multiple bridges])
> > > >  CHECK_CONNTRACK()
> > > >  OVS_TRAFFIC_VSWITCHD_START(
> > > >
> > >
> >
> >
> > --
> > hepeng
>
>
>
> --
> hepeng
Mark Gray April 29, 2021, 4:44 p.m. UTC | #6
On 26/04/2021 11:35, 贺鹏 wrote:
> Hi,
> 
> it has been two months now, can anyone review this patch?
> 

I reviewed this a few months ago and it seemed ok but a maintainer may
want to pass an eye over it

Acked by: Mark D. Gray <mark.d.gray@redhat.com>
Ilya Maximets May 19, 2021, 12:50 p.m. UTC | #7
On 2/27/21 10:34 AM, Peng He wrote:
> CT zone could be set from a field that is not included in frozen
> metadata. Consider the example rules which are typically seen in
> OpenStack security group rules:
> 
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> 
> The zone is set from the first rule's ct action. These two rules will
> generate two megaflows: the first one uses zone=5 to query the CT module,
> the second one sets the zone-id from the first megaflow and commit to CT.
> 
> The current implementation will generate a megaflow that does not use
> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> set by an Imm not a field.
> 
> Consider a situation that one changes the zone id (for example to 15)
> in the first rule, however, still keep the second rule unchanged. During
> this change, there is traffic hitting the two generated megaflows, the
> revaldiator would revalidate all megaflows, however, the revalidator will
> not change the second megaflow, because zone=5 is recorded in the
> megaflow, so the xlate will still translate the commit action into zone=5,
> and the new traffic will still commit to CT as zone=5, not zone=15,
> resulting in taffic drops and other issues.
> 
> Just like OVS set-field convention, if a field X is set by Y
> (Y is a variable not an Imm), we should also mask Y as a match
> in the generated megaflow. An exception is that if the zone-id is
> set by the field that is included in the frozen state (i.e. regs) and this
> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> as the recirc_id is a hash of the values in these fields, and it will change
> following the changes of these fields. When the recirc_id changes,
> all megaflows with the old recirc id will be invalid later.
> 
> Fixes: 07659514c3 ("Add support for connection tracking.")
> Reported-by: Sai Su <susai.ss@bytedance.com>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  include/openvswitch/meta-flow.h |  1 +
>  lib/meta-flow.c                 | 13 ++++++++++
>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
>  4 files changed, 71 insertions(+)
> 
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 95e52e358..045dce8f5 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>                                const union mf_value *mask,
>                                struct flow *);
>  bool mf_is_tun_metadata(const struct mf_field *);
> +bool mf_is_frozen_metadata(const struct mf_field *);
>  bool mf_is_pipeline_field(const struct mf_field *);
>  bool mf_is_set(const struct mf_field *, const struct flow *);
>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> index c808d205d..e03cd8d0c 100644
> --- a/lib/meta-flow.c
> +++ b/lib/meta-flow.c
> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>  }
>  
> +bool
> +mf_is_frozen_metadata(const struct mf_field *mf)
> +{
> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> +        return true;
> +    }
> +
> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> +        return true;
> +    }
> +    return false;
> +}
> +
>  bool
>  mf_is_pipeline_field(const struct mf_field *mf)
>  {
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 7108c8a30..14d00db1e 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>  
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        if (ctx->xin->frozen_state) {
> +            /* If the upcall is a resume of a recirculation, we only need to
> +             * unwildcard the fields that are not in the frozen_metadata, as
> +             * when the rules update, OVS will generate a new recirc_id,
> +             * which will invalidate the megaflow with old the recirc_id.
> +             */
> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            }
> +        } else {
> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +        }

IIUC, in current code above block is equal to just a single line:

    mf_mask_field(ofc->zone_src.field, ctx->wc);

That is because zone is not part of a frozen metadata, right?

Can we just remove all this extra logic and unwildcard unconditionally?
I understand that this code might save us one match in case someday
zone will be part of a frozen metadata, but is that really important?
Is there any harm in unwildcarding this field unconditionally?

Best regards, Ilya Maximets.
Peng He May 21, 2021, 3 a.m. UTC | #8
Hi, Ilya



Ilya Maximets <i.maximets@ovn.org> 于2021年5月19日周三 下午8:50写道:
>
> On 2/27/21 10:34 AM, Peng He wrote:
> > CT zone could be set from a field that is not included in frozen
> > metadata. Consider the example rules which are typically seen in
> > OpenStack security group rules:
> >
> > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >
> > The zone is set from the first rule's ct action. These two rules will
> > generate two megaflows: the first one uses zone=5 to query the CT module,
> > the second one sets the zone-id from the first megaflow and commit to CT.
> >
> > The current implementation will generate a megaflow that does not use
> > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> > set by an Imm not a field.
> >
> > Consider a situation that one changes the zone id (for example to 15)
> > in the first rule, however, still keep the second rule unchanged. During
> > this change, there is traffic hitting the two generated megaflows, the
> > revaldiator would revalidate all megaflows, however, the revalidator will
> > not change the second megaflow, because zone=5 is recorded in the
> > megaflow, so the xlate will still translate the commit action into zone=5,
> > and the new traffic will still commit to CT as zone=5, not zone=15,
> > resulting in taffic drops and other issues.
> >
> > Just like OVS set-field convention, if a field X is set by Y
> > (Y is a variable not an Imm), we should also mask Y as a match
> > in the generated megaflow. An exception is that if the zone-id is
> > set by the field that is included in the frozen state (i.e. regs) and this
> > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > as the recirc_id is a hash of the values in these fields, and it will change
> > following the changes of these fields. When the recirc_id changes,
> > all megaflows with the old recirc id will be invalid later.
> >
> > Fixes: 07659514c3 ("Add support for connection tracking.")
> > Reported-by: Sai Su <susai.ss@bytedance.com>
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
> >  include/openvswitch/meta-flow.h |  1 +
> >  lib/meta-flow.c                 | 13 ++++++++++
> >  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
> >  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
> >  4 files changed, 71 insertions(+)
> >
> > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> > index 95e52e358..045dce8f5 100644
> > --- a/include/openvswitch/meta-flow.h
> > +++ b/include/openvswitch/meta-flow.h
> > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >                                const union mf_value *mask,
> >                                struct flow *);
> >  bool mf_is_tun_metadata(const struct mf_field *);
> > +bool mf_is_frozen_metadata(const struct mf_field *);
> >  bool mf_is_pipeline_field(const struct mf_field *);
> >  bool mf_is_set(const struct mf_field *, const struct flow *);
> >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > index c808d205d..e03cd8d0c 100644
> > --- a/lib/meta-flow.c
> > +++ b/lib/meta-flow.c
> > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >  }
> >
> > +bool
> > +mf_is_frozen_metadata(const struct mf_field *mf)
> > +{
> > +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > +        return true;
> > +    }
> > +
> > +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  bool
> >  mf_is_pipeline_field(const struct mf_field *mf)
> >  {
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 7108c8a30..14d00db1e 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >
> >      if (ofc->zone_src.field) {
> >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > +        if (ctx->xin->frozen_state) {
> > +            /* If the upcall is a resume of a recirculation, we only need to
> > +             * unwildcard the fields that are not in the frozen_metadata, as
> > +             * when the rules update, OVS will generate a new recirc_id,
> > +             * which will invalidate the megaflow with old the recirc_id.
> > +             */
> > +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +            }
> > +        } else {
> > +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +        }
>
> IIUC, in current code above block is equal to just a single line:
>
>     mf_mask_field(ofc->zone_src.field, ctx->wc);
>
> That is because zone is not part of a frozen metadata, right?

Yes, basically this is the reason.
I think the original code only considers the case that the zone id is
set by IMM, not
by a reg.

>
> Can we just remove all this extra logic and unwildcard unconditionally?
> I understand that this code might save us one match in case someday
> zone will be part of a frozen metadata, but is that really important?
> Is there any harm in unwildcarding this field unconditionally?

I think the code can reduce the number of megaflows in case that the rule
sets the zone-id using IMM. But I think unconditionally un-wildcarding is also
OK.


>
> Best regards, Ilya Maximets.
Ilya Maximets May 21, 2021, 7:50 p.m. UTC | #9
On 5/21/21 5:00 AM, 贺鹏 wrote:
> Hi, Ilya
> 
> 
> 
> Ilya Maximets <i.maximets@ovn.org> 于2021年5月19日周三 下午8:50写道:
>>
>> On 2/27/21 10:34 AM, Peng He wrote:
>>> CT zone could be set from a field that is not included in frozen
>>> metadata. Consider the example rules which are typically seen in
>>> OpenStack security group rules:
>>>
>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
>>>
>>> The zone is set from the first rule's ct action. These two rules will
>>> generate two megaflows: the first one uses zone=5 to query the CT module,
>>> the second one sets the zone-id from the first megaflow and commit to CT.
>>>
>>> The current implementation will generate a megaflow that does not use
>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
>>> set by an Imm not a field.
>>>
>>> Consider a situation that one changes the zone id (for example to 15)
>>> in the first rule, however, still keep the second rule unchanged. During
>>> this change, there is traffic hitting the two generated megaflows, the
>>> revaldiator would revalidate all megaflows, however, the revalidator will
>>> not change the second megaflow, because zone=5 is recorded in the
>>> megaflow, so the xlate will still translate the commit action into zone=5,
>>> and the new traffic will still commit to CT as zone=5, not zone=15,
>>> resulting in taffic drops and other issues.
>>>
>>> Just like OVS set-field convention, if a field X is set by Y
>>> (Y is a variable not an Imm), we should also mask Y as a match
>>> in the generated megaflow. An exception is that if the zone-id is
>>> set by the field that is included in the frozen state (i.e. regs) and this
>>> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
>>> as the recirc_id is a hash of the values in these fields, and it will change
>>> following the changes of these fields. When the recirc_id changes,
>>> all megaflows with the old recirc id will be invalid later.
>>>
>>> Fixes: 07659514c3 ("Add support for connection tracking.")
>>> Reported-by: Sai Su <susai.ss@bytedance.com>
>>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>>> ---
>>>  include/openvswitch/meta-flow.h |  1 +
>>>  lib/meta-flow.c                 | 13 ++++++++++
>>>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
>>>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
>>>  4 files changed, 71 insertions(+)
>>>
>>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
>>> index 95e52e358..045dce8f5 100644
>>> --- a/include/openvswitch/meta-flow.h
>>> +++ b/include/openvswitch/meta-flow.h
>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>>>                                const union mf_value *mask,
>>>                                struct flow *);
>>>  bool mf_is_tun_metadata(const struct mf_field *);
>>> +bool mf_is_frozen_metadata(const struct mf_field *);
>>>  bool mf_is_pipeline_field(const struct mf_field *);
>>>  bool mf_is_set(const struct mf_field *, const struct flow *);
>>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
>>> index c808d205d..e03cd8d0c 100644
>>> --- a/lib/meta-flow.c
>>> +++ b/lib/meta-flow.c
>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>>>  }
>>>
>>> +bool
>>> +mf_is_frozen_metadata(const struct mf_field *mf)
>>> +{
>>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
>>> +        return true;
>>> +    }
>>> +
>>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
>>> +        return true;
>>> +    }
>>> +    return false;
>>> +}
>>> +
>>>  bool
>>>  mf_is_pipeline_field(const struct mf_field *mf)
>>>  {
>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>> index 7108c8a30..14d00db1e 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>>>
>>>      if (ofc->zone_src.field) {
>>>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
>>> +        if (ctx->xin->frozen_state) {
>>> +            /* If the upcall is a resume of a recirculation, we only need to
>>> +             * unwildcard the fields that are not in the frozen_metadata, as
>>> +             * when the rules update, OVS will generate a new recirc_id,
>>> +             * which will invalidate the megaflow with old the recirc_id.
>>> +             */
>>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
>>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
>>> +            }
>>> +        } else {
>>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
>>> +        }
>>
>> IIUC, in current code above block is equal to just a single line:
>>
>>     mf_mask_field(ofc->zone_src.field, ctx->wc);
>>
>> That is because zone is not part of a frozen metadata, right?
> 
> Yes, basically this is the reason.
> I think the original code only considers the case that the zone id is
> set by IMM, not
> by a reg.
> 
>>
>> Can we just remove all this extra logic and unwildcard unconditionally?
>> I understand that this code might save us one match in case someday
>> zone will be part of a frozen metadata, but is that really important?
>> Is there any harm in unwildcarding this field unconditionally?
> 
> I think the code can reduce the number of megaflows in case that the rule
> sets the zone-id using IMM. But I think unconditionally un-wildcarding is also
> OK.

From what I managed to test, if value is taken from any register
populated by Imm, there are no extra matches in datapath flows.
Maybe I'm missing some cases, though.

I tried to experiment a bit with this code and the test case.
First thing that I found is that we, probably, need to use
mf_write_subfield_flow() instead of mf_mask_field(), because
we should not unwildcard the whole field if only part of it
used.

Second thing is that I tried to write following set of flows
that might not make much sense, but should probably work fine:

priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]]))
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2

And what I see is that in resulted datapath flows there is no
match on ct_label.  This happens because masks of ct_label and
ct_mark gets saved at the beginning of the function and restored
later, i.e. our un-wildcarding is lost.

The third thing is that if I'm un-wildcarding before saving or
after restoring, I see a bit weird datapath flow after modification
of the first flow to:

priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=15,table=0,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))

There is a new datapath flow with a correct ct_label match, but
the old flow gets updated to have ct_label match on 0.  This might
make some sense for this set of flows, but I'm not sure.

Before flow-mod:

recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
    actions:ct(commit,zone=5,label=0xffff0005/0xffffffff),recirc(0x1)

recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0x5/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
    actions:ct(commit,zone=5,label=0x5/0xffff),ovs-p1

After:

recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
    actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x1)

recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
    actions:ct(commit,label=0/0xffff),ovs-p1

recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0xf/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
    actions:ct(commit,zone=15,label=0xf/0xffff),ovs-p1

Best regards, Ilya Maximets.
Peng He July 24, 2021, 7:27 a.m. UTC | #10
Hi,
friendly ping for this patch.

Ilya Maximets <i.maximets@ovn.org> 于2021年5月22日周六 上午3:50写道:
>
> On 5/21/21 5:00 AM, 贺鹏 wrote:
> > Hi, Ilya
> >
> >
> >
> > Ilya Maximets <i.maximets@ovn.org> 于2021年5月19日周三 下午8:50写道:
> >>
> >> On 2/27/21 10:34 AM, Peng He wrote:
> >>> CT zone could be set from a field that is not included in frozen
> >>> metadata. Consider the example rules which are typically seen in
> >>> OpenStack security group rules:
> >>>
> >>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> >>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >>>
> >>> The zone is set from the first rule's ct action. These two rules will
> >>> generate two megaflows: the first one uses zone=5 to query the CT module,
> >>> the second one sets the zone-id from the first megaflow and commit to CT.
> >>>
> >>> The current implementation will generate a megaflow that does not use
> >>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> >>> set by an Imm not a field.
> >>>
> >>> Consider a situation that one changes the zone id (for example to 15)
> >>> in the first rule, however, still keep the second rule unchanged. During
> >>> this change, there is traffic hitting the two generated megaflows, the
> >>> revaldiator would revalidate all megaflows, however, the revalidator will
> >>> not change the second megaflow, because zone=5 is recorded in the
> >>> megaflow, so the xlate will still translate the commit action into zone=5,
> >>> and the new traffic will still commit to CT as zone=5, not zone=15,
> >>> resulting in taffic drops and other issues.
> >>>
> >>> Just like OVS set-field convention, if a field X is set by Y
> >>> (Y is a variable not an Imm), we should also mask Y as a match
> >>> in the generated megaflow. An exception is that if the zone-id is
> >>> set by the field that is included in the frozen state (i.e. regs) and this
> >>> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> >>> as the recirc_id is a hash of the values in these fields, and it will change
> >>> following the changes of these fields. When the recirc_id changes,
> >>> all megaflows with the old recirc id will be invalid later.
> >>>
> >>> Fixes: 07659514c3 ("Add support for connection tracking.")
> >>> Reported-by: Sai Su <susai.ss@bytedance.com>
> >>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> >>> ---
> >>>  include/openvswitch/meta-flow.h |  1 +
> >>>  lib/meta-flow.c                 | 13 ++++++++++
> >>>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
> >>>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
> >>>  4 files changed, 71 insertions(+)
> >>>
> >>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> >>> index 95e52e358..045dce8f5 100644
> >>> --- a/include/openvswitch/meta-flow.h
> >>> +++ b/include/openvswitch/meta-flow.h
> >>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >>>                                const union mf_value *mask,
> >>>                                struct flow *);
> >>>  bool mf_is_tun_metadata(const struct mf_field *);
> >>> +bool mf_is_frozen_metadata(const struct mf_field *);
> >>>  bool mf_is_pipeline_field(const struct mf_field *);
> >>>  bool mf_is_set(const struct mf_field *, const struct flow *);
> >>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> >>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> >>> index c808d205d..e03cd8d0c 100644
> >>> --- a/lib/meta-flow.c
> >>> +++ b/lib/meta-flow.c
> >>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >>>  }
> >>>
> >>> +bool
> >>> +mf_is_frozen_metadata(const struct mf_field *mf)
> >>> +{
> >>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> >>> +        return true;
> >>> +    }
> >>> +    return false;
> >>> +}
> >>> +
> >>>  bool
> >>>  mf_is_pipeline_field(const struct mf_field *mf)
> >>>  {
> >>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>> index 7108c8a30..14d00db1e 100644
> >>> --- a/ofproto/ofproto-dpif-xlate.c
> >>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >>>
> >>>      if (ofc->zone_src.field) {
> >>>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> >>> +        if (ctx->xin->frozen_state) {
> >>> +            /* If the upcall is a resume of a recirculation, we only need to
> >>> +             * unwildcard the fields that are not in the frozen_metadata, as
> >>> +             * when the rules update, OVS will generate a new recirc_id,
> >>> +             * which will invalidate the megaflow with old the recirc_id.
> >>> +             */
> >>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> >>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>> +            }
> >>> +        } else {
> >>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>> +        }
> >>
> >> IIUC, in current code above block is equal to just a single line:
> >>
> >>     mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>
> >> That is because zone is not part of a frozen metadata, right?
> >
> > Yes, basically this is the reason.
> > I think the original code only considers the case that the zone id is
> > set by IMM, not
> > by a reg.
> >
> >>
> >> Can we just remove all this extra logic and unwildcard unconditionally?
> >> I understand that this code might save us one match in case someday
> >> zone will be part of a frozen metadata, but is that really important?
> >> Is there any harm in unwildcarding this field unconditionally?
> >
> > I think the code can reduce the number of megaflows in case that the rule
> > sets the zone-id using IMM. But I think unconditionally un-wildcarding is also
> > OK.
>
> From what I managed to test, if value is taken from any register
> populated by Imm, there are no extra matches in datapath flows.
> Maybe I'm missing some cases, though.
>
> I tried to experiment a bit with this code and the test case.
> First thing that I found is that we, probably, need to use
> mf_write_subfield_flow() instead of mf_mask_field(), because
> we should not unwildcard the whole field if only part of it
> used.
>
> Second thing is that I tried to write following set of flows
> that might not make much sense, but should probably work fine:
>
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]]))
> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2
>
> And what I see is that in resulted datapath flows there is no
> match on ct_label.  This happens because masks of ct_label and
> ct_mark gets saved at the beginning of the function and restored
> later, i.e. our un-wildcarding is lost.
>
> The third thing is that if I'm un-wildcarding before saving or
> after restoring, I see a bit weird datapath flow after modification
> of the first flow to:
>
> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=15,table=0,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))
>
> There is a new datapath flow with a correct ct_label match, but
> the old flow gets updated to have ct_label match on 0.  This might
> make some sense for this set of flows, but I'm not sure.
>
> Before flow-mod:
>
> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>     actions:ct(commit,zone=5,label=0xffff0005/0xffffffff),recirc(0x1)
>
> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0x5/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>     actions:ct(commit,zone=5,label=0x5/0xffff),ovs-p1
>
> After:
>
> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>     actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x1)
>
> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>     actions:ct(commit,label=0/0xffff),ovs-p1
>
> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0xf/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>     actions:ct(commit,zone=15,label=0xf/0xffff),ovs-p1
>
> Best regards, Ilya Maximets.
Ilya Maximets July 25, 2021, 11:22 p.m. UTC | #11
Re-adding the list and people from CC.

On 6/27/21 8:16 AM, 贺鹏 wrote:
> Hi, Ilya,
> 
> sorry for replying so late, being really busy this month.

Busy time for everyone.
BTW, I'm taking some time off next week.  Maybe someone else will be able
to take a look at the changes while I'm not here.  My replies inline.

> I've read through your comments. I personally prefer to keep the
> number of megaflows as few as possible, as megaflow is becoming
> expensive as the size of flow struct is large.
> below are my experiments.
> 
> Ilya Maximets <i.maximets@ovn.org> 于2021年5月22日周六 上午3:50写道:
>>
>> On 5/21/21 5:00 AM, 贺鹏 wrote:
>>> Hi, Ilya
>>>
>>>
>>>
>>> Ilya Maximets <i.maximets@ovn.org> 于2021年5月19日周三 下午8:50写道:
>>>>
>>>> On 2/27/21 10:34 AM, Peng He wrote:
>>>>> CT zone could be set from a field that is not included in frozen
>>>>> metadata. Consider the example rules which are typically seen in
>>>>> OpenStack security group rules:
>>>>>
>>>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
>>>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
>>>>>
>>>>> The zone is set from the first rule's ct action. These two rules will
>>>>> generate two megaflows: the first one uses zone=5 to query the CT module,
>>>>> the second one sets the zone-id from the first megaflow and commit to CT.
>>>>>
>>>>> The current implementation will generate a megaflow that does not use
>>>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
>>>>> set by an Imm not a field.
>>>>>
>>>>> Consider a situation that one changes the zone id (for example to 15)
>>>>> in the first rule, however, still keep the second rule unchanged. During
>>>>> this change, there is traffic hitting the two generated megaflows, the
>>>>> revaldiator would revalidate all megaflows, however, the revalidator will
>>>>> not change the second megaflow, because zone=5 is recorded in the
>>>>> megaflow, so the xlate will still translate the commit action into zone=5,
>>>>> and the new traffic will still commit to CT as zone=5, not zone=15,
>>>>> resulting in taffic drops and other issues.
>>>>>
>>>>> Just like OVS set-field convention, if a field X is set by Y
>>>>> (Y is a variable not an Imm), we should also mask Y as a match
>>>>> in the generated megaflow. An exception is that if the zone-id is
>>>>> set by the field that is included in the frozen state (i.e. regs) and this
>>>>> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
>>>>> as the recirc_id is a hash of the values in these fields, and it will change
>>>>> following the changes of these fields. When the recirc_id changes,
>>>>> all megaflows with the old recirc id will be invalid later.
>>>>>
>>>>> Fixes: 07659514c3 ("Add support for connection tracking.")
>>>>> Reported-by: Sai Su <susai.ss@bytedance.com>
>>>>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>>>>> ---
>>>>>  include/openvswitch/meta-flow.h |  1 +
>>>>>  lib/meta-flow.c                 | 13 ++++++++++
>>>>>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
>>>>>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 71 insertions(+)
>>>>>
>>>>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
>>>>> index 95e52e358..045dce8f5 100644
>>>>> --- a/include/openvswitch/meta-flow.h
>>>>> +++ b/include/openvswitch/meta-flow.h
>>>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>>>>>                                const union mf_value *mask,
>>>>>                                struct flow *);
>>>>>  bool mf_is_tun_metadata(const struct mf_field *);
>>>>> +bool mf_is_frozen_metadata(const struct mf_field *);
>>>>>  bool mf_is_pipeline_field(const struct mf_field *);
>>>>>  bool mf_is_set(const struct mf_field *, const struct flow *);
>>>>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
>>>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
>>>>> index c808d205d..e03cd8d0c 100644
>>>>> --- a/lib/meta-flow.c
>>>>> +++ b/lib/meta-flow.c
>>>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>>>>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>>>>>  }
>>>>>
>>>>> +bool
>>>>> +mf_is_frozen_metadata(const struct mf_field *mf)
>>>>> +{
>>>>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
>>>>> +        return true;
>>>>> +    }
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>  bool
>>>>>  mf_is_pipeline_field(const struct mf_field *mf)
>>>>>  {
>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>>>> index 7108c8a30..14d00db1e 100644
>>>>> --- a/ofproto/ofproto-dpif-xlate.c
>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>>>> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>>>>>
>>>>>      if (ofc->zone_src.field) {
>>>>>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
>>>>> +        if (ctx->xin->frozen_state) {
>>>>> +            /* If the upcall is a resume of a recirculation, we only need to
>>>>> +             * unwildcard the fields that are not in the frozen_metadata, as
>>>>> +             * when the rules update, OVS will generate a new recirc_id,
>>>>> +             * which will invalidate the megaflow with old the recirc_id.
>>>>> +             */
>>>>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
>>>>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
>>>>> +            }
>>>>> +        } else {
>>>>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
>>>>> +        }
>>>>
>>>> IIUC, in current code above block is equal to just a single line:
>>>>
>>>>     mf_mask_field(ofc->zone_src.field, ctx->wc);
>>>>
>>>> That is because zone is not part of a frozen metadata, right?
>>>
>>> Yes, basically this is the reason.
>>> I think the original code only considers the case that the zone id is
>>> set by IMM, not
>>> by a reg.
>>>
>>>>
>>>> Can we just remove all this extra logic and unwildcard unconditionally?
>>>> I understand that this code might save us one match in case someday
>>>> zone will be part of a frozen metadata, but is that really important?
>>>> Is there any harm in unwildcarding this field unconditionally?
>>>
>>> I think the code can reduce the number of megaflows in case that the rule
>>> sets the zone-id using IMM. But I think unconditionally un-wildcarding is also
>>> OK.
>>
>> From what I managed to test, if value is taken from any register
>> populated by Imm, there are no extra matches in datapath flows.
>> Maybe I'm missing some cases, though.
>>
>> I tried to experiment a bit with this code and the test case.
>> First thing that I found is that we, probably, need to use
>> mf_write_subfield_flow() instead of mf_mask_field(), because
>> we should not unwildcard the whole field if only part of it
>> used.
>>
>> Second thing is that I tried to write following set of flows
>> that might not make much sense, but should probably work fine:
>>
>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]]))
>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2
>>
>> And what I see is that in resulted datapath flows there is no
>> match on ct_label.  This happens because masks of ct_label and
>> ct_mark gets saved at the beginning of the function and restored
>> later, i.e. our un-wildcarding is lost.
> 
> Yes, good catch!
> 
> Yes, I am using the belowing code:
> 
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        union mf_subvalue value;
> +        memset(&value, 0xff, sizeof(value));
>          if (ctx->xin->frozen_state) {
>              /* If the upcall is a resume of a recirculation, we only need to
>               * unwildcard the fields that are not in the frozen_metadata, as
> @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx
> *ctx, struct ofpact_conntrack *ofc,
>               * which will invalidate the megaflow with old the recirc_id.
>               */
>              if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> -                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +                mf_write_subfield_flow(&ofc->zone_src, &value,
> &ctx->wc->masks);
>              }
>          } else {
> -            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks);
>          }
>      } else {
>          zone = ofc->zone_imm;
>      }
> 
> Not sure if it fits the idiom usage of mf APIs.

I'm not an expert in this part of the code, but this seems fine to me.
I would re-order the definition part though, oterwise it's hard to read.
Something like:

if (ofc->zone_src.field) {
    union mf_subvalue value;
    memset(&value, 0xff, sizeof(value));

    zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
    if (ctx->xin->frozen_state) {
        ...


> 
>>
>> The third thing is that if I'm un-wildcarding before saving or
>> after restoring, I see a bit weird datapath flow after modification
>> of the first flow to:
>>
>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=15,table=0,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))
>>
>> There is a new datapath flow with a correct ct_label match, but
>> the old flow gets updated to have ct_label match on 0.  This might
>> make some sense for this set of flows, but I'm not sure.
>>
>> Before flow-mod:
>>
>> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=5,label=0xffff0005/0xffffffff),recirc(0x1)
>>
>> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0x5/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=5,label=0x5/0xffff),ovs-p1
>>
>> After:
>>
>> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x1)
>>
>> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,label=0/0xffff),ovs-p1
>>
>> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0xf/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=15,label=0xf/0xffff),ovs-p1
>>
> 
> I do not encounter such an issue, my changes are here:

Hmm.  Maybe I did something wrong.
Could you prepare v2, so we can continue discussion from there?

Best regards, Ilya Maximets.

> 
>  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>                           bool is_last_action)
>  {
> -    ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label;
> -    uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark;
> -    size_t ct_offset;
>      uint16_t zone;
> -
>      uint16_t zone;
> -
> -    /* Ensure that any prior actions are applied before composing the new
> -     * conntrack action. */
> -    xlate_commit_actions(ctx);
> -
> -    /* Process nested actions first, to populate the key. */
> -    ctx->ct_nat_action = NULL;
> -    ctx->wc->masks.ct_mark = 0;
> -    ctx->wc->masks.ct_label = OVS_U128_ZERO;
> -    do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx,
> -                     is_last_action, false);
> -
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        union mf_subvalue value;
> +        memset(&value, 0xff, sizeof(value));
>          if (ctx->xin->frozen_state) {
>              /* If the upcall is a resume of a recirculation, we only need to
>               * unwildcard the fields that are not in the frozen_metadata, as
> @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx
> *ctx, struct ofpact_conntrack *ofc,
>               * which will invalidate the megaflow with old the recirc_id.
>               */
>              if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> -                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +                mf_write_subfield_flow(&ofc->zone_src, &value,
> &ctx->wc->masks);
>              }
>          } else {
> -            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks);
>          }
>      } else {
>          zone = ofc->zone_imm;
>      }
> 
> +    size_t ct_offset;
> +    ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label;
> +    uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark;
> +    /* Ensure that any prior actions are applied before composing the new
> +     * conntrack action. */
> +    xlate_commit_actions(ctx);
> +
> +    /* Process nested actions first, to populate the key. */
> +    ctx->ct_nat_action = NULL;
> +    ctx->wc->masks.ct_mark = 0;
> +    ctx->wc->masks.ct_label = OVS_U128_ZERO;
> +    do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx,
> +                     is_last_action, false);
> +
> +
> 
> and my test is OK.
> root@debian:~/ovs# ovs-appctl dpctl/dump-flows | grep label
> ct_state(+trk),ct_label(0xffff0005/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:5, bytes:465, used:1.323s, flags:FP.,
> actions:ct(commit,zone=5),3
> ct_state(-trk),recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:11, bytes:1004, used:1.293s, flags:SFP.,
> actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x2)
> ct_state(+trk),ct_label(0xffff000f/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:5, bytes:465, used:1.293s, flags:FP.,
> actions:ct(commit,zone=15),3
> 
> 
> 
>> Best regards, Ilya Maximets.
> 
> 
>
Peng He July 26, 2021, 7:31 a.m. UTC | #12
Sure,
will send a v2. Thanks for the review.


Ilya Maximets <i.maximets@ovn.org> 于2021年7月26日周一 上午7:22写道:

>
> Re-adding the list and people from CC.
>
> On 6/27/21 8:16 AM, 贺鹏 wrote:
> > Hi, Ilya,
> >
> > sorry for replying so late, being really busy this month.
>
> Busy time for everyone.
> BTW, I'm taking some time off next week.  Maybe someone else will be able
> to take a look at the changes while I'm not here.  My replies inline.

Have a good rest.

>
> > I've read through your comments. I personally prefer to keep the
> > number of megaflows as few as possible, as megaflow is becoming
> > expensive as the size of flow struct is large.
> > below are my experiments.
> >
> > Ilya Maximets <i.maximets@ovn.org> 于2021年5月22日周六 上午3:50写道:
> >>
> >> On 5/21/21 5:00 AM, 贺鹏 wrote:
> >>> Hi, Ilya
> >>>
> >>>
> >>>
> >>> Ilya Maximets <i.maximets@ovn.org> 于2021年5月19日周三 下午8:50写道:
> >>>>
> >>>> On 2/27/21 10:34 AM, Peng He wrote:
> >>>>> CT zone could be set from a field that is not included in frozen
> >>>>> metadata. Consider the example rules which are typically seen in
> >>>>> OpenStack security group rules:
> >>>>>
> >>>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> >>>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >>>>>
> >>>>> The zone is set from the first rule's ct action. These two rules will
> >>>>> generate two megaflows: the first one uses zone=5 to query the CT module,
> >>>>> the second one sets the zone-id from the first megaflow and commit to CT.
> >>>>>
> >>>>> The current implementation will generate a megaflow that does not use
> >>>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
> >>>>> set by an Imm not a field.
> >>>>>
> >>>>> Consider a situation that one changes the zone id (for example to 15)
> >>>>> in the first rule, however, still keep the second rule unchanged. During
> >>>>> this change, there is traffic hitting the two generated megaflows, the
> >>>>> revaldiator would revalidate all megaflows, however, the revalidator will
> >>>>> not change the second megaflow, because zone=5 is recorded in the
> >>>>> megaflow, so the xlate will still translate the commit action into zone=5,
> >>>>> and the new traffic will still commit to CT as zone=5, not zone=15,
> >>>>> resulting in taffic drops and other issues.
> >>>>>
> >>>>> Just like OVS set-field convention, if a field X is set by Y
> >>>>> (Y is a variable not an Imm), we should also mask Y as a match
> >>>>> in the generated megaflow. An exception is that if the zone-id is
> >>>>> set by the field that is included in the frozen state (i.e. regs) and this
> >>>>> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> >>>>> as the recirc_id is a hash of the values in these fields, and it will change
> >>>>> following the changes of these fields. When the recirc_id changes,
> >>>>> all megaflows with the old recirc id will be invalid later.
> >>>>>
> >>>>> Fixes: 07659514c3 ("Add support for connection tracking.")
> >>>>> Reported-by: Sai Su <susai.ss@bytedance.com>
> >>>>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> >>>>> ---
> >>>>>  include/openvswitch/meta-flow.h |  1 +
> >>>>>  lib/meta-flow.c                 | 13 ++++++++++
> >>>>>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
> >>>>>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
> >>>>>  4 files changed, 71 insertions(+)
> >>>>>
> >>>>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> >>>>> index 95e52e358..045dce8f5 100644
> >>>>> --- a/include/openvswitch/meta-flow.h
> >>>>> +++ b/include/openvswitch/meta-flow.h
> >>>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >>>>>                                const union mf_value *mask,
> >>>>>                                struct flow *);
> >>>>>  bool mf_is_tun_metadata(const struct mf_field *);
> >>>>> +bool mf_is_frozen_metadata(const struct mf_field *);
> >>>>>  bool mf_is_pipeline_field(const struct mf_field *);
> >>>>>  bool mf_is_set(const struct mf_field *, const struct flow *);
> >>>>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> >>>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> >>>>> index c808d205d..e03cd8d0c 100644
> >>>>> --- a/lib/meta-flow.c
> >>>>> +++ b/lib/meta-flow.c
> >>>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> >>>>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >>>>>  }
> >>>>>
> >>>>> +bool
> >>>>> +mf_is_frozen_metadata(const struct mf_field *mf)
> >>>>> +{
> >>>>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> >>>>> +        return true;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> >>>>> +        return true;
> >>>>> +    }
> >>>>> +    return false;
> >>>>> +}
> >>>>> +
> >>>>>  bool
> >>>>>  mf_is_pipeline_field(const struct mf_field *mf)
> >>>>>  {
> >>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>>> index 7108c8a30..14d00db1e 100644
> >>>>> --- a/ofproto/ofproto-dpif-xlate.c
> >>>>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>>>> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >>>>>
> >>>>>      if (ofc->zone_src.field) {
> >>>>>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> >>>>> +        if (ctx->xin->frozen_state) {
> >>>>> +            /* If the upcall is a resume of a recirculation, we only need to
> >>>>> +             * unwildcard the fields that are not in the frozen_metadata, as
> >>>>> +             * when the rules update, OVS will generate a new recirc_id,
> >>>>> +             * which will invalidate the megaflow with old the recirc_id.
> >>>>> +             */
> >>>>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> >>>>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>>>> +            }
> >>>>> +        } else {
> >>>>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>>>> +        }
> >>>>
> >>>> IIUC, in current code above block is equal to just a single line:
> >>>>
> >>>>     mf_mask_field(ofc->zone_src.field, ctx->wc);
> >>>>
> >>>> That is because zone is not part of a frozen metadata, right?
> >>>
> >>> Yes, basically this is the reason.
> >>> I think the original code only considers the case that the zone id is
> >>> set by IMM, not
> >>> by a reg.
> >>>
> >>>>
> >>>> Can we just remove all this extra logic and unwildcard unconditionally?
> >>>> I understand that this code might save us one match in case someday
> >>>> zone will be part of a frozen metadata, but is that really important?
> >>>> Is there any harm in unwildcarding this field unconditionally?
> >>>
> >>> I think the code can reduce the number of megaflows in case that the rule
> >>> sets the zone-id using IMM. But I think unconditionally un-wildcarding is also
> >>> OK.
> >>
> >> From what I managed to test, if value is taken from any register
> >> populated by Imm, there are no extra matches in datapath flows.
> >> Maybe I'm missing some cases, though.
> >>
> >> I tried to experiment a bit with this code and the test case.
> >> First thing that I found is that we, probably, need to use
> >> mf_write_subfield_flow() instead of mf_mask_field(), because
> >> we should not unwildcard the whole field if only part of it
> >> used.
> >>
> >> Second thing is that I tried to write following set of flows
> >> that might not make much sense, but should probably work fine:
> >>
> >> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]]))
> >> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2
> >>
> >> And what I see is that in resulted datapath flows there is no
> >> match on ct_label.  This happens because masks of ct_label and
> >> ct_mark gets saved at the beginning of the function and restored
> >> later, i.e. our un-wildcarding is lost.
> >
> > Yes, good catch!
> >
> > Yes, I am using the belowing code:
> >
> >      if (ofc->zone_src.field) {
> >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > +        union mf_subvalue value;
> > +        memset(&value, 0xff, sizeof(value));
> >          if (ctx->xin->frozen_state) {
> >              /* If the upcall is a resume of a recirculation, we only need to
> >               * unwildcard the fields that are not in the frozen_metadata, as
> > @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx
> > *ctx, struct ofpact_conntrack *ofc,
> >               * which will invalidate the megaflow with old the recirc_id.
> >               */
> >              if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > -                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +                mf_write_subfield_flow(&ofc->zone_src, &value,
> > &ctx->wc->masks);
> >              }
> >          } else {
> > -            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +            mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks);
> >          }
> >      } else {
> >          zone = ofc->zone_imm;
> >      }
> >
> > Not sure if it fits the idiom usage of mf APIs.
>
> I'm not an expert in this part of the code, but this seems fine to me.
> I would re-order the definition part though, oterwise it's hard to read.
> Something like:
>
> if (ofc->zone_src.field) {
>     union mf_subvalue value;
>     memset(&value, 0xff, sizeof(value));
>
>     zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
>     if (ctx->xin->frozen_state) {
>         ...
>
>
> >
> >>
> >> The third thing is that if I'm un-wildcarding before saving or
> >> after restoring, I see a bit weird datapath flow after modification
> >> of the first flow to:
> >>
> >> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=15,table=0,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))
> >>
> >> There is a new datapath flow with a correct ct_label match, but
> >> the old flow gets updated to have ct_label match on 0.  This might
> >> make some sense for this set of flows, but I'm not sure.
> >>
> >> Before flow-mod:
> >>
> >> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> >>     actions:ct(commit,zone=5,label=0xffff0005/0xffffffff),recirc(0x1)
> >>
> >> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0x5/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> >>     actions:ct(commit,zone=5,label=0x5/0xffff),ovs-p1
> >>
> >> After:
> >>
> >> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> >>     actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x1)
> >>
> >> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> >>     actions:ct(commit,label=0/0xffff),ovs-p1
> >>
> >> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0xf/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
> >>     actions:ct(commit,zone=15,label=0xf/0xffff),ovs-p1
> >>
> >
> > I do not encounter such an issue, my changes are here:
>
> Hmm.  Maybe I did something wrong.
> Could you prepare v2, so we can continue discussion from there?
>
> Best regards, Ilya Maximets.
>
> >
> >  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
> >                           bool is_last_action)
> >  {
> > -    ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label;
> > -    uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark;
> > -    size_t ct_offset;
> >      uint16_t zone;
> > -
> >      uint16_t zone;
> > -
> > -    /* Ensure that any prior actions are applied before composing the new
> > -     * conntrack action. */
> > -    xlate_commit_actions(ctx);
> > -
> > -    /* Process nested actions first, to populate the key. */
> > -    ctx->ct_nat_action = NULL;
> > -    ctx->wc->masks.ct_mark = 0;
> > -    ctx->wc->masks.ct_label = OVS_U128_ZERO;
> > -    do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx,
> > -                     is_last_action, false);
> > -
> >      if (ofc->zone_src.field) {
> >          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> > +        union mf_subvalue value;
> > +        memset(&value, 0xff, sizeof(value));
> >          if (ctx->xin->frozen_state) {
> >              /* If the upcall is a resume of a recirculation, we only need to
> >               * unwildcard the fields that are not in the frozen_metadata, as
> > @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx
> > *ctx, struct ofpact_conntrack *ofc,
> >               * which will invalidate the megaflow with old the recirc_id.
> >               */
> >              if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > -                mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +                mf_write_subfield_flow(&ofc->zone_src, &value,
> > &ctx->wc->masks);
> >              }
> >          } else {
> > -            mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +            mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks);
> >          }
> >      } else {
> >          zone = ofc->zone_imm;
> >      }
> >
> > +    size_t ct_offset;
> > +    ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label;
> > +    uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark;
> > +    /* Ensure that any prior actions are applied before composing the new
> > +     * conntrack action. */
> > +    xlate_commit_actions(ctx);
> > +
> > +    /* Process nested actions first, to populate the key. */
> > +    ctx->ct_nat_action = NULL;
> > +    ctx->wc->masks.ct_mark = 0;
> > +    ctx->wc->masks.ct_label = OVS_U128_ZERO;
> > +    do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx,
> > +                     is_last_action, false);
> > +
> > +
> >
> > and my test is OK.
> > root@debian:~/ovs# ovs-appctl dpctl/dump-flows | grep label
> > ct_state(+trk),ct_label(0xffff0005/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> > packets:5, bytes:465, used:1.323s, flags:FP.,
> > actions:ct(commit,zone=5),3
> > ct_state(-trk),recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> > packets:11, bytes:1004, used:1.293s, flags:SFP.,
> > actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x2)
> > ct_state(+trk),ct_label(0xffff000f/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> > packets:5, bytes:465, used:1.293s, flags:FP.,
> > actions:ct(commit,zone=15),3
> >
> >
> >
> >> Best regards, Ilya Maximets.
> >
> >
> >
>


--
hepeng
diff mbox series

Patch

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 95e52e358..045dce8f5 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -2305,6 +2305,7 @@  void mf_set_flow_value_masked(const struct mf_field *,
                               const union mf_value *mask,
                               struct flow *);
 bool mf_is_tun_metadata(const struct mf_field *);
+bool mf_is_frozen_metadata(const struct mf_field *);
 bool mf_is_pipeline_field(const struct mf_field *);
 bool mf_is_set(const struct mf_field *, const struct flow *);
 void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index c808d205d..e03cd8d0c 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -1788,6 +1788,19 @@  mf_is_tun_metadata(const struct mf_field *mf)
            mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
 }
 
+bool
+mf_is_frozen_metadata(const struct mf_field *mf)
+{
+    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
+        return true;
+    }
+
+    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
+        return true;
+    }
+    return false;
+}
+
 bool
 mf_is_pipeline_field(const struct mf_field *mf)
 {
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7108c8a30..14d00db1e 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -6195,6 +6195,18 @@  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
 
     if (ofc->zone_src.field) {
         zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
+        if (ctx->xin->frozen_state) {
+            /* If the upcall is a resume of a recirculation, we only need to
+             * unwildcard the fields that are not in the frozen_metadata, as
+             * when the rules update, OVS will generate a new recirc_id,
+             * which will invalidate the megaflow with old the recirc_id.
+             */
+            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
+                mf_mask_field(ofc->zone_src.field, ctx->wc);
+            }
+        } else {
+            mf_mask_field(ofc->zone_src.field, ctx->wc);
+        }
     } else {
         zone = ofc->zone_imm;
     }
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index fb5b9a36d..8b84642e7 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1927,6 +1927,51 @@  tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - zones from other field])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=10,icmp,action=normal
+priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
+priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
+priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5)
+priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_START_L7([at_ns1], [http])
+
+dnl HTTP requests from p0->p1 should work fine.
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>)
+])
+
+dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of 
+dnl "ct_zone(5), ...  actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following,
+dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)".
+dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5.
+AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], [])
+
+AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)'])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], [])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
 AT_SETUP([conntrack - multiple bridges])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START(