Message ID | 20210801130911.66655-1-hepeng.0320@bytedance.com |
---|---|
State | Accepted |
Delegated to: | Ilya Maximets |
Headers | show |
Series | [ovs-dev,v4] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | warning | apply and check: warning |
ovsrobot/github-robot | fail | github build: failed |
Bleep bloop. Greetings Peng He, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off. WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com> Lines checked: 236, Warnings: 1, Errors: 1 Please check this out. If you feel there has been an error, please email aconole@redhat.com Thanks, 0-day Robot
On 8/1/21 15:09, Peng He wrote: > CT zone could be set from a field that is not included in frozen > metadata. Consider the example rules which are typically seen in > OpenStack security group rules: > > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 > > The zone is set from the first rule's ct action. These two rules will > generate two megaflows: the first one uses zone=5 to query the CT module, > the second one sets the zone-id from the first megaflow and commit to CT. > > The current implementation will generate a megaflow that does not use > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is > set by an Imm not a field. > > Consider a situation that one changes the zone id (for example to 15) > in the first rule, however, still keep the second rule unchanged. During > this change, there is traffic hitting the two generated megaflows, the > revaldiator would revalidate all megaflows, however, the revalidator will > not change the second megaflow, because zone=5 is recorded in the > megaflow, so the xlate will still translate the commit action into zone=5, > and the new traffic will still commit to CT as zone=5, not zone=15, > resulting in taffic drops and other issues. > > Just like OVS set-field convention, if a field X is set by Y > (Y is a variable not an Imm), we should also mask Y as a match > in the generated megaflow. An exception is that if the zone-id is > set by the field that is included in the frozen state (i.e. regs) and this > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped, > as the recirc_id is a hash of the values in these fields, and it will change > following the changes of these fields. When the recirc_id changes, > all megaflows with the old recirc id will be invalid later. > > Fixes: 07659514c3 ("Add support for connection tracking.") > Reported-by: Sai Su <susai.ss@bytedance.com> > Signed-off-by: Peng He <hepeng.0320@bytedance.com> > Acked-by: Mark D. Gray <mark.d.gray@redhat.com> > --- > include/openvswitch/meta-flow.h | 1 + > lib/meta-flow.c | 13 +++++ > ofproto/ofproto-dpif-xlate.c | 32 +++++++++---- > tests/system-traffic.at | 85 +++++++++++++++++++++++++++++++++ > 4 files changed, 123 insertions(+), 8 deletions(-) > > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h > index 95e52e358..045dce8f5 100644 > --- a/include/openvswitch/meta-flow.h > +++ b/include/openvswitch/meta-flow.h > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *, > const union mf_value *mask, > struct flow *); > bool mf_is_tun_metadata(const struct mf_field *); > +bool mf_is_frozen_metadata(const struct mf_field *); > bool mf_is_pipeline_field(const struct mf_field *); > bool mf_is_set(const struct mf_field *, const struct flow *); > void mf_mask_field(const struct mf_field *, struct flow_wildcards *); > diff --git a/lib/meta-flow.c b/lib/meta-flow.c > index c808d205d..e03cd8d0c 100644 > --- a/lib/meta-flow.c > +++ b/lib/meta-flow.c > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf) > mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS; > } > > +bool > +mf_is_frozen_metadata(const struct mf_field *mf) > +{ > + if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) { > + return true; > + } > + > + if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) { > + return true; > + } > + return false; > +} > + > bool > mf_is_pipeline_field(const struct mf_field *mf) > { > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 7108c8a30..7b9f3369b 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -6177,11 +6177,32 @@ static void > compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc, > bool is_last_action) > { > - ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label; > - uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark; > - size_t ct_offset; > uint16_t zone; > + if (ofc->zone_src.field) { > + union mf_subvalue value; > + memset(&value, 0xff, sizeof(value)); > + > + zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); > + if (ctx->xin->frozen_state) { > + /* If the upcall is a resume of a recirculation, we only need to > + * unwildcard the fields that are not in the frozen_metadata, as > + * when the rules update, OVS will generate a new recirc_id, > + * which will invalidate the megaflow with old the recirc_id. > + */ > + if (!mf_is_frozen_metadata(ofc->zone_src.field)) { > + mf_write_subfield_flow(&ofc->zone_src, &value, > + &ctx->wc->masks); > + } > + } else { > + mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks); > + } > + } else { > + zone = ofc->zone_imm; > + } > > + size_t ct_offset; > + ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label; > + uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark; > /* Ensure that any prior actions are applied before composing the new > * conntrack action. */ > xlate_commit_actions(ctx); > @@ -6193,11 +6214,6 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc, > do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx, > is_last_action, false); > > - if (ofc->zone_src.field) { > - zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); > - } else { > - zone = ofc->zone_imm; > - } > > ct_offset = nl_msg_start_nested(ctx->odp_actions, OVS_ACTION_ATTR_CT); > if (ofc->flags & NX_CT_F_COMMIT) { > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > index fb5b9a36d..c3917aa0d 100644 > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -1927,6 +1927,91 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src= > OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > +AT_SETUP([conntrack - zones from other field]) > +CHECK_CONNTRACK() > +OVS_TRAFFIC_VSWITCHD_START() > + > +ADD_NAMESPACES(at_ns0, at_ns1) > + > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") > + > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. > +AT_DATA([flows.txt], [dnl > +priority=1,action=drop > +priority=10,arp,action=normal > +priority=10,icmp,action=normal > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5) > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1 > +]) > + > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) > + > +OVS_START_L7([at_ns1], [http]) > + > +dnl HTTP requests from p0->p1 should work fine. > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) > + > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>) > +]) > + > +dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of > +dnl "ct_zone(5), ... actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following, > +dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)". > +dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5. > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], []) > + > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)']) > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], []) > + > +OVS_TRAFFIC_VSWITCHD_STOP > +AT_CLEANUP > + > + > +AT_SETUP([conntrack - zones from other field, more tests]) > +CHECK_CONNTRACK() > +OVS_TRAFFIC_VSWITCHD_START() > + > +ADD_NAMESPACES(at_ns0, at_ns1) > + > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") > + > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. > +AT_DATA([flows.txt], [dnl > +priority=1,action=drop > +priority=10,arp,action=normal > +priority=10,icmp,action=normal > +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]])) > +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2 > +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5) > +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1 > +]) > + > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) > + > +OVS_START_L7([at_ns1], [http]) > + > +dnl HTTP requests from p0->p1 should work fine. > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) > + > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl > +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,labels=0xffff0005,protoinfo=(state=<cleared>) > +]) > + > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_label(0xffff0005/0xffff)" ], [0], []) > + > +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))']) > +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) > +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_label(0xffff00ff/0xffff)" ], [0], []) Hi. Sorry it took so long. I re-run all the tests and this version seems to behave as expected. Except for a typo? in a second test where it should be 0xffff000f instead of 0xffff00ff. I fixed that. I also had to add one additional 'sed' to the second test in order to normalize the output between kernel and userspace datapaths (kernel clears non-masked bits while returning a datapath flow while userspace doesn't). With that, applied and backported down to 2.13. Best regards, Ilya Maximets.
diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h index 95e52e358..045dce8f5 100644 --- a/include/openvswitch/meta-flow.h +++ b/include/openvswitch/meta-flow.h @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *, const union mf_value *mask, struct flow *); bool mf_is_tun_metadata(const struct mf_field *); +bool mf_is_frozen_metadata(const struct mf_field *); bool mf_is_pipeline_field(const struct mf_field *); bool mf_is_set(const struct mf_field *, const struct flow *); void mf_mask_field(const struct mf_field *, struct flow_wildcards *); diff --git a/lib/meta-flow.c b/lib/meta-flow.c index c808d205d..e03cd8d0c 100644 --- a/lib/meta-flow.c +++ b/lib/meta-flow.c @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf) mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS; } +bool +mf_is_frozen_metadata(const struct mf_field *mf) +{ + if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) { + return true; + } + + if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) { + return true; + } + return false; +} + bool mf_is_pipeline_field(const struct mf_field *mf) { diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 7108c8a30..7b9f3369b 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -6177,11 +6177,32 @@ static void compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc, bool is_last_action) { - ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label; - uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark; - size_t ct_offset; uint16_t zone; + if (ofc->zone_src.field) { + union mf_subvalue value; + memset(&value, 0xff, sizeof(value)); + + zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); + if (ctx->xin->frozen_state) { + /* If the upcall is a resume of a recirculation, we only need to + * unwildcard the fields that are not in the frozen_metadata, as + * when the rules update, OVS will generate a new recirc_id, + * which will invalidate the megaflow with old the recirc_id. + */ + if (!mf_is_frozen_metadata(ofc->zone_src.field)) { + mf_write_subfield_flow(&ofc->zone_src, &value, + &ctx->wc->masks); + } + } else { + mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks); + } + } else { + zone = ofc->zone_imm; + } + size_t ct_offset; + ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label; + uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark; /* Ensure that any prior actions are applied before composing the new * conntrack action. */ xlate_commit_actions(ctx); @@ -6193,11 +6214,6 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc, do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx, is_last_action, false); - if (ofc->zone_src.field) { - zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); - } else { - zone = ofc->zone_imm; - } ct_offset = nl_msg_start_nested(ctx->odp_actions, OVS_ACTION_ATTR_CT); if (ofc->flags & NX_CT_F_COMMIT) { diff --git a/tests/system-traffic.at b/tests/system-traffic.at index fb5b9a36d..c3917aa0d 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -1927,6 +1927,91 @@ tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=<cleared>,dport=<cleared>),reply=(src= OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([conntrack - zones from other field]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5) +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,protoinfo=(state=<cleared>) +]) + +dnl This is to test when the zoneid is set by a field variable like NXM_NX_CT_ZONE, the OVS xlate should generate a megaflow with a form of +dnl "ct_zone(5), ... actions: ct(commit, zone=5)". The match "ct_zone(5)" is needed as if we changes the zoneid into 15 in the following, +dnl the old "ct_zone(5), ... actions: ct(commit, zone=5)" megaflow will not get hit, and OVS will generate a new megaflow with the match "ct_zone(0xf)". +dnl This will make sure that the new packets are committing to zoneid 15 rather than old 5. +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0x5)" ], [0], []) + +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15)']) +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_zone(0xf)" ], [0], []) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + + +AT_SETUP([conntrack - zones from other field, more tests]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]])) +priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=5) +priority=100,in_port=2,ct_state=+trk,ct_zone=5,tcp,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5,labels=0xffff0005,protoinfo=(state=<cleared>) +]) + +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_label(0xffff0005/0xffff)" ], [0], []) + +AT_CHECK([ovs-ofctl mod-flows br0 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" actions=ct(table=0,zone=15,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))']) +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) +AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 | grep "+trk" | grep -q "ct_label(0xffff00ff/0xffff)" ], [0], []) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + AT_SETUP([conntrack - multiple bridges]) CHECK_CONNTRACK() OVS_TRAFFIC_VSWITCHD_START(