diff mbox series

[ovs-dev] classifier: adjust segment boundary to execute prerequisite processing

Message ID 20220516200413.32620-1-aconole@redhat.com
State Superseded
Headers show
Series [ovs-dev] classifier: adjust segment boundary to execute prerequisite processing | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Aaron Conole May 16, 2022, 8:04 p.m. UTC
During flow processing, the flow wildcards are checked as a series of
stages, and these stages are intended to carry dependencies in a single
direction.  But when the neighbor discovery processing, for example, was
executed there is an incorrect dependency chain - we need fields from
stage 4 to determine whether we need fields from stage 3.

We can build a set of flow rules to demonstrate this:
  table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
  table=0,priority=0 actions=NORMAL
  table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
  table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
  table=1,priority=0 actions=NORMAL
  table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10:de:ad:be:ef:10 actions=NORMAL
  table=2,priority=100,tcp actions=NORMAL
  table=2,priority=100,icmp6 actions=NORMAL
  table=2,priority=0 actions=NORMAL

With this set of flows, any IPv6 packet that executes through this pipeline
will have the corresponding nd_sll field flagged as required match for
classification even if that field doesn't make sense in such a context
(for example, TCP packets).  When the corresponding flow is installed into
the kernel datapath, this field is not reflected when the revalidator
executes the dump stage (see net/openvswitch/flow_netlink.c for more details).

During the sweep stage, revalidator will compare the dumped WC with a
generated WC - these will mismatch because the generated WC will match on
the Neighbor Discovery fields, while the datapath WC will not match on
these fields.  We will then invalidate the flow and as a side effect
force an upcall.

By redefining the boundary, we shift these fields to the l4 subtable, and
cause masks to be generated matching just the requisite fields.  The list
of fields being shifted:

    struct in6_addr nd_target;
    struct eth_addr arp_sha;
    struct eth_addr arp_tha;
    ovs_be16 tcp_flags;
    ovs_be16 pad2;
    struct ovs_key_nsh nsh;

A standout field would be tcp_flags moving from l3 subtable matches to
the l4 subtable matches.  This reverts a partial performance optimization
in the case of stateless firewalling.  The tcp_flags field might have
been a good candidate to retain in the l3 segment, but it got overloaded
with ICMPv6 ND matching, and therefore we can't preserve this kind of
optimization.

Two other approaches were considered - moving the nd_target field alone
and collapsing the l3/l4 segments into a single subtable for matching.
Moving any field individually introduces ABI mismatch, and doesn't
completely address the problems with other neighbor discovery related
fields (such as nd_sll/nd_tll).  Collapsing the two subtables creates
an issue with datapath flow explosion, since the l3 and l4 fields will
be unwildcarded together (this can be seen with some of the existing
classifier tests).

A simple test is added to showcase the behavior.

Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 include/openvswitch/flow.h |  7 +++----
 tests/classifier.at        | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 4 deletions(-)

Comments

Eelco Chaudron May 17, 2022, 8:47 a.m. UTC | #1
On 16 May 2022, at 22:04, Aaron Conole wrote:

> During flow processing, the flow wildcards are checked as a series of
> stages, and these stages are intended to carry dependencies in a single
> direction.  But when the neighbor discovery processing, for example, was
> executed there is an incorrect dependency chain - we need fields from
> stage 4 to determine whether we need fields from stage 3.
>
> We can build a set of flow rules to demonstrate this:
>   table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
>   table=0,priority=0 actions=NORMAL
>   table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
>   table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
>   table=1,priority=0 actions=NORMAL
>   table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10:de:ad:be:ef:10 actions=NORMAL
>   table=2,priority=100,tcp actions=NORMAL
>   table=2,priority=100,icmp6 actions=NORMAL
>   table=2,priority=0 actions=NORMAL
>
> With this set of flows, any IPv6 packet that executes through this pipeline
> will have the corresponding nd_sll field flagged as required match for
> classification even if that field doesn't make sense in such a context
> (for example, TCP packets).  When the corresponding flow is installed into
> the kernel datapath, this field is not reflected when the revalidator
> executes the dump stage (see net/openvswitch/flow_netlink.c for more details).
>
> During the sweep stage, revalidator will compare the dumped WC with a
> generated WC - these will mismatch because the generated WC will match on
> the Neighbor Discovery fields, while the datapath WC will not match on
> these fields.  We will then invalidate the flow and as a side effect
> force an upcall.
>
> By redefining the boundary, we shift these fields to the l4 subtable, and
> cause masks to be generated matching just the requisite fields.  The list
> of fields being shifted:
>
>     struct in6_addr nd_target;
>     struct eth_addr arp_sha;
>     struct eth_addr arp_tha;
>     ovs_be16 tcp_flags;
>     ovs_be16 pad2;
>     struct ovs_key_nsh nsh;
>
> A standout field would be tcp_flags moving from l3 subtable matches to
> the l4 subtable matches.  This reverts a partial performance optimization
> in the case of stateless firewalling.  The tcp_flags field might have
> been a good candidate to retain in the l3 segment, but it got overloaded
> with ICMPv6 ND matching, and therefore we can't preserve this kind of
> optimization.
>
> Two other approaches were considered - moving the nd_target field alone
> and collapsing the l3/l4 segments into a single subtable for matching.
> Moving any field individually introduces ABI mismatch, and doesn't
> completely address the problems with other neighbor discovery related
> fields (such as nd_sll/nd_tll).  Collapsing the two subtables creates
> an issue with datapath flow explosion, since the l3 and l4 fields will
> be unwildcarded together (this can be seen with some of the existing
> classifier tests).
>
> A simple test is added to showcase the behavior.
>
> Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
> Reported-by: Numan Siddique <nusiddiq@redhat.com>
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---

Bit later response (was planning on replying to the RFC patch), but I needed a bit more time (and experimenting) with this area of the code :)

The change looks good, and thanks for adding so many details in the commit message!

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Ferriter, Cian May 19, 2022, 1:25 p.m. UTC | #2
Hi Aaron,

<snip commit message away>

> ---
>  include/openvswitch/flow.h |  7 +++----
>  tests/classifier.at        | 27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
> index 3054015d93..df10cf579e 100644
> --- a/include/openvswitch/flow.h
> +++ b/include/openvswitch/flow.h
> @@ -141,15 +141,14 @@ struct flow {
>      uint8_t nw_tos;             /* IP ToS (including DSCP and ECN). */
>      uint8_t nw_ttl;             /* IP TTL/Hop Limit. */
>      uint8_t nw_proto;           /* IP protocol or low 8 bits of ARP opcode. */
> +    /* L4 (64-bit aligned) */
>      struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
>      struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
>      struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
> -    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
> -                                 * With L3 to avoid matching L4. */
> +    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type. */
>      ovs_be16 pad2;              /* Pad to 64 bits. */
>      struct ovs_key_nsh nsh;     /* Network Service Header keys */
> 
> -    /* L4 (64-bit aligned) */
>      ovs_be16 tp_src;            /* TCP/UDP/SCTP source port/ICMP type. */
>      ovs_be16 tp_dst;            /* TCP/UDP/SCTP destination port/ICMP code. */
>      ovs_be16 ct_tp_src;         /* CT original tuple source port/ICMP type. */
> @@ -179,7 +178,7 @@ BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)

About the following BUILD_ASSRT_DECL which is in include/openvswitch/flow.h directly below struct flow.h:
/* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
                  == sizeof(struct flow_tnl) + sizeof(struct ovs_key_nsh) + 300
                  && FLOW_WC_SEQ == 42);

Should we increment the FLOW_WC_SEQ value? The comment reads:
/* This sequence number should be incremented whenever anything involving flows
 * or the wildcarding of flows changes.  This will cause build assertion
 * failures in places which likely need to be updated. */

We haven't changed anything in the order or content of struct flow but we have changed (fixed) how flows are wildcarded. It doesn't look like any of the code asserting FLOW_WC_SEQ == 42 and relying on struct flow order and content needs updating.
Just wondering your/others thoughts about whether to increment to 43 or not.

>  enum {
>      FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst),
>      FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, nw_src),
> -    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src),
> +    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, nd_target),
>  };
>  BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % sizeof(uint64_t) == 0);
>  BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % sizeof(uint64_t) == 0);

IIUC, we are moving the L4 boundary 56B earlier in struct flow. This means L3/L4 segment boundary is still at a U64 boundary. I know we have BUILD_ASSERTs checking this, but I just thought to double check myself.

> diff --git a/tests/classifier.at b/tests/classifier.at
> index cdcd72c156..a0a8fe0359 100644
> --- a/tests/classifier.at
> +++ b/tests/classifier.at
> @@ -129,6 +129,33 @@ Datapath actions: 3
>  OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"])
>  AT_CLEANUP
> 
> +AT_SETUP([flow classifier - ipv6 ND dependancy])
> +OVS_VSWITCHD_START
> +add_of_ports br0 1 2
> +AT_DATA([flows.txt], [dnl
> + table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
> + table=0,priority=0 actions=NORMAL
> + table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
> + table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
> + table=1,priority=0 actions=NORMAL
> + table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=1000::1 actions=NORMAL
> + table=2,priority=100,tcp actions=NORMAL
> + table=2,priority=100,icmp6 actions=NORMAL
> + table=2,priority=0 actions=NORMAL
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +# test ICMPv6 echo request (which should have no nd_target field)
> +AT_CHECK([ovs-appctl ofproto/trace br0
> "in_port=1,eth_src=f6:d2:b0:19:5e:7b,eth_dst=d2:49:19:91:78:fe,dl_type=0x86dd,ipv6_src=1000::3,ipv6_ds
> t=1000::4,nw_proto=58,icmpv6_type=128,icmpv6_code=0"], [0], [stdout])
> +AT_CHECK([tail -2 stdout], [0],
> +  [Megaflow:
> recirc_id=0,eth,icmp6,in_port=1,dl_src=f6:d2:b0:19:5e:7b,dl_dst=d2:49:19:91:78:fe,ipv6_src=1000::/10,i
> pv6_dst=1000::4,nw_ttl=0,nw_frag=no
> +Datapath actions: 100,2
> +])
> +
> +AT_CLEANUP
> +
> +
> +

Nit: The other tests in the file have one line after the "AT_CLEANUP", maybe this should be the same instead of 3 lines?

>  AT_BANNER([conjunctive match])
> 
>  AT_SETUP([single conjunctive match])

I've applied the patch to the latest master, run the unit test and it passes for me.
I also ran the test after reverting just the C code changes in this patch and correctly fails as expected. The Megaflow in the test has a ",nd_target=::" at the end, which the test correctly picks up as a failure. The test is correctly doing its job!

I have looked at the performance tests we run and none of them are affected by moving tcp_flags from the l3 subtable match to l4.

Out of curiosity, I also ran the classifier benchmark tests (ovstest test-classifier benchmark) and didn't see a performance difference before and after the patch. Having a look at tests/test-classifier.c, I can see that the tcp_flags field isn't tested, so it makes sense there's no difference in performance there.

Regardless of the questions and comments above, I'm happy to Ack this patch in its current state:
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Numan Siddique May 19, 2022, 5:36 p.m. UTC | #3
On Thu, May 19, 2022 at 9:25 AM Ferriter, Cian <cian.ferriter@intel.com> wrote:
>
> Hi Aaron,
>
> <snip commit message away>
>
> > ---
> >  include/openvswitch/flow.h |  7 +++----
> >  tests/classifier.at        | 27 +++++++++++++++++++++++++++
> >  2 files changed, 30 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
> > index 3054015d93..df10cf579e 100644
> > --- a/include/openvswitch/flow.h
> > +++ b/include/openvswitch/flow.h
> > @@ -141,15 +141,14 @@ struct flow {
> >      uint8_t nw_tos;             /* IP ToS (including DSCP and ECN). */
> >      uint8_t nw_ttl;             /* IP TTL/Hop Limit. */
> >      uint8_t nw_proto;           /* IP protocol or low 8 bits of ARP opcode. */
> > +    /* L4 (64-bit aligned) */
> >      struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
> >      struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
> >      struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
> > -    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
> > -                                 * With L3 to avoid matching L4. */
> > +    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type. */
> >      ovs_be16 pad2;              /* Pad to 64 bits. */
> >      struct ovs_key_nsh nsh;     /* Network Service Header keys */
> >
> > -    /* L4 (64-bit aligned) */
> >      ovs_be16 tp_src;            /* TCP/UDP/SCTP source port/ICMP type. */
> >      ovs_be16 tp_dst;            /* TCP/UDP/SCTP destination port/ICMP code. */
> >      ovs_be16 ct_tp_src;         /* CT original tuple source port/ICMP type. */
> > @@ -179,7 +178,7 @@ BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
>
> About the following BUILD_ASSRT_DECL which is in include/openvswitch/flow.h directly below struct flow.h:
> /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
> BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
>                   == sizeof(struct flow_tnl) + sizeof(struct ovs_key_nsh) + 300
>                   && FLOW_WC_SEQ == 42);
>
> Should we increment the FLOW_WC_SEQ value? The comment reads:
> /* This sequence number should be incremented whenever anything involving flows
>  * or the wildcarding of flows changes.  This will cause build assertion
>  * failures in places which likely need to be updated. */
>
> We haven't changed anything in the order or content of struct flow but we have changed (fixed) how flows are wildcarded. It doesn't look like any of the code asserting FLOW_WC_SEQ == 42 and relying on struct flow order and content needs updating.
> Just wondering your/others thoughts about whether to increment to 43 or not.
>
> >  enum {
> >      FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst),
> >      FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, nw_src),
> > -    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src),
> > +    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, nd_target),
> >  };
> >  BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % sizeof(uint64_t) == 0);
> >  BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % sizeof(uint64_t) == 0);
>
> IIUC, we are moving the L4 boundary 56B earlier in struct flow. This means L3/L4 segment boundary is still at a U64 boundary. I know we have BUILD_ASSERTs checking this, but I just thought to double check myself.
>
> > diff --git a/tests/classifier.at b/tests/classifier.at
> > index cdcd72c156..a0a8fe0359 100644
> > --- a/tests/classifier.at
> > +++ b/tests/classifier.at
> > @@ -129,6 +129,33 @@ Datapath actions: 3
> >  OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"])
> >  AT_CLEANUP
> >
> > +AT_SETUP([flow classifier - ipv6 ND dependancy])
> > +OVS_VSWITCHD_START
> > +add_of_ports br0 1 2
> > +AT_DATA([flows.txt], [dnl
> > + table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
> > + table=0,priority=0 actions=NORMAL
> > + table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
> > + table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
> > + table=1,priority=0 actions=NORMAL
> > + table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=1000::1 actions=NORMAL
> > + table=2,priority=100,tcp actions=NORMAL
> > + table=2,priority=100,icmp6 actions=NORMAL
> > + table=2,priority=0 actions=NORMAL
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +# test ICMPv6 echo request (which should have no nd_target field)
> > +AT_CHECK([ovs-appctl ofproto/trace br0
> > "in_port=1,eth_src=f6:d2:b0:19:5e:7b,eth_dst=d2:49:19:91:78:fe,dl_type=0x86dd,ipv6_src=1000::3,ipv6_ds
> > t=1000::4,nw_proto=58,icmpv6_type=128,icmpv6_code=0"], [0], [stdout])
> > +AT_CHECK([tail -2 stdout], [0],
> > +  [Megaflow:
> > recirc_id=0,eth,icmp6,in_port=1,dl_src=f6:d2:b0:19:5e:7b,dl_dst=d2:49:19:91:78:fe,ipv6_src=1000::/10,i
> > pv6_dst=1000::4,nw_ttl=0,nw_frag=no
> > +Datapath actions: 100,2
> > +])
> > +
> > +AT_CLEANUP
> > +
> > +
> > +
>
> Nit: The other tests in the file have one line after the "AT_CLEANUP", maybe this should be the same instead of 3 lines?
>
> >  AT_BANNER([conjunctive match])
> >
> >  AT_SETUP([single conjunctive match])
>
> I've applied the patch to the latest master, run the unit test and it passes for me.
> I also ran the test after reverting just the C code changes in this patch and correctly fails as expected. The Megaflow in the test has a ",nd_target=::" at the end, which the test correctly picks up as a failure. The test is correctly doing its job!
>
> I have looked at the performance tests we run and none of them are affected by moving tcp_flags from the l3 subtable match to l4.
>
> Out of curiosity, I also ran the classifier benchmark tests (ovstest test-classifier benchmark) and didn't see a performance difference before and after the patch. Having a look at tests/test-classifier.c, I can see that the tcp_flags field isn't tested, so it makes sense there's no difference in performance there.
>
> Regardless of the questions and comments above, I'm happy to Ack this patch in its current state:
> Acked-by: Cian Ferriter <cian.ferriter@intel.com>

Thanks for fixing this issue.
Tested the patch and it fixes the issue on my setup.

Tested-by: Numan Siddique <numans@ovn.org>

Thanks
Numan

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Aaron Conole May 19, 2022, 8:53 p.m. UTC | #4
"Ferriter, Cian" <cian.ferriter@intel.com> writes:

> Hi Aaron,

Hi Cian,

> <snip commit message away>
>
>> ---
>>  include/openvswitch/flow.h |  7 +++----
>>  tests/classifier.at        | 27 +++++++++++++++++++++++++++
>>  2 files changed, 30 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
>> index 3054015d93..df10cf579e 100644
>> --- a/include/openvswitch/flow.h
>> +++ b/include/openvswitch/flow.h
>> @@ -141,15 +141,14 @@ struct flow {
>>      uint8_t nw_tos;             /* IP ToS (including DSCP and ECN). */
>>      uint8_t nw_ttl;             /* IP TTL/Hop Limit. */
>>      uint8_t nw_proto;           /* IP protocol or low 8 bits of ARP opcode. */
>> +    /* L4 (64-bit aligned) */
>>      struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
>>      struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
>>      struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
>> -    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
>> -                                 * With L3 to avoid matching L4. */
>> +    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type. */
>>      ovs_be16 pad2;              /* Pad to 64 bits. */
>>      struct ovs_key_nsh nsh;     /* Network Service Header keys */
>> 
>> -    /* L4 (64-bit aligned) */
>>      ovs_be16 tp_src;            /* TCP/UDP/SCTP source port/ICMP type. */
>>      ovs_be16 tp_dst;            /* TCP/UDP/SCTP destination port/ICMP code. */
>>      ovs_be16 ct_tp_src;         /* CT original tuple source port/ICMP type. */
>> @@ -179,7 +178,7 @@ BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
>
> About the following BUILD_ASSRT_DECL which is in include/openvswitch/flow.h directly below struct flow.h:
> /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
> BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
>                   == sizeof(struct flow_tnl) + sizeof(struct ovs_key_nsh) + 300
>                   && FLOW_WC_SEQ == 42);
>
> Should we increment the FLOW_WC_SEQ value? The comment reads:
> /* This sequence number should be incremented whenever anything involving flows
>  * or the wildcarding of flows changes.  This will cause build assertion
>  * failures in places which likely need to be updated. */
>
> We haven't changed anything in the order or content of struct flow but we have changed (fixed) how flows are wildcarded. It doesn't look like any of the code asserting FLOW_WC_SEQ == 42 and relying on struct flow order and content needs updating.
> Just wondering your/others thoughts about whether to increment to 43 or not.

I decided not to increment it because it's really a subtable
configuration for the classifier rather than a flow structure related
change.  In ofproto/ofpcoto.c we initialize the classifier with the map
stored in flow.c that holds the various segments.  I consider it more of
a classifier configuration, in that sense.  As you note, it doesn't
modify the layout or size of flow struct.  Maybe we can have a more
precise comment around that area.

>>  enum {
>>      FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst),
>>      FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, nw_src),
>> -    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src),
>> +    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, nd_target),
>>  };
>>  BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % sizeof(uint64_t) == 0);
>>  BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % sizeof(uint64_t) == 0);
>
> IIUC, we are moving the L4 boundary 56B earlier in struct flow. This
> means L3/L4 segment boundary is still at a U64 boundary. I know we
> have BUILD_ASSERTs checking this, but I just thought to double check
> myself.

Yes - it's 448 bits or 7 u64 words.  This is a bit lucky for us -
otherwise we might have had to get more creative :)

>> diff --git a/tests/classifier.at b/tests/classifier.at
>> index cdcd72c156..a0a8fe0359 100644
>> --- a/tests/classifier.at
>> +++ b/tests/classifier.at
>> @@ -129,6 +129,33 @@ Datapath actions: 3
>>  OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"])
>>  AT_CLEANUP
>> 
>> +AT_SETUP([flow classifier - ipv6 ND dependancy])
>> +OVS_VSWITCHD_START
>> +add_of_ports br0 1 2
>> +AT_DATA([flows.txt], [dnl
>> + table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
>> + table=0,priority=0 actions=NORMAL
>> + table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
>> + table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
>> + table=1,priority=0 actions=NORMAL
>> + table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=1000::1 actions=NORMAL
>> + table=2,priority=100,tcp actions=NORMAL
>> + table=2,priority=100,icmp6 actions=NORMAL
>> + table=2,priority=0 actions=NORMAL
>> +])
>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
>> +
>> +# test ICMPv6 echo request (which should have no nd_target field)
>> +AT_CHECK([ovs-appctl ofproto/trace br0
>> "in_port=1,eth_src=f6:d2:b0:19:5e:7b,eth_dst=d2:49:19:91:78:fe,dl_type=0x86dd,ipv6_src=1000::3,ipv6_ds
>> t=1000::4,nw_proto=58,icmpv6_type=128,icmpv6_code=0"], [0], [stdout])
>> +AT_CHECK([tail -2 stdout], [0],
>> +  [Megaflow:
>> recirc_id=0,eth,icmp6,in_port=1,dl_src=f6:d2:b0:19:5e:7b,dl_dst=d2:49:19:91:78:fe,ipv6_src=1000::/10,i
>> pv6_dst=1000::4,nw_ttl=0,nw_frag=no
>> +Datapath actions: 100,2
>> +])
>> +
>> +AT_CLEANUP
>> +
>> +
>> +
>
> Nit: The other tests in the file have one line after the "AT_CLEANUP",
> maybe this should be the same instead of 3 lines?

Hrrm.. not sure how that got in.  I can spin a v2 with this cleaned up.

>>  AT_BANNER([conjunctive match])
>> 
>>  AT_SETUP([single conjunctive match])
>
> I've applied the patch to the latest master, run the unit test and it passes for me.
> I also ran the test after reverting just the C code changes in this
> patch and correctly fails as expected. The Megaflow in the test has a
> ",nd_target=::" at the end, which the test correctly picks up as a
> failure. The test is correctly doing its job!

:)

> I have looked at the performance tests we run and none of them are
> affected by moving tcp_flags from the l3 subtable match to l4.

That's good to know - it's an area I'm concerned about.

> Out of curiosity, I also ran the classifier benchmark tests (ovstest
> test-classifier benchmark) and didn't see a performance difference
> before and after the patch. Having a look at tests/test-classifier.c,
> I can see that the tcp_flags field isn't tested, so it makes sense
> there's no difference in performance there.
>
> Regardless of the questions and comments above, I'm happy to Ack this patch in its current state:
> Acked-by: Cian Ferriter <cian.ferriter@intel.com>

Thanks!

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ferriter, Cian May 20, 2022, 8:26 a.m. UTC | #5
> -----Original Message-----
> From: Aaron Conole <aconole@redhat.com>
> Sent: Thursday 19 May 2022 21:53
> To: Ferriter, Cian <cian.ferriter@intel.com>
> Cc: dev@openvswitch.org; Flavio Leitner <fbl@sysclose.org>; Ilya Maximets <i.maximets@ovn.org>; Rashid
> Khan <rkhan@redhat.com>
> Subject: Re: [ovs-dev] [PATCH] classifier: adjust segment boundary to execute prerequisite processing
> 
> "Ferriter, Cian" <cian.ferriter@intel.com> writes:
> 
> > Hi Aaron,
> 
> Hi Cian,
> 
> > <snip commit message away>
> >
> >> ---
> >>  include/openvswitch/flow.h |  7 +++----
> >>  tests/classifier.at        | 27 +++++++++++++++++++++++++++
> >>  2 files changed, 30 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
> >> index 3054015d93..df10cf579e 100644
> >> --- a/include/openvswitch/flow.h
> >> +++ b/include/openvswitch/flow.h
> >> @@ -141,15 +141,14 @@ struct flow {
> >>      uint8_t nw_tos;             /* IP ToS (including DSCP and ECN). */
> >>      uint8_t nw_ttl;             /* IP TTL/Hop Limit. */
> >>      uint8_t nw_proto;           /* IP protocol or low 8 bits of ARP opcode. */
> >> +    /* L4 (64-bit aligned) */
> >>      struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
> >>      struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
> >>      struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
> >> -    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
> >> -                                 * With L3 to avoid matching L4. */
> >> +    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type. */
> >>      ovs_be16 pad2;              /* Pad to 64 bits. */
> >>      struct ovs_key_nsh nsh;     /* Network Service Header keys */
> >>
> >> -    /* L4 (64-bit aligned) */
> >>      ovs_be16 tp_src;            /* TCP/UDP/SCTP source port/ICMP type. */
> >>      ovs_be16 tp_dst;            /* TCP/UDP/SCTP destination port/ICMP code. */
> >>      ovs_be16 ct_tp_src;         /* CT original tuple source port/ICMP type. */
> >> @@ -179,7 +178,7 @@ BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
> >
> > About the following BUILD_ASSRT_DECL which is in include/openvswitch/flow.h directly below struct
> flow.h:
> > /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
> > BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
> >                   == sizeof(struct flow_tnl) + sizeof(struct ovs_key_nsh) + 300
> >                   && FLOW_WC_SEQ == 42);
> >
> > Should we increment the FLOW_WC_SEQ value? The comment reads:
> > /* This sequence number should be incremented whenever anything involving flows
> >  * or the wildcarding of flows changes.  This will cause build assertion
> >  * failures in places which likely need to be updated. */
> >
> > We haven't changed anything in the order or content of struct flow but we have changed (fixed) how
> flows are wildcarded. It doesn't look like any of the code asserting FLOW_WC_SEQ == 42 and relying on
> struct flow order and content needs updating.
> > Just wondering your/others thoughts about whether to increment to 43 or not.
> 
> I decided not to increment it because it's really a subtable
> configuration for the classifier rather than a flow structure related
> change.  In ofproto/ofpcoto.c we initialize the classifier with the map
> stored in flow.c that holds the various segments.  I consider it more of
> a classifier configuration, in that sense.  As you note, it doesn't
> modify the layout or size of flow struct.  Maybe we can have a more
> precise comment around that area.
> 

That makes sense, I agree.

> >>  enum {
> >>      FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst),
> >>      FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, nw_src),
> >> -    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src),
> >> +    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, nd_target),
> >>  };
> >>  BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % sizeof(uint64_t) == 0);
> >>  BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % sizeof(uint64_t) == 0);
> >
> > IIUC, we are moving the L4 boundary 56B earlier in struct flow. This
> > means L3/L4 segment boundary is still at a U64 boundary. I know we
> > have BUILD_ASSERTs checking this, but I just thought to double check
> > myself.
> 
> Yes - it's 448 bits or 7 u64 words.  This is a bit lucky for us -
> otherwise we might have had to get more creative :)
> 
> >> diff --git a/tests/classifier.at b/tests/classifier.at
> >> index cdcd72c156..a0a8fe0359 100644
> >> --- a/tests/classifier.at
> >> +++ b/tests/classifier.at
> >> @@ -129,6 +129,33 @@ Datapath actions: 3
> >>  OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"])
> >>  AT_CLEANUP
> >>
> >> +AT_SETUP([flow classifier - ipv6 ND dependancy])
> >> +OVS_VSWITCHD_START
> >> +add_of_ports br0 1 2
> >> +AT_DATA([flows.txt], [dnl
> >> + table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
> >> + table=0,priority=0 actions=NORMAL
> >> + table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
> >> + table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
> >> + table=1,priority=0 actions=NORMAL
> >> + table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=1000::1 actions=NORMAL
> >> + table=2,priority=100,tcp actions=NORMAL
> >> + table=2,priority=100,icmp6 actions=NORMAL
> >> + table=2,priority=0 actions=NORMAL
> >> +])
> >> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> >> +
> >> +# test ICMPv6 echo request (which should have no nd_target field)
> >> +AT_CHECK([ovs-appctl ofproto/trace br0
> >>
> "in_port=1,eth_src=f6:d2:b0:19:5e:7b,eth_dst=d2:49:19:91:78:fe,dl_type=0x86dd,ipv6_src=1000::3,ipv6_ds
> >> t=1000::4,nw_proto=58,icmpv6_type=128,icmpv6_code=0"], [0], [stdout])
> >> +AT_CHECK([tail -2 stdout], [0],
> >> +  [Megaflow:
> >>
> recirc_id=0,eth,icmp6,in_port=1,dl_src=f6:d2:b0:19:5e:7b,dl_dst=d2:49:19:91:78:fe,ipv6_src=1000::/10,i
> >> pv6_dst=1000::4,nw_ttl=0,nw_frag=no
> >> +Datapath actions: 100,2
> >> +])
> >> +
> >> +AT_CLEANUP
> >> +
> >> +
> >> +
> >
> > Nit: The other tests in the file have one line after the "AT_CLEANUP",
> > maybe this should be the same instead of 3 lines?
> 
> Hrrm.. not sure how that got in.  I can spin a v2 with this cleaned up.
> 

Thanks Aaron.

> >>  AT_BANNER([conjunctive match])
> >>
> >>  AT_SETUP([single conjunctive match])
> >
> > I've applied the patch to the latest master, run the unit test and it passes for me.
> > I also ran the test after reverting just the C code changes in this
> > patch and correctly fails as expected. The Megaflow in the test has a
> > ",nd_target=::" at the end, which the test correctly picks up as a
> > failure. The test is correctly doing its job!
> 
> :)
> 
> > I have looked at the performance tests we run and none of them are
> > affected by moving tcp_flags from the l3 subtable match to l4.
> 
> That's good to know - it's an area I'm concerned about.
> 
> > Out of curiosity, I also ran the classifier benchmark tests (ovstest
> > test-classifier benchmark) and didn't see a performance difference
> > before and after the patch. Having a look at tests/test-classifier.c,
> > I can see that the tcp_flags field isn't tested, so it makes sense
> > there's no difference in performance there.
> >
> > Regardless of the questions and comments above, I'm happy to Ack this patch in its current state:
> > Acked-by: Cian Ferriter <cian.ferriter@intel.com>
> 
> Thanks!
> 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 3054015d93..df10cf579e 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -141,15 +141,14 @@  struct flow {
     uint8_t nw_tos;             /* IP ToS (including DSCP and ECN). */
     uint8_t nw_ttl;             /* IP TTL/Hop Limit. */
     uint8_t nw_proto;           /* IP protocol or low 8 bits of ARP opcode. */
+    /* L4 (64-bit aligned) */
     struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
     struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
     struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
-    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
-                                 * With L3 to avoid matching L4. */
+    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type. */
     ovs_be16 pad2;              /* Pad to 64 bits. */
     struct ovs_key_nsh nsh;     /* Network Service Header keys */
 
-    /* L4 (64-bit aligned) */
     ovs_be16 tp_src;            /* TCP/UDP/SCTP source port/ICMP type. */
     ovs_be16 tp_dst;            /* TCP/UDP/SCTP destination port/ICMP code. */
     ovs_be16 ct_tp_src;         /* CT original tuple source port/ICMP type. */
@@ -179,7 +178,7 @@  BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
 enum {
     FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst),
     FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, nw_src),
-    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src),
+    FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, nd_target),
 };
 BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % sizeof(uint64_t) == 0);
 BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % sizeof(uint64_t) == 0);
diff --git a/tests/classifier.at b/tests/classifier.at
index cdcd72c156..a0a8fe0359 100644
--- a/tests/classifier.at
+++ b/tests/classifier.at
@@ -129,6 +129,33 @@  Datapath actions: 3
 OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"])
 AT_CLEANUP
 
+AT_SETUP([flow classifier - ipv6 ND dependancy])
+OVS_VSWITCHD_START
+add_of_ports br0 1 2
+AT_DATA([flows.txt], [dnl
+ table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
+ table=0,priority=0 actions=NORMAL
+ table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
+ table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
+ table=1,priority=0 actions=NORMAL
+ table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=1000::1 actions=NORMAL
+ table=2,priority=100,tcp actions=NORMAL
+ table=2,priority=100,icmp6 actions=NORMAL
+ table=2,priority=0 actions=NORMAL
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+# test ICMPv6 echo request (which should have no nd_target field)
+AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,eth_src=f6:d2:b0:19:5e:7b,eth_dst=d2:49:19:91:78:fe,dl_type=0x86dd,ipv6_src=1000::3,ipv6_dst=1000::4,nw_proto=58,icmpv6_type=128,icmpv6_code=0"], [0], [stdout])
+AT_CHECK([tail -2 stdout], [0],
+  [Megaflow: recirc_id=0,eth,icmp6,in_port=1,dl_src=f6:d2:b0:19:5e:7b,dl_dst=d2:49:19:91:78:fe,ipv6_src=1000::/10,ipv6_dst=1000::4,nw_ttl=0,nw_frag=no
+Datapath actions: 100,2
+])
+
+AT_CLEANUP
+
+
+
 AT_BANNER([conjunctive match])
 
 AT_SETUP([single conjunctive match])