Message ID | 20220511145241.909506-1-aconole@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,RFC] classifier: change segment 3 boundary to defer dependant processing | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
Hi Aaron! This will move the following fields from the L4 segment to the L3 segment: struct in6_addr nd_target; /* IPv6 neighbor discovery (ND) target. */ struct eth_addr arp_sha; /* ARP/ND source hardware address. */ struct eth_addr arp_tha; /* ARP/ND target hardware address. */ ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type. * With L3 to avoid matching L4. */ ovs_be16 pad2; /* Pad to 64 bits. */ struct ovs_key_nsh nsh; /* Network Service Header keys */ The one that stands out to me is tcp_flags. I remember that we had a reason that we wanted to include this in the L3 segment. I think it was because we might want to match on flags without matching on anything else in L4, but I don't remember the exact details. There is a comment to that effect above. I suggest including the list of fields being moved in the commit message (i.e. the code excerpt above) so that it's clearer what's changing. Some more below: On Wed, May 11, 2022 at 10:52:41AM -0400, Aaron Conole wrote: > During the sweep stage, revalidator will compare the dumped WC with a > generated WC - these will mismatch because the generated WC will include > such dependent ND fields. We will then invalidate the flow and as a side > effect force an upcall. I find "include" (or "exclude") a little ambiguous here. I think that saying that the generated or dumped WC "matches" or "does not match" on a particular field would be clearer. Some more like this below: > By redefining the boundary, we shift these fields to the l4 subtable, and > cause masks to be generated by including the requisite fields. Two > other approaches were considered - moving the nd_target field alone > (which leaves arp_sha/nd_sll, arp_tha/nd_tll, and tcp_flags still as > issues so this was avoided) and collapsing the l3/l4 segments into one > (which would result in flow explosion in real scenarios, so it was also > discarded as a solution). > > Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.") > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773 > Reported-by: Numan Siddique <nusiddiq@redhat.com> > Suggested-by: Ilya Maximets <i.maximets@ovn.org> > Signed-off-by: Aaron Conole <aconole@redhat.com> Thanks for adding a test!
Ben Pfaff <blp@ovn.org> writes: > Hi Aaron! This will move the following fields from the L4 segment to > the L3 segment: > > struct in6_addr nd_target; /* IPv6 neighbor discovery (ND) target. */ > struct eth_addr arp_sha; /* ARP/ND source hardware address. */ > struct eth_addr arp_tha; /* ARP/ND target hardware address. */ > ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type. > * With L3 to avoid matching L4. */ > ovs_be16 pad2; /* Pad to 64 bits. */ > struct ovs_key_nsh nsh; /* Network Service Header keys */ > > > The one that stands out to me is tcp_flags. I remember that we had a > reason that we wanted to include this in the L3 segment. I think it was > because we might want to match on flags without matching on anything > else in L4, but I don't remember the exact details. There is a comment > to that effect above. I guess the reason that would be done would be to implement a stateless firewall using the openflow rules. I haven't seen many flow pipelines recently that use tcp_flags, but I can see it might help those kinds of scenarios where we want to allow/drop packets based on some kinds of flags (or drop xmas packets before they get processed). Do you think it will be a problem? For now, I assume most folks are using the ct() action and that should be based on ct_state field (which isn't impacted). Unfortunately, tcp_flags got overloaded with nd_target, and making any kind of change there will be an ABI breakage when we backport (which was a consideration for moving the nd_target field, as well). In the PATCH form, I plan to expand on that a little bit. Maybe a future series could split this field out. > I suggest including the list of fields being moved in the commit message > (i.e. the code excerpt above) so that it's clearer what's changing. ACK > Some more below: > > On Wed, May 11, 2022 at 10:52:41AM -0400, Aaron Conole wrote: >> During the sweep stage, revalidator will compare the dumped WC with a >> generated WC - these will mismatch because the generated WC will include >> such dependent ND fields. We will then invalidate the flow and as a side >> effect force an upcall. > > I find "include" (or "exclude") a little ambiguous here. I think that > saying that the generated or dumped WC "matches" or "does not match" on > a particular field would be clearer. Some more like this below: Makes sense to me. >> By redefining the boundary, we shift these fields to the l4 subtable, and >> cause masks to be generated by including the requisite fields. Two >> other approaches were considered - moving the nd_target field alone >> (which leaves arp_sha/nd_sll, arp_tha/nd_tll, and tcp_flags still as >> issues so this was avoided) and collapsing the l3/l4 segments into one >> (which would result in flow explosion in real scenarios, so it was also >> discarded as a solution). >> >> Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.") >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773 >> Reported-by: Numan Siddique <nusiddiq@redhat.com> >> Suggested-by: Ilya Maximets <i.maximets@ovn.org> >> Signed-off-by: Aaron Conole <aconole@redhat.com> > > Thanks for adding a test! :)
On Thu, May 12, 2022 at 8:55 AM Aaron Conole <aconole@redhat.com> wrote: > > Ben Pfaff <blp@ovn.org> writes: > > > Hi Aaron! This will move the following fields from the L4 segment to > > the L3 segment: > > > > struct in6_addr nd_target; /* IPv6 neighbor discovery (ND) target. */ > > struct eth_addr arp_sha; /* ARP/ND source hardware address. */ > > struct eth_addr arp_tha; /* ARP/ND target hardware address. */ > > ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type. > > * With L3 to avoid matching L4. */ > > ovs_be16 pad2; /* Pad to 64 bits. */ > > struct ovs_key_nsh nsh; /* Network Service Header keys */ > > > > > > The one that stands out to me is tcp_flags. I remember that we had a > > reason that we wanted to include this in the L3 segment. I think it was > > because we might want to match on flags without matching on anything > > else in L4, but I don't remember the exact details. There is a comment > > to that effect above. > > I guess the reason that would be done would be to implement a stateless > firewall using the openflow rules. I haven't seen many flow pipelines > recently that use tcp_flags, but I can see it might help those kinds of > scenarios where we want to allow/drop packets based on some kinds of > flags (or drop xmas packets before they get processed). > > Do you think it will be a problem? For now, I assume most folks are > using the ct() action and that should be based on ct_state field (which > isn't impacted). I don't know whether it will be a problem. Everything like this always comes back to "how well do we know how our users use Open vSwitch?" and the answer is always "not very well" :-( Since this is an important correctness issue, I think we should fix it and, if performance problems arise in cases we're not sure about, we can address them.
diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h index 3054015d93..a966dde216 100644 --- a/include/openvswitch/flow.h +++ b/include/openvswitch/flow.h @@ -179,7 +179,7 @@ BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t) enum { FLOW_SEGMENT_1_ENDS_AT = offsetof(struct flow, dl_dst), FLOW_SEGMENT_2_ENDS_AT = offsetof(struct flow, nw_src), - FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, tp_src), + FLOW_SEGMENT_3_ENDS_AT = offsetof(struct flow, nd_target), }; BUILD_ASSERT_DECL(FLOW_SEGMENT_1_ENDS_AT % sizeof(uint64_t) == 0); BUILD_ASSERT_DECL(FLOW_SEGMENT_2_ENDS_AT % sizeof(uint64_t) == 0); diff --git a/tests/classifier.at b/tests/classifier.at index cdcd72c156..a0a8fe0359 100644 --- a/tests/classifier.at +++ b/tests/classifier.at @@ -129,6 +129,33 @@ Datapath actions: 3 OVS_VSWITCHD_STOP(["/'prefixes' with incompatible field: ipv6_label/d"]) AT_CLEANUP +AT_SETUP([flow classifier - ipv6 ND dependancy]) +OVS_VSWITCHD_START +add_of_ports br0 1 2 +AT_DATA([flows.txt], [dnl + table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1) + table=0,priority=0 actions=NORMAL + table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2) + table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2) + table=1,priority=0 actions=NORMAL + table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=1000::1 actions=NORMAL + table=2,priority=100,tcp actions=NORMAL + table=2,priority=100,icmp6 actions=NORMAL + table=2,priority=0 actions=NORMAL +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +# test ICMPv6 echo request (which should have no nd_target field) +AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,eth_src=f6:d2:b0:19:5e:7b,eth_dst=d2:49:19:91:78:fe,dl_type=0x86dd,ipv6_src=1000::3,ipv6_dst=1000::4,nw_proto=58,icmpv6_type=128,icmpv6_code=0"], [0], [stdout]) +AT_CHECK([tail -2 stdout], [0], + [Megaflow: recirc_id=0,eth,icmp6,in_port=1,dl_src=f6:d2:b0:19:5e:7b,dl_dst=d2:49:19:91:78:fe,ipv6_src=1000::/10,ipv6_dst=1000::4,nw_ttl=0,nw_frag=no +Datapath actions: 100,2 +]) + +AT_CLEANUP + + + AT_BANNER([conjunctive match]) AT_SETUP([single conjunctive match])
During flow processing, the flow wildcards are checked as a series of stages, and these stages are intended to carry dependencies in a single direction. When the neighbor discovery processing, for example, was executed there is an incorrect dependancy chain - we need fields from stage 4 to determine whether we need fields from stage 3. We can build a set of flow rules to demonstrate this: table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1) table=0,priority=0 actions=NORMAL table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2) table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2) table=1,priority=0 actions=NORMAL table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10:de:ad:be:ef:10 actions=NORMAL table=2,priority=100,tcp actions=NORMAL table=2,priority=100,icmp6 actions=NORMAL table=2,priority=0 actions=NORMAL With this set of flows, any IPv6 packet that executes through this pipeline will have the corresponding nd_sll field flagged as required for classification even if that field doesn't make sense in such a context (for example, TCP packets). When the corresponding flow is installed into the kernel datapath, this field is not reflected when the revalidator executes the dump stage (see net/openvswitch/flow_netlink.c for more details). During the sweep stage, revalidator will compare the dumped WC with a generated WC - these will mismatch because the generated WC will include such dependent ND fields. We will then invalidate the flow and as a side effect force an upcall. By redefining the boundary, we shift these fields to the l4 subtable, and cause masks to be generated by including the requisite fields. Two other approaches were considered - moving the nd_target field alone (which leaves arp_sha/nd_sll, arp_tha/nd_tll, and tcp_flags still as issues so this was avoided) and collapsing the l3/l4 segments into one (which would result in flow explosion in real scenarios, so it was also discarded as a solution). Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773 Reported-by: Numan Siddique <nusiddiq@redhat.com> Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> --- include/openvswitch/flow.h | 2 +- tests/classifier.at | 27 +++++++++++++++++++++++++++ 2 files changed, 28 insertions(+), 1 deletion(-)