Message ID | 82ef47a2986fad41fb5f33751c22b0b27d96b722.1723716173.git.echaudro@redhat.com |
---|---|
State | Changes Requested |
Delegated to: | Ilya Maximets |
Headers | show |
Series | [ovs-dev,v3] netdev-tc-offloads: Don't offload header modification on ip fragments. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
On 8/15/24 12:02, Eelco Chaudron wrote: > While offloading header modifications to TC, OVS is using {TCA_PEDIT} + > {TCA_CSUM} combination as that it the only way to represent header > rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for > IP fragments. > > Since TC already applies fragmentation bit masking, this patch simply > needs to prevent these packets from being processed through TC. > > Reported-at: https://issues.redhat.com/browse/FDP-545 > Signed-off-by: Eelco Chaudron <echaudro@redhat.com> > --- > v3: - Fixed some comment style issue. > - Add the nw_frag mask if not set. > v2: - Fixed and added some comments. > - Use ovs-pcap to compare packets. > --- > lib/netdev-offload-tc.c | 35 ++++++++++++++++ > lib/tc.c | 5 ++- > tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 132 insertions(+), 1 deletion(-) Hi, Eelco. Sorry for the long delay. See some comments below. Best regards, Ilya Maximets. > > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c > index 3be1c08d2..9395f784a 100644 > --- a/lib/netdev-offload-tc.c > +++ b/lib/netdev-offload-tc.c > @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower, > return 0; > } > > +/* This function returns true if the tc layer will add a l4 checksum action > + * for this set action. Refer to the csum_update_flag() function for > + * detailed logic. Note that even the kernel only supports updating TCP, > + * UDP and ICMPv6. > + */ > +static bool > +will_tc_add_l4_checksum(struct tc_flower *flower, int type) I'd change s/will_tc/tc_will/. Just reads a little better, IMO. if "it is a fragment and tc will add l4 checksum"; then if "it is a fragment and will tc add l4 checksum"; then > +{ > + switch (type) { > + case OVS_KEY_ATTR_IPV4: > + case OVS_KEY_ATTR_IPV6: > + case OVS_KEY_ATTR_TCP: > + case OVS_KEY_ATTR_UDP: > + switch (flower->key.ip_proto) { > + case IPPROTO_TCP: > + case IPPROTO_UDP: > + case IPPROTO_ICMPV6: > + case IPPROTO_UDPLITE: > + return true; > + } > + break; > + } > + return false; > +} > + > static int > parse_put_flow_set_masked_action(struct tc_flower *flower, > struct tc_action *action, > @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower, > return EOPNOTSUPP; > } > > + if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT > + && will_tc_add_l4_checksum(flower, type)) { > + VLOG_DBG_RL(&rl, "set action type %d not supported on fragments " > + "due to checksum limitation", type); > + ofpbuf_uninit(&set_buf); > + return EOPNOTSUPP; > + } > + > for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { > struct netlink_field *f = &set_flower_map[type][i]; > > @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match, > } > > mask->nw_frag = 0; > + } else { > + mask->nw_frag = FLOW_NW_FRAG_MASK; This doesn't do much. We need the datapath to have a match on IS_FRAGMENT bit instead. Updating mask will make offload fail, since it's not consumed. What we want is: if the packet IS NOT fragmented AND tc_will_add_l4_checksum: Add explicit match on packet not being fragmented if the packet IS fragmented AND tc_will_add_l4_checksum: Do not offload What we want to avoid is if we add a flow that changes L34 fields and doesn't match on fragmentation bit, fragments may hit it and have incorrect checkums. So, every flower rule that modifies L34 should have IS_FRAGMENT set in the mask. > } > > if (key->nw_proto == IPPROTO_TCP) { > diff --git a/lib/tc.c b/lib/tc.c > index e55ba3b1b..22960a13e 100644 > --- a/lib/tc.c > +++ b/lib/tc.c > @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, > * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>)) > * we need to force a more specific flow as this can, for example, > * need a recalculation of icmp checksum if the packet that passes > - * is ICMPv6 and tcp checksum if its tcp. */ > + * is ICMPv6 and tcp checksum if its tcp. > + * > + * Ensure that you keep the pre-check function in netdev-offload-tc.c, > + * will_tc_add_l4_checksum(), in sync if you make any changes here. */ Comments should not talk to the reader, generally, they should describe things instead. So, we shouldn't use words like 'you'. E.g. "This section of the code must be kept in sync with ..." > > switch (htype) { > case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > index 202ff0492..add0ab3a2 100644 > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -2436,6 +2436,99 @@ recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty > OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments]) > +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) > +OVS_TRAFFIC_VSWITCHD_START() > + > +ADD_NAMESPACES(at_ns0, at_ns1) > + > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03) > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02) > + > +AT_DATA([flows.txt], [dnl > + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1 > + in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1 > +]) > + > +AT_CHECK([ovs-ofctl del-flows br0]) > +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt]) > + > +NETNS_DAEMONIZE([at_ns1], > + [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err], > + [tcpdump.pid]) > +OVS_WAIT_UNTIL([grep "listening" tcpdump.err]) > + > +dnl IPv4 Packet content: > +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 > +dnl Type: IPv4 (0x0800) > +dnl Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2 > +dnl 0100 .... = Version: 4 > +dnl .... 0101 = Header Length: 20 bytes (5) > +dnl Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) > +dnl Total Length: 38 > +dnl Identification: 0x0001 (1) > +dnl 001. .... = Flags: 0x1, More fragments > +dnl 0... .... = Reserved bit: Not set > +dnl .0.. .... = Don't fragment: Not set > +dnl ..1. .... = More fragments: Set > +dnl ...0 0000 0000 0000 = Fragment Offset: 0 > +dnl Time to Live: 64 > +dnl Protocol: UDP (17) > +dnl Header Checksum: 0x44c2 > +dnl Data (18 bytes) > +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00" > +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02" > +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a" > +packet="${eth} ${ip} ${data}" > + > +dnl We send each packet multiple times, ones for learning which will flow > +dnl through the used datapath for learning, and the others will go through the > +dnl actual datapath. I'm not sure I understand what "ones for learning which will flow through the used datapath for learning" means. May need some re-wording. :) > +for i in 1 2 3 4 5; do > + NS_CHECK_EXEC([at_ns0], > + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null]) > +done > + > +dnl Update source address, and checksums in original packet for comparison. A comma may not be needed. > +packet=$(echo "$packet" | sed -e 's/ //g' \ > + -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g') > +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5]) > + > +dnl Repeat similar test with IPv6. > +dnl Packet content: > +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 > +dnl Type: IPv6 (0x86dd) > +dnl Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2 > +dnl Payload Length: 24 > +dnl Next Header: Fragment Header for IPv6 (44) > +dnl Hop Limit: 64 > +dnl Fragment Header for IPv6 > +dnl Next header: UDP (17) > +dnl Reserved octet: 0x00 > +dnl 0000 0000 0000 0... = Offset: 0 (0 bytes) > +dnl .... .... .... .00. = Reserved bits: 0 > +dnl .... .... .... ...1 = More Fragments: Yes > +dnl Identification: 0x2316ab36 > +dnl Data (16 bytes) > +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd" > +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \ > + fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36" > +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08" > +packet="${eth} ${ip} ${data}" > + > +for i in 1 2 3 4 5; do > + NS_CHECK_EXEC([at_ns0], > + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null]) > +done > + > +dnl Update checksum and source address in original packet for comparison. > +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \ > + 's/fc000000000000000000000000000001/fc000000000000000000000000000100/g') > +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5]) > + > +OVS_TRAFFIC_VSWITCHD_STOP > +AT_CLEANUP > + > AT_BANNER([MPLS]) > > AT_SETUP([mpls - encap header dp-support])
On 6 Dec 2024, at 23:25, Ilya Maximets wrote: > On 8/15/24 12:02, Eelco Chaudron wrote: >> While offloading header modifications to TC, OVS is using {TCA_PEDIT} + >> {TCA_CSUM} combination as that it the only way to represent header >> rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for >> IP fragments. >> >> Since TC already applies fragmentation bit masking, this patch simply >> needs to prevent these packets from being processed through TC. >> >> Reported-at: https://issues.redhat.com/browse/FDP-545 >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> >> --- >> v3: - Fixed some comment style issue. >> - Add the nw_frag mask if not set. >> v2: - Fixed and added some comments. >> - Use ovs-pcap to compare packets. >> --- >> lib/netdev-offload-tc.c | 35 ++++++++++++++++ >> lib/tc.c | 5 ++- >> tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++ >> 3 files changed, 132 insertions(+), 1 deletion(-) > > Hi, Eelco. Sorry for the long delay. See some comments below. Thanks for the review. I’ve addressed your comments in the v4 which I just sent out. Cheers, Eelco > Best regards, Ilya Maximets. > >> >> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c >> index 3be1c08d2..9395f784a 100644 >> --- a/lib/netdev-offload-tc.c >> +++ b/lib/netdev-offload-tc.c >> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower, >> return 0; >> } >> >> +/* This function returns true if the tc layer will add a l4 checksum action >> + * for this set action. Refer to the csum_update_flag() function for >> + * detailed logic. Note that even the kernel only supports updating TCP, >> + * UDP and ICMPv6. >> + */ >> +static bool >> +will_tc_add_l4_checksum(struct tc_flower *flower, int type) > > I'd change s/will_tc/tc_will/. Just reads a little better, IMO. > > if "it is a fragment and tc will add l4 checksum"; then > if "it is a fragment and will tc add l4 checksum"; then > > >> +{ >> + switch (type) { >> + case OVS_KEY_ATTR_IPV4: >> + case OVS_KEY_ATTR_IPV6: >> + case OVS_KEY_ATTR_TCP: >> + case OVS_KEY_ATTR_UDP: >> + switch (flower->key.ip_proto) { >> + case IPPROTO_TCP: >> + case IPPROTO_UDP: >> + case IPPROTO_ICMPV6: >> + case IPPROTO_UDPLITE: >> + return true; >> + } >> + break; >> + } >> + return false; >> +} >> + >> static int >> parse_put_flow_set_masked_action(struct tc_flower *flower, >> struct tc_action *action, >> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower, >> return EOPNOTSUPP; >> } >> >> + if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT >> + && will_tc_add_l4_checksum(flower, type)) { >> + VLOG_DBG_RL(&rl, "set action type %d not supported on fragments " >> + "due to checksum limitation", type); >> + ofpbuf_uninit(&set_buf); >> + return EOPNOTSUPP; >> + } >> + >> for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { >> struct netlink_field *f = &set_flower_map[type][i]; >> >> @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match, >> } >> >> mask->nw_frag = 0; >> + } else { >> + mask->nw_frag = FLOW_NW_FRAG_MASK; > > This doesn't do much. We need the datapath to have a match on IS_FRAGMENT > bit instead. Updating mask will make offload fail, since it's not consumed. > > What we want is: > > if the packet IS NOT fragmented AND tc_will_add_l4_checksum: > Add explicit match on packet not being fragmented > > if the packet IS fragmented AND tc_will_add_l4_checksum: > Do not offload > > What we want to avoid is if we add a flow that changes L34 fields and doesn't > match on fragmentation bit, fragments may hit it and have incorrect checkums. > So, every flower rule that modifies L34 should have IS_FRAGMENT set in the mask. > >> } >> >> if (key->nw_proto == IPPROTO_TCP) { >> diff --git a/lib/tc.c b/lib/tc.c >> index e55ba3b1b..22960a13e 100644 >> --- a/lib/tc.c >> +++ b/lib/tc.c >> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, >> * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>)) >> * we need to force a more specific flow as this can, for example, >> * need a recalculation of icmp checksum if the packet that passes >> - * is ICMPv6 and tcp checksum if its tcp. */ >> + * is ICMPv6 and tcp checksum if its tcp. >> + * >> + * Ensure that you keep the pre-check function in netdev-offload-tc.c, >> + * will_tc_add_l4_checksum(), in sync if you make any changes here. */ > > Comments should not talk to the reader, generally, they should describe > things instead. So, we shouldn't use words like 'you'. > > E.g. "This section of the code must be kept in sync with ..." > >> >> switch (htype) { >> case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: >> diff --git a/tests/system-traffic.at b/tests/system-traffic.at >> index 202ff0492..add0ab3a2 100644 >> --- a/tests/system-traffic.at >> +++ b/tests/system-traffic.at >> @@ -2436,6 +2436,99 @@ recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty >> OVS_TRAFFIC_VSWITCHD_STOP >> AT_CLEANUP >> >> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments]) >> +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) >> +OVS_TRAFFIC_VSWITCHD_START() >> + >> +ADD_NAMESPACES(at_ns0, at_ns1) >> + >> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03) >> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02) >> + >> +AT_DATA([flows.txt], [dnl >> + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1 >> + in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1 >> +]) >> + >> +AT_CHECK([ovs-ofctl del-flows br0]) >> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt]) >> + >> +NETNS_DAEMONIZE([at_ns1], >> + [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err], >> + [tcpdump.pid]) >> +OVS_WAIT_UNTIL([grep "listening" tcpdump.err]) >> + >> +dnl IPv4 Packet content: >> +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 >> +dnl Type: IPv4 (0x0800) >> +dnl Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2 >> +dnl 0100 .... = Version: 4 >> +dnl .... 0101 = Header Length: 20 bytes (5) >> +dnl Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) >> +dnl Total Length: 38 >> +dnl Identification: 0x0001 (1) >> +dnl 001. .... = Flags: 0x1, More fragments >> +dnl 0... .... = Reserved bit: Not set >> +dnl .0.. .... = Don't fragment: Not set >> +dnl ..1. .... = More fragments: Set >> +dnl ...0 0000 0000 0000 = Fragment Offset: 0 >> +dnl Time to Live: 64 >> +dnl Protocol: UDP (17) >> +dnl Header Checksum: 0x44c2 >> +dnl Data (18 bytes) >> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00" >> +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02" >> +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a" >> +packet="${eth} ${ip} ${data}" >> + >> +dnl We send each packet multiple times, ones for learning which will flow >> +dnl through the used datapath for learning, and the others will go through the >> +dnl actual datapath. > > I'm not sure I understand what "ones for learning which will flow through > the used datapath for learning" means. May need some re-wording. :) > >> +for i in 1 2 3 4 5; do >> + NS_CHECK_EXEC([at_ns0], >> + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null]) >> +done >> + >> +dnl Update source address, and checksums in original packet for comparison. > > A comma may not be needed. > >> +packet=$(echo "$packet" | sed -e 's/ //g' \ >> + -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g') >> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5]) >> + >> +dnl Repeat similar test with IPv6. >> +dnl Packet content: >> +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 >> +dnl Type: IPv6 (0x86dd) >> +dnl Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2 >> +dnl Payload Length: 24 >> +dnl Next Header: Fragment Header for IPv6 (44) >> +dnl Hop Limit: 64 >> +dnl Fragment Header for IPv6 >> +dnl Next header: UDP (17) >> +dnl Reserved octet: 0x00 >> +dnl 0000 0000 0000 0... = Offset: 0 (0 bytes) >> +dnl .... .... .... .00. = Reserved bits: 0 >> +dnl .... .... .... ...1 = More Fragments: Yes >> +dnl Identification: 0x2316ab36 >> +dnl Data (16 bytes) >> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd" >> +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \ >> + fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36" >> +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08" >> +packet="${eth} ${ip} ${data}" >> + >> +for i in 1 2 3 4 5; do >> + NS_CHECK_EXEC([at_ns0], >> + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null]) >> +done >> + >> +dnl Update checksum and source address in original packet for comparison. >> +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \ >> + 's/fc000000000000000000000000000001/fc000000000000000000000000000100/g') >> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5]) >> + >> +OVS_TRAFFIC_VSWITCHD_STOP >> +AT_CLEANUP >> + >> AT_BANNER([MPLS]) >> >> AT_SETUP([mpls - encap header dp-support])
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 3be1c08d2..9395f784a 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower, return 0; } +/* This function returns true if the tc layer will add a l4 checksum action + * for this set action. Refer to the csum_update_flag() function for + * detailed logic. Note that even the kernel only supports updating TCP, + * UDP and ICMPv6. + */ +static bool +will_tc_add_l4_checksum(struct tc_flower *flower, int type) +{ + switch (type) { + case OVS_KEY_ATTR_IPV4: + case OVS_KEY_ATTR_IPV6: + case OVS_KEY_ATTR_TCP: + case OVS_KEY_ATTR_UDP: + switch (flower->key.ip_proto) { + case IPPROTO_TCP: + case IPPROTO_UDP: + case IPPROTO_ICMPV6: + case IPPROTO_UDPLITE: + return true; + } + break; + } + return false; +} + static int parse_put_flow_set_masked_action(struct tc_flower *flower, struct tc_action *action, @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower, return EOPNOTSUPP; } + if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT + && will_tc_add_l4_checksum(flower, type)) { + VLOG_DBG_RL(&rl, "set action type %d not supported on fragments " + "due to checksum limitation", type); + ofpbuf_uninit(&set_buf); + return EOPNOTSUPP; + } + for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { struct netlink_field *f = &set_flower_map[type][i]; @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match, } mask->nw_frag = 0; + } else { + mask->nw_frag = FLOW_NW_FRAG_MASK; } if (key->nw_proto == IPPROTO_TCP) { diff --git a/lib/tc.c b/lib/tc.c index e55ba3b1b..22960a13e 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>)) * we need to force a more specific flow as this can, for example, * need a recalculation of icmp checksum if the packet that passes - * is ICMPv6 and tcp checksum if its tcp. */ + * is ICMPv6 and tcp checksum if its tcp. + * + * Ensure that you keep the pre-check function in netdev-offload-tc.c, + * will_tc_add_l4_checksum(), in sync if you make any changes here. */ switch (htype) { case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 202ff0492..add0ab3a2 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -2436,6 +2436,99 @@ recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments]) +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03) +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02) + +AT_DATA([flows.txt], [dnl + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1 + in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1 +]) + +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt]) + +NETNS_DAEMONIZE([at_ns1], + [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err], + [tcpdump.pid]) +OVS_WAIT_UNTIL([grep "listening" tcpdump.err]) + +dnl IPv4 Packet content: +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 +dnl Type: IPv4 (0x0800) +dnl Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2 +dnl 0100 .... = Version: 4 +dnl .... 0101 = Header Length: 20 bytes (5) +dnl Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) +dnl Total Length: 38 +dnl Identification: 0x0001 (1) +dnl 001. .... = Flags: 0x1, More fragments +dnl 0... .... = Reserved bit: Not set +dnl .0.. .... = Don't fragment: Not set +dnl ..1. .... = More fragments: Set +dnl ...0 0000 0000 0000 = Fragment Offset: 0 +dnl Time to Live: 64 +dnl Protocol: UDP (17) +dnl Header Checksum: 0x44c2 +dnl Data (18 bytes) +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00" +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02" +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a" +packet="${eth} ${ip} ${data}" + +dnl We send each packet multiple times, ones for learning which will flow +dnl through the used datapath for learning, and the others will go through the +dnl actual datapath. +for i in 1 2 3 4 5; do + NS_CHECK_EXEC([at_ns0], + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null]) +done + +dnl Update source address, and checksums in original packet for comparison. +packet=$(echo "$packet" | sed -e 's/ //g' \ + -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g') +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5]) + +dnl Repeat similar test with IPv6. +dnl Packet content: +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 +dnl Type: IPv6 (0x86dd) +dnl Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2 +dnl Payload Length: 24 +dnl Next Header: Fragment Header for IPv6 (44) +dnl Hop Limit: 64 +dnl Fragment Header for IPv6 +dnl Next header: UDP (17) +dnl Reserved octet: 0x00 +dnl 0000 0000 0000 0... = Offset: 0 (0 bytes) +dnl .... .... .... .00. = Reserved bits: 0 +dnl .... .... .... ...1 = More Fragments: Yes +dnl Identification: 0x2316ab36 +dnl Data (16 bytes) +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd" +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \ + fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36" +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08" +packet="${eth} ${ip} ${data}" + +for i in 1 2 3 4 5; do + NS_CHECK_EXEC([at_ns0], + [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null]) +done + +dnl Update checksum and source address in original packet for comparison. +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \ + 's/fc000000000000000000000000000001/fc000000000000000000000000000100/g') +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + AT_BANNER([MPLS]) AT_SETUP([mpls - encap header dp-support])
While offloading header modifications to TC, OVS is using {TCA_PEDIT} + {TCA_CSUM} combination as that it the only way to represent header rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for IP fragments. Since TC already applies fragmentation bit masking, this patch simply needs to prevent these packets from being processed through TC. Reported-at: https://issues.redhat.com/browse/FDP-545 Signed-off-by: Eelco Chaudron <echaudro@redhat.com> --- v3: - Fixed some comment style issue. - Add the nw_frag mask if not set. v2: - Fixed and added some comments. - Use ovs-pcap to compare packets. --- lib/netdev-offload-tc.c | 35 ++++++++++++++++ lib/tc.c | 5 ++- tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 132 insertions(+), 1 deletion(-)