diff mbox series

[ovs-dev,v3] netdev-tc-offloads: Don't offload header modification on ip fragments.

Message ID 82ef47a2986fad41fb5f33751c22b0b27d96b722.1723716173.git.echaudro@redhat.com
State Changes Requested
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev,v3] netdev-tc-offloads: Don't offload header modification on ip fragments. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Eelco Chaudron Aug. 15, 2024, 10:02 a.m. UTC
While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
{TCA_CSUM} combination as that it the only way to represent header
rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
IP fragments.

Since TC already applies fragmentation bit masking, this patch simply
needs to prevent these packets from being processed through TC.

Reported-at: https://issues.redhat.com/browse/FDP-545
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
v3: - Fixed some comment style issue.
    - Add the nw_frag mask if not set.
v2: - Fixed and added some comments.
    - Use ovs-pcap to compare packets.
---
 lib/netdev-offload-tc.c | 35 ++++++++++++++++
 lib/tc.c                |  5 ++-
 tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 1 deletion(-)

Comments

Ilya Maximets Dec. 6, 2024, 10:25 p.m. UTC | #1
On 8/15/24 12:02, Eelco Chaudron wrote:
> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
> {TCA_CSUM} combination as that it the only way to represent header
> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
> IP fragments.
> 
> Since TC already applies fragmentation bit masking, this patch simply
> needs to prevent these packets from being processed through TC.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-545
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---
> v3: - Fixed some comment style issue.
>     - Add the nw_frag mask if not set.
> v2: - Fixed and added some comments.
>     - Use ovs-pcap to compare packets.
> ---
>  lib/netdev-offload-tc.c | 35 ++++++++++++++++
>  lib/tc.c                |  5 ++-
>  tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 132 insertions(+), 1 deletion(-)

Hi, Eelco.  Sorry for the long delay.  See some comments below.

Best regards, Ilya Maximets.

> 
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 3be1c08d2..9395f784a 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>          return 0;
>  }
>  
> +/* This function returns true if the tc layer will add a l4 checksum action
> + * for this set action.  Refer to the csum_update_flag() function for
> + * detailed logic.  Note that even the kernel only supports updating TCP,
> + * UDP and ICMPv6.
> + */
> +static bool
> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)

I'd change s/will_tc/tc_will/.  Just reads a little better, IMO.

if "it is a fragment and tc will add l4 checksum"; then
if "it is a fragment and will tc add l4 checksum"; then


> +{
> +    switch (type) {
> +    case OVS_KEY_ATTR_IPV4:
> +    case OVS_KEY_ATTR_IPV6:
> +    case OVS_KEY_ATTR_TCP:
> +    case OVS_KEY_ATTR_UDP:
> +        switch (flower->key.ip_proto) {
> +        case IPPROTO_TCP:
> +        case IPPROTO_UDP:
> +        case IPPROTO_ICMPV6:
> +        case IPPROTO_UDPLITE:
> +            return true;
> +        }
> +        break;
> +    }
> +    return false;
> +}
> +
>  static int
>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>                                   struct tc_action *action,
> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>          return EOPNOTSUPP;
>      }
>  
> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
> +        && will_tc_add_l4_checksum(flower, type)) {
> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
> +                    "due to checksum limitation", type);
> +        ofpbuf_uninit(&set_buf);
> +        return EOPNOTSUPP;
> +    }
> +
>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>          struct netlink_field *f = &set_flower_map[type][i];
>  
> @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>              }
>  
>              mask->nw_frag = 0;
> +        } else {
> +            mask->nw_frag = FLOW_NW_FRAG_MASK;

This doesn't do much.  We need the datapath to have a match on IS_FRAGMENT
bit instead.  Updating mask will make offload fail, since it's not consumed.

What we want is:

 if the packet IS NOT fragmented AND tc_will_add_l4_checksum:
     Add explicit match on packet not being fragmented

 if the packet IS fragmented AND tc_will_add_l4_checksum:
     Do not offload

What we want to avoid is if we add a flow that changes L34 fields and doesn't
match on fragmentation bit, fragments may hit it and have incorrect checkums.
So, every flower rule that modifies L34 should have IS_FRAGMENT set in the mask.

>          }
>  
>          if (key->nw_proto == IPPROTO_TCP) {
> diff --git a/lib/tc.c b/lib/tc.c
> index e55ba3b1b..22960a13e 100644
> --- a/lib/tc.c
> +++ b/lib/tc.c
> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>       * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
>       * we need to force a more specific flow as this can, for example,
>       * need a recalculation of icmp checksum if the packet that passes
> -     * is ICMPv6 and tcp checksum if its tcp. */
> +     * is ICMPv6 and tcp checksum if its tcp.
> +     *
> +     * Ensure that you keep the pre-check function in netdev-offload-tc.c,
> +     * will_tc_add_l4_checksum(), in sync if you make any changes here. */

Comments should not talk to the reader, generally, they should describe
things instead.  So, we shouldn't use words like 'you'.

E.g. "This section of the code must be kept in sync with ..."

>  
>      switch (htype) {
>      case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 202ff0492..add0ab3a2 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -2436,6 +2436,99 @@ recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
> +
> +AT_DATA([flows.txt], [dnl
> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1
> +])
> +
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
> +
> +NETNS_DAEMONIZE([at_ns1],
> +                [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
> +                [tcpdump.pid])
> +OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
> +
> +dnl IPv4 Packet content:
> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
> +dnl       Type: IPv4 (0x0800)
> +dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
> +dnl       0100 .... = Version: 4
> +dnl       .... 0101 = Header Length: 20 bytes (5)
> +dnl       Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
> +dnl       Total Length: 38
> +dnl       Identification: 0x0001 (1)
> +dnl       001. .... = Flags: 0x1, More fragments
> +dnl           0... .... = Reserved bit: Not set
> +dnl           .0.. .... = Don't fragment: Not set
> +dnl           ..1. .... = More fragments: Set
> +dnl       ...0 0000 0000 0000 = Fragment Offset: 0
> +dnl       Time to Live: 64
> +dnl       Protocol: UDP (17)
> +dnl       Header Checksum: 0x44c2
> +dnl   Data (18 bytes)
> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
> +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
> +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
> +packet="${eth} ${ip} ${data}"
> +
> +dnl We send each packet multiple times, ones for learning which will flow
> +dnl through the used datapath for learning, and the others will go through the
> +dnl actual datapath.

I'm not sure I understand what "ones for learning which will flow through
the used datapath for learning" means.  May need some re-wording. :)

> +for i in 1 2 3 4 5; do
> +  NS_CHECK_EXEC([at_ns0],
> +                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
> +done
> +
> +dnl Update source address, and checksums in original packet for comparison.

A comma may not be needed.

> +packet=$(echo "$packet" | sed -e 's/ //g' \
> +         -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g')
> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
> +
> +dnl Repeat similar test with IPv6.
> +dnl Packet content:
> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
> +dnl       Type: IPv6 (0x86dd)
> +dnl   Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2
> +dnl       Payload Length: 24
> +dnl       Next Header: Fragment Header for IPv6 (44)
> +dnl       Hop Limit: 64
> +dnl       Fragment Header for IPv6
> +dnl           Next header: UDP (17)
> +dnl           Reserved octet: 0x00
> +dnl           0000 0000 0000 0... = Offset: 0 (0 bytes)
> +dnl           .... .... .... .00. = Reserved bits: 0
> +dnl           .... .... .... ...1 = More Fragments: Yes
> +dnl           Identification: 0x2316ab36
> +dnl   Data (16 bytes)
> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd"
> +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \
> +    fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36"
> +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08"
> +packet="${eth} ${ip} ${data}"
> +
> +for i in 1 2 3 4 5; do
> +  NS_CHECK_EXEC([at_ns0],
> +                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
> +done
> +
> +dnl Update checksum and source address in original packet for comparison.
> +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \
> +       's/fc000000000000000000000000000001/fc000000000000000000000000000100/g')
> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_BANNER([MPLS])
>  
>  AT_SETUP([mpls - encap header dp-support])
Eelco Chaudron Dec. 10, 2024, 12:48 p.m. UTC | #2
On 6 Dec 2024, at 23:25, Ilya Maximets wrote:

> On 8/15/24 12:02, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-545
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>> v3: - Fixed some comment style issue.
>>     - Add the nw_frag mask if not set.
>> v2: - Fixed and added some comments.
>>     - Use ovs-pcap to compare packets.
>> ---
>>  lib/netdev-offload-tc.c | 35 ++++++++++++++++
>>  lib/tc.c                |  5 ++-
>>  tests/system-traffic.at | 93 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 132 insertions(+), 1 deletion(-)
>
> Hi, Eelco.  Sorry for the long delay.  See some comments below.

Thanks for the review. I’ve addressed your comments in the v4 which I just sent out.

Cheers,

Eelco


> Best regards, Ilya Maximets.
>
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 3be1c08d2..9395f784a 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>          return 0;
>>  }
>>
>> +/* This function returns true if the tc layer will add a l4 checksum action
>> + * for this set action.  Refer to the csum_update_flag() function for
>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>> + * UDP and ICMPv6.
>> + */
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>
> I'd change s/will_tc/tc_will/.  Just reads a little better, IMO.
>
> if "it is a fragment and tc will add l4 checksum"; then
> if "it is a fragment and will tc add l4 checksum"; then
>
>
>> +{
>> +    switch (type) {
>> +    case OVS_KEY_ATTR_IPV4:
>> +    case OVS_KEY_ATTR_IPV6:
>> +    case OVS_KEY_ATTR_TCP:
>> +    case OVS_KEY_ATTR_UDP:
>> +        switch (flower->key.ip_proto) {
>> +        case IPPROTO_TCP:
>> +        case IPPROTO_UDP:
>> +        case IPPROTO_ICMPV6:
>> +        case IPPROTO_UDPLITE:
>> +            return true;
>> +        }
>> +        break;
>> +    }
>> +    return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>                                   struct tc_action *action,
>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>>          return EOPNOTSUPP;
>>      }
>>
>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> +        && will_tc_add_l4_checksum(flower, type)) {
>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> +                    "due to checksum limitation", type);
>> +        ofpbuf_uninit(&set_buf);
>> +        return EOPNOTSUPP;
>> +    }
>> +
>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>          struct netlink_field *f = &set_flower_map[type][i];
>>
>> @@ -2447,6 +2480,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>              }
>>
>>              mask->nw_frag = 0;
>> +        } else {
>> +            mask->nw_frag = FLOW_NW_FRAG_MASK;
>
> This doesn't do much.  We need the datapath to have a match on IS_FRAGMENT
> bit instead.  Updating mask will make offload fail, since it's not consumed.
>
> What we want is:
>
>  if the packet IS NOT fragmented AND tc_will_add_l4_checksum:
>      Add explicit match on packet not being fragmented
>
>  if the packet IS fragmented AND tc_will_add_l4_checksum:
>      Do not offload
>
> What we want to avoid is if we add a flow that changes L34 fields and doesn't
> match on fragmentation bit, fragments may hit it and have incorrect checkums.
> So, every flower rule that modifies L34 should have IS_FRAGMENT set in the mask.
>
>>          }
>>
>>          if (key->nw_proto == IPPROTO_TCP) {
>> diff --git a/lib/tc.c b/lib/tc.c
>> index e55ba3b1b..22960a13e 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>>       * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
>>       * we need to force a more specific flow as this can, for example,
>>       * need a recalculation of icmp checksum if the packet that passes
>> -     * is ICMPv6 and tcp checksum if its tcp. */
>> +     * is ICMPv6 and tcp checksum if its tcp.
>> +     *
>> +     * Ensure that you keep the pre-check function in netdev-offload-tc.c,
>> +     * will_tc_add_l4_checksum(), in sync if you make any changes here. */
>
> Comments should not talk to the reader, generally, they should describe
> things instead.  So, we shouldn't use words like 'you'.
>
> E.g. "This section of the code must be kept in sync with ..."
>
>>
>>      switch (htype) {
>>      case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 202ff0492..add0ab3a2 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -2436,6 +2436,99 @@ recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>> +
>> +AT_DATA([flows.txt], [dnl
>> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
>> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1
>> +])
>> +
>> +AT_CHECK([ovs-ofctl del-flows br0])
>> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
>> +
>> +NETNS_DAEMONIZE([at_ns1],
>> +                [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
>> +                [tcpdump.pid])
>> +OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
>> +
>> +dnl IPv4 Packet content:
>> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>> +dnl       Type: IPv4 (0x0800)
>> +dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
>> +dnl       0100 .... = Version: 4
>> +dnl       .... 0101 = Header Length: 20 bytes (5)
>> +dnl       Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
>> +dnl       Total Length: 38
>> +dnl       Identification: 0x0001 (1)
>> +dnl       001. .... = Flags: 0x1, More fragments
>> +dnl           0... .... = Reserved bit: Not set
>> +dnl           .0.. .... = Don't fragment: Not set
>> +dnl           ..1. .... = More fragments: Set
>> +dnl       ...0 0000 0000 0000 = Fragment Offset: 0
>> +dnl       Time to Live: 64
>> +dnl       Protocol: UDP (17)
>> +dnl       Header Checksum: 0x44c2
>> +dnl   Data (18 bytes)
>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
>> +ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
>> +data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
>> +packet="${eth} ${ip} ${data}"
>> +
>> +dnl We send each packet multiple times, ones for learning which will flow
>> +dnl through the used datapath for learning, and the others will go through the
>> +dnl actual datapath.
>
> I'm not sure I understand what "ones for learning which will flow through
> the used datapath for learning" means.  May need some re-wording. :)
>
>> +for i in 1 2 3 4 5; do
>> +  NS_CHECK_EXEC([at_ns0],
>> +                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
>> +done
>> +
>> +dnl Update source address, and checksums in original packet for comparison.
>
> A comma may not be needed.
>
>> +packet=$(echo "$packet" | sed -e 's/ //g' \
>> +         -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g')
>> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
>> +
>> +dnl Repeat similar test with IPv6.
>> +dnl Packet content:
>> +dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
>> +dnl       Type: IPv6 (0x86dd)
>> +dnl   Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2
>> +dnl       Payload Length: 24
>> +dnl       Next Header: Fragment Header for IPv6 (44)
>> +dnl       Hop Limit: 64
>> +dnl       Fragment Header for IPv6
>> +dnl           Next header: UDP (17)
>> +dnl           Reserved octet: 0x00
>> +dnl           0000 0000 0000 0... = Offset: 0 (0 bytes)
>> +dnl           .... .... .... .00. = Reserved bits: 0
>> +dnl           .... .... .... ...1 = More Fragments: Yes
>> +dnl           Identification: 0x2316ab36
>> +dnl   Data (16 bytes)
>> +eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd"
>> +ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \
>> +    fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36"
>> +data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08"
>> +packet="${eth} ${ip} ${data}"
>> +
>> +for i in 1 2 3 4 5; do
>> +  NS_CHECK_EXEC([at_ns0],
>> +                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
>> +done
>> +
>> +dnl Update checksum and source address in original packet for comparison.
>> +packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \
>> +       's/fc000000000000000000000000000001/fc000000000000000000000000000100/g')
>> +OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_BANNER([MPLS])
>>
>>  AT_SETUP([mpls - encap header dp-support])
diff mbox series

Patch

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 3be1c08d2..9395f784a 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1490,6 +1490,31 @@  parse_put_flow_ct_action(struct tc_flower *flower,
         return 0;
 }
 
+/* This function returns true if the tc layer will add a l4 checksum action
+ * for this set action.  Refer to the csum_update_flag() function for
+ * detailed logic.  Note that even the kernel only supports updating TCP,
+ * UDP and ICMPv6.
+ */
+static bool
+will_tc_add_l4_checksum(struct tc_flower *flower, int type)
+{
+    switch (type) {
+    case OVS_KEY_ATTR_IPV4:
+    case OVS_KEY_ATTR_IPV6:
+    case OVS_KEY_ATTR_TCP:
+    case OVS_KEY_ATTR_UDP:
+        switch (flower->key.ip_proto) {
+        case IPPROTO_TCP:
+        case IPPROTO_UDP:
+        case IPPROTO_ICMPV6:
+        case IPPROTO_UDPLITE:
+            return true;
+        }
+        break;
+    }
+    return false;
+}
+
 static int
 parse_put_flow_set_masked_action(struct tc_flower *flower,
                                  struct tc_action *action,
@@ -1522,6 +1547,14 @@  parse_put_flow_set_masked_action(struct tc_flower *flower,
         return EOPNOTSUPP;
     }
 
+    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
+        && will_tc_add_l4_checksum(flower, type)) {
+        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
+                    "due to checksum limitation", type);
+        ofpbuf_uninit(&set_buf);
+        return EOPNOTSUPP;
+    }
+
     for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
         struct netlink_field *f = &set_flower_map[type][i];
 
@@ -2447,6 +2480,8 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
             }
 
             mask->nw_frag = 0;
+        } else {
+            mask->nw_frag = FLOW_NW_FRAG_MASK;
         }
 
         if (key->nw_proto == IPPROTO_TCP) {
diff --git a/lib/tc.c b/lib/tc.c
index e55ba3b1b..22960a13e 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2958,7 +2958,10 @@  csum_update_flag(struct tc_flower *flower,
      * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
      * we need to force a more specific flow as this can, for example,
      * need a recalculation of icmp checksum if the packet that passes
-     * is ICMPv6 and tcp checksum if its tcp. */
+     * is ICMPv6 and tcp checksum if its tcp.
+     *
+     * Ensure that you keep the pre-check function in netdev-offload-tc.c,
+     * will_tc_add_l4_checksum(), in sync if you make any changes here. */
 
     switch (htype) {
     case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..add0ab3a2 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2436,6 +2436,99 @@  recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
+
+AT_DATA([flows.txt], [dnl
+  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
+  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1
+])
+
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
+
+NETNS_DAEMONIZE([at_ns1],
+                [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
+                [tcpdump.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
+
+dnl IPv4 Packet content:
+dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
+dnl       Type: IPv4 (0x0800)
+dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
+dnl       0100 .... = Version: 4
+dnl       .... 0101 = Header Length: 20 bytes (5)
+dnl       Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
+dnl       Total Length: 38
+dnl       Identification: 0x0001 (1)
+dnl       001. .... = Flags: 0x1, More fragments
+dnl           0... .... = Reserved bit: Not set
+dnl           .0.. .... = Don't fragment: Not set
+dnl           ..1. .... = More fragments: Set
+dnl       ...0 0000 0000 0000 = Fragment Offset: 0
+dnl       Time to Live: 64
+dnl       Protocol: UDP (17)
+dnl       Header Checksum: 0x44c2
+dnl   Data (18 bytes)
+eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
+ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
+data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
+packet="${eth} ${ip} ${data}"
+
+dnl We send each packet multiple times, ones for learning which will flow
+dnl through the used datapath for learning, and the others will go through the
+dnl actual datapath.
+for i in 1 2 3 4 5; do
+  NS_CHECK_EXEC([at_ns0],
+                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
+done
+
+dnl Update source address, and checksums in original packet for comparison.
+packet=$(echo "$packet" | sed -e 's/ //g' \
+         -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g')
+OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
+
+dnl Repeat similar test with IPv6.
+dnl Packet content:
+dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
+dnl       Type: IPv6 (0x86dd)
+dnl   Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2
+dnl       Payload Length: 24
+dnl       Next Header: Fragment Header for IPv6 (44)
+dnl       Hop Limit: 64
+dnl       Fragment Header for IPv6
+dnl           Next header: UDP (17)
+dnl           Reserved octet: 0x00
+dnl           0000 0000 0000 0... = Offset: 0 (0 bytes)
+dnl           .... .... .... .00. = Reserved bits: 0
+dnl           .... .... .... ...1 = More Fragments: Yes
+dnl           Identification: 0x2316ab36
+dnl   Data (16 bytes)
+eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd"
+ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \
+    fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36"
+data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08"
+packet="${eth} ${ip} ${data}"
+
+for i in 1 2 3 4 5; do
+  NS_CHECK_EXEC([at_ns0],
+                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
+done
+
+dnl Update checksum and source address in original packet for comparison.
+packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \
+       's/fc000000000000000000000000000001/fc000000000000000000000000000100/g')
+OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_BANNER([MPLS])
 
 AT_SETUP([mpls - encap header dp-support])