diff mbox series

[ovs-dev,v4] netdev-tc-offloads: Don't offload header modification on ip fragments.

Message ID 767e94c6e624df1642a5b5fbd05ea21fa5c89d5e.1733834547.git.echaudro@redhat.com
State Accepted
Commit 1be33d52af7785eff1d34e4eb650cf3f3932174e
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev,v4] netdev-tc-offloads: Don't offload header modification on ip fragments. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Eelco Chaudron Dec. 10, 2024, 12:42 p.m. UTC
While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
{TCA_CSUM} combination as that it the only way to represent header
rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
IP fragments.

Since TC already applies fragmentation bit masking, this patch simply
needs to prevent these packets from being processed through TC.

Reported-at: https://issues.redhat.com/browse/FDP-545
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
v4: - When no mask is present for IP fragment bits do not offload.
v3: - Fixed some comment style issue.
    - Add the nw_frag mask if not set.
v2: - Fixed and added some comments.
    - Use ovs-pcap to compare packets.
---
 lib/netdev-offload-tc.c | 39 +++++++++++++++++
 lib/tc.c                |  5 ++-
 tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 135 insertions(+), 1 deletion(-)

Comments

Ilya Maximets Dec. 12, 2024, 10:28 p.m. UTC | #1
On 12/10/24 13:42, Eelco Chaudron wrote:
> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
> {TCA_CSUM} combination as that it the only way to represent header
> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
> IP fragments.
> 
> Since TC already applies fragmentation bit masking, this patch simply
> needs to prevent these packets from being processed through TC.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-545
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---
> v4: - When no mask is present for IP fragment bits do not offload.
> v3: - Fixed some comment style issue.
>     - Add the nw_frag mask if not set.
> v2: - Fixed and added some comments.
>     - Use ovs-pcap to compare packets.
> ---
>  lib/netdev-offload-tc.c | 39 +++++++++++++++++
>  lib/tc.c                |  5 ++-
>  tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 135 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 3be1c08d2..3d917e7d9 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>          return 0;
>  }
>  
> +/* This function returns true if the tc layer will add a l4 checksum action
> + * for this set action.  Refer to the csum_update_flag() function for
> + * detailed logic.  Note that even the kernel only supports updating TCP,
> + * UDP and ICMPv6.
> + */
> +static bool
> +tc_will_add_l4_checksum(struct tc_flower *flower, int type)
> +{
> +    switch (type) {
> +    case OVS_KEY_ATTR_IPV4:
> +    case OVS_KEY_ATTR_IPV6:
> +    case OVS_KEY_ATTR_TCP:
> +    case OVS_KEY_ATTR_UDP:
> +        switch (flower->key.ip_proto) {
> +        case IPPROTO_TCP:
> +        case IPPROTO_UDP:
> +        case IPPROTO_ICMPV6:
> +        case IPPROTO_UDPLITE:
> +            return true;
> +        }
> +        break;
> +    }
> +    return false;
> +}
> +
>  static int
>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>                                   struct tc_action *action,
> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>          return EOPNOTSUPP;
>      }
>  
> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
> +        && tc_will_add_l4_checksum(flower, type)) {
> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
> +                    "due to checksum limitation", type);
> +        ofpbuf_uninit(&set_buf);
> +        return EOPNOTSUPP;
> +    }
> +
>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>          struct netlink_field *f = &set_flower_map[type][i];
>  
> @@ -2447,6 +2480,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>              }
>  
>              mask->nw_frag = 0;
> +        } else {
> +            /* This scenario should not occur. Currently, all installed IP DP
> +             * flows perform a fully masked match on the fragmentation bits.
> +             * However, since TC depends on this behavior, we return ENOTSUPP
> +             * for now in case this behavior changes in the future. */
> +             return EOPNOTSUPP;

Nit: Double spaces between sentences.

Otherwise:

Acked-by: Ilya Maximets <i.maximets@ovn.org>

Thanks!
Eelco Chaudron Dec. 13, 2024, 2:57 p.m. UTC | #2
On 12 Dec 2024, at 23:28, Ilya Maximets wrote:

> On 12/10/24 13:42, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-545
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>> v4: - When no mask is present for IP fragment bits do not offload.
>> v3: - Fixed some comment style issue.
>>     - Add the nw_frag mask if not set.
>> v2: - Fixed and added some comments.
>>     - Use ovs-pcap to compare packets.
>> ---
>>  lib/netdev-offload-tc.c | 39 +++++++++++++++++
>>  lib/tc.c                |  5 ++-
>>  tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 135 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 3be1c08d2..3d917e7d9 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>          return 0;
>>  }
>>
>> +/* This function returns true if the tc layer will add a l4 checksum action
>> + * for this set action.  Refer to the csum_update_flag() function for
>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>> + * UDP and ICMPv6.
>> + */
>> +static bool
>> +tc_will_add_l4_checksum(struct tc_flower *flower, int type)
>> +{
>> +    switch (type) {
>> +    case OVS_KEY_ATTR_IPV4:
>> +    case OVS_KEY_ATTR_IPV6:
>> +    case OVS_KEY_ATTR_TCP:
>> +    case OVS_KEY_ATTR_UDP:
>> +        switch (flower->key.ip_proto) {
>> +        case IPPROTO_TCP:
>> +        case IPPROTO_UDP:
>> +        case IPPROTO_ICMPV6:
>> +        case IPPROTO_UDPLITE:
>> +            return true;
>> +        }
>> +        break;
>> +    }
>> +    return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>                                   struct tc_action *action,
>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>>          return EOPNOTSUPP;
>>      }
>>
>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> +        && tc_will_add_l4_checksum(flower, type)) {
>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> +                    "due to checksum limitation", type);
>> +        ofpbuf_uninit(&set_buf);
>> +        return EOPNOTSUPP;
>> +    }
>> +
>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>          struct netlink_field *f = &set_flower_map[type][i];
>>
>> @@ -2447,6 +2480,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>              }
>>
>>              mask->nw_frag = 0;
>> +        } else {
>> +            /* This scenario should not occur. Currently, all installed IP DP
>> +             * flows perform a fully masked match on the fragmentation bits.
>> +             * However, since TC depends on this behavior, we return ENOTSUPP
>> +             * for now in case this behavior changes in the future. */
>> +             return EOPNOTSUPP;
>
> Nit: Double spaces between sentences.

Thanks Ilya, this is something I’ll never learn :)

> Otherwise:
>
> Acked-by: Ilya Maximets <i.maximets@ovn.org>
>
> Thanks!
Eelco Chaudron Dec. 13, 2024, 4:09 p.m. UTC | #3
On 12 Dec 2024, at 23:28, Ilya Maximets wrote:

> On 12/10/24 13:42, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-545
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>> v4: - When no mask is present for IP fragment bits do not offload.
>> v3: - Fixed some comment style issue.
>>     - Add the nw_frag mask if not set.
>> v2: - Fixed and added some comments.
>>     - Use ovs-pcap to compare packets.
>> ---
>>  lib/netdev-offload-tc.c | 39 +++++++++++++++++
>>  lib/tc.c                |  5 ++-
>>  tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 135 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 3be1c08d2..3d917e7d9 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>          return 0;
>>  }
>>
>> +/* This function returns true if the tc layer will add a l4 checksum action
>> + * for this set action.  Refer to the csum_update_flag() function for
>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>> + * UDP and ICMPv6.
>> + */
>> +static bool
>> +tc_will_add_l4_checksum(struct tc_flower *flower, int type)
>> +{
>> +    switch (type) {
>> +    case OVS_KEY_ATTR_IPV4:
>> +    case OVS_KEY_ATTR_IPV6:
>> +    case OVS_KEY_ATTR_TCP:
>> +    case OVS_KEY_ATTR_UDP:
>> +        switch (flower->key.ip_proto) {
>> +        case IPPROTO_TCP:
>> +        case IPPROTO_UDP:
>> +        case IPPROTO_ICMPV6:
>> +        case IPPROTO_UDPLITE:
>> +            return true;
>> +        }
>> +        break;
>> +    }
>> +    return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>                                   struct tc_action *action,
>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>>          return EOPNOTSUPP;
>>      }
>>
>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> +        && tc_will_add_l4_checksum(flower, type)) {
>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> +                    "due to checksum limitation", type);
>> +        ofpbuf_uninit(&set_buf);
>> +        return EOPNOTSUPP;
>> +    }
>> +
>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>          struct netlink_field *f = &set_flower_map[type][i];
>>
>> @@ -2447,6 +2480,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>              }
>>
>>              mask->nw_frag = 0;
>> +        } else {
>> +            /* This scenario should not occur. Currently, all installed IP DP
>> +             * flows perform a fully masked match on the fragmentation bits.
>> +             * However, since TC depends on this behavior, we return ENOTSUPP
>> +             * for now in case this behavior changes in the future. */
>> +             return EOPNOTSUPP;
>
> Nit: Double spaces between sentences.
>
> Otherwise:
>
> Acked-by: Ilya Maximets <i.maximets@ovn.org>

Applied to main with nit fixed.

//Eelco
Ilya Maximets Dec. 13, 2024, 4:41 p.m. UTC | #4
On 12/13/24 17:09, Eelco Chaudron wrote:
> 
> 
> On 12 Dec 2024, at 23:28, Ilya Maximets wrote:
> 
>> On 12/10/24 13:42, Eelco Chaudron wrote:
>>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>>> {TCA_CSUM} combination as that it the only way to represent header
>>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>>> IP fragments.
>>>
>>> Since TC already applies fragmentation bit masking, this patch simply
>>> needs to prevent these packets from being processed through TC.
>>>
>>> Reported-at: https://issues.redhat.com/browse/FDP-545
>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>> ---
>>> v4: - When no mask is present for IP fragment bits do not offload.
>>> v3: - Fixed some comment style issue.
>>>     - Add the nw_frag mask if not set.
>>> v2: - Fixed and added some comments.
>>>     - Use ovs-pcap to compare packets.
>>> ---
>>>  lib/netdev-offload-tc.c | 39 +++++++++++++++++
>>>  lib/tc.c                |  5 ++-
>>>  tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 135 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index 3be1c08d2..3d917e7d9 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>>          return 0;
>>>  }
>>>
>>> +/* This function returns true if the tc layer will add a l4 checksum action
>>> + * for this set action.  Refer to the csum_update_flag() function for
>>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>>> + * UDP and ICMPv6.
>>> + */
>>> +static bool
>>> +tc_will_add_l4_checksum(struct tc_flower *flower, int type)
>>> +{
>>> +    switch (type) {
>>> +    case OVS_KEY_ATTR_IPV4:
>>> +    case OVS_KEY_ATTR_IPV6:
>>> +    case OVS_KEY_ATTR_TCP:
>>> +    case OVS_KEY_ATTR_UDP:
>>> +        switch (flower->key.ip_proto) {
>>> +        case IPPROTO_TCP:
>>> +        case IPPROTO_UDP:
>>> +        case IPPROTO_ICMPV6:
>>> +        case IPPROTO_UDPLITE:
>>> +            return true;
>>> +        }
>>> +        break;
>>> +    }
>>> +    return false;
>>> +}
>>> +
>>>  static int
>>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>                                   struct tc_action *action,
>>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>          return EOPNOTSUPP;
>>>      }
>>>
>>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>> +        && tc_will_add_l4_checksum(flower, type)) {
>>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>>> +                    "due to checksum limitation", type);
>>> +        ofpbuf_uninit(&set_buf);
>>> +        return EOPNOTSUPP;
>>> +    }
>>> +
>>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>>          struct netlink_field *f = &set_flower_map[type][i];
>>>
>>> @@ -2447,6 +2480,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>>              }
>>>
>>>              mask->nw_frag = 0;
>>> +        } else {
>>> +            /* This scenario should not occur. Currently, all installed IP DP
>>> +             * flows perform a fully masked match on the fragmentation bits.
>>> +             * However, since TC depends on this behavior, we return ENOTSUPP
>>> +             * for now in case this behavior changes in the future. */
>>> +             return EOPNOTSUPP;
>>
>> Nit: Double spaces between sentences.
>>
>> Otherwise:
>>
>> Acked-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Applied to main with nit fixed.

Should this also be backported at least to 3.3 ?

Best regards, Ilya Maximets.
Eelco Chaudron Dec. 16, 2024, 11:30 a.m. UTC | #5
On 13 Dec 2024, at 17:41, Ilya Maximets wrote:

> On 12/13/24 17:09, Eelco Chaudron wrote:
>>
>>
>> On 12 Dec 2024, at 23:28, Ilya Maximets wrote:
>>
>>> On 12/10/24 13:42, Eelco Chaudron wrote:
>>>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>>>> {TCA_CSUM} combination as that it the only way to represent header
>>>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>>>> IP fragments.
>>>>
>>>> Since TC already applies fragmentation bit masking, this patch simply
>>>> needs to prevent these packets from being processed through TC.
>>>>
>>>> Reported-at: https://issues.redhat.com/browse/FDP-545
>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>> ---
>>>> v4: - When no mask is present for IP fragment bits do not offload.
>>>> v3: - Fixed some comment style issue.
>>>>     - Add the nw_frag mask if not set.
>>>> v2: - Fixed and added some comments.
>>>>     - Use ovs-pcap to compare packets.
>>>> ---
>>>>  lib/netdev-offload-tc.c | 39 +++++++++++++++++
>>>>  lib/tc.c                |  5 ++-
>>>>  tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 135 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>>> index 3be1c08d2..3d917e7d9 100644
>>>> --- a/lib/netdev-offload-tc.c
>>>> +++ b/lib/netdev-offload-tc.c
>>>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>>>          return 0;
>>>>  }
>>>>
>>>> +/* This function returns true if the tc layer will add a l4 checksum action
>>>> + * for this set action.  Refer to the csum_update_flag() function for
>>>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>>>> + * UDP and ICMPv6.
>>>> + */
>>>> +static bool
>>>> +tc_will_add_l4_checksum(struct tc_flower *flower, int type)
>>>> +{
>>>> +    switch (type) {
>>>> +    case OVS_KEY_ATTR_IPV4:
>>>> +    case OVS_KEY_ATTR_IPV6:
>>>> +    case OVS_KEY_ATTR_TCP:
>>>> +    case OVS_KEY_ATTR_UDP:
>>>> +        switch (flower->key.ip_proto) {
>>>> +        case IPPROTO_TCP:
>>>> +        case IPPROTO_UDP:
>>>> +        case IPPROTO_ICMPV6:
>>>> +        case IPPROTO_UDPLITE:
>>>> +            return true;
>>>> +        }
>>>> +        break;
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>>  static int
>>>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>>                                   struct tc_action *action,
>>>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>>          return EOPNOTSUPP;
>>>>      }
>>>>
>>>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>>> +        && tc_will_add_l4_checksum(flower, type)) {
>>>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>>>> +                    "due to checksum limitation", type);
>>>> +        ofpbuf_uninit(&set_buf);
>>>> +        return EOPNOTSUPP;
>>>> +    }
>>>> +
>>>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>>>          struct netlink_field *f = &set_flower_map[type][i];
>>>>
>>>> @@ -2447,6 +2480,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>>>              }
>>>>
>>>>              mask->nw_frag = 0;
>>>> +        } else {
>>>> +            /* This scenario should not occur. Currently, all installed IP DP
>>>> +             * flows perform a fully masked match on the fragmentation bits.
>>>> +             * However, since TC depends on this behavior, we return ENOTSUPP
>>>> +             * for now in case this behavior changes in the future. */
>>>> +             return EOPNOTSUPP;
>>>
>>> Nit: Double spaces between sentences.
>>>
>>> Otherwise:
>>>
>>> Acked-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> Applied to main with nit fixed.
>
> Should this also be backported at least to 3.3 ?

I’m fine with backporting this, was just not sure as it is a behavior change (which should fix the behavior as it should be failing as is).

If no one objects, I’ll backport it later this week.

//Eelco
Eelco Chaudron Dec. 18, 2024, 1:48 p.m. UTC | #6
On 16 Dec 2024, at 12:30, Eelco Chaudron wrote:

> On 13 Dec 2024, at 17:41, Ilya Maximets wrote:
>
>> On 12/13/24 17:09, Eelco Chaudron wrote:
>>>
>>>
>>> On 12 Dec 2024, at 23:28, Ilya Maximets wrote:
>>>
>>>> On 12/10/24 13:42, Eelco Chaudron wrote:
>>>>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>>>>> {TCA_CSUM} combination as that it the only way to represent header
>>>>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>>>>> IP fragments.
>>>>>
>>>>> Since TC already applies fragmentation bit masking, this patch simply
>>>>> needs to prevent these packets from being processed through TC.
>>>>>
>>>>> Reported-at: https://issues.redhat.com/browse/FDP-545
>>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>>> ---
>>>>> v4: - When no mask is present for IP fragment bits do not offload.
>>>>> v3: - Fixed some comment style issue.
>>>>>     - Add the nw_frag mask if not set.
>>>>> v2: - Fixed and added some comments.
>>>>>     - Use ovs-pcap to compare packets.
>>>>> ---
>>>>>  lib/netdev-offload-tc.c | 39 +++++++++++++++++
>>>>>  lib/tc.c                |  5 ++-
>>>>>  tests/system-traffic.at | 92 +++++++++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 135 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>>>> index 3be1c08d2..3d917e7d9 100644
>>>>> --- a/lib/netdev-offload-tc.c
>>>>> +++ b/lib/netdev-offload-tc.c
>>>>> @@ -1490,6 +1490,31 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>>>>          return 0;
>>>>>  }
>>>>>
>>>>> +/* This function returns true if the tc layer will add a l4 checksum action
>>>>> + * for this set action.  Refer to the csum_update_flag() function for
>>>>> + * detailed logic.  Note that even the kernel only supports updating TCP,
>>>>> + * UDP and ICMPv6.
>>>>> + */
>>>>> +static bool
>>>>> +tc_will_add_l4_checksum(struct tc_flower *flower, int type)
>>>>> +{
>>>>> +    switch (type) {
>>>>> +    case OVS_KEY_ATTR_IPV4:
>>>>> +    case OVS_KEY_ATTR_IPV6:
>>>>> +    case OVS_KEY_ATTR_TCP:
>>>>> +    case OVS_KEY_ATTR_UDP:
>>>>> +        switch (flower->key.ip_proto) {
>>>>> +        case IPPROTO_TCP:
>>>>> +        case IPPROTO_UDP:
>>>>> +        case IPPROTO_ICMPV6:
>>>>> +        case IPPROTO_UDPLITE:
>>>>> +            return true;
>>>>> +        }
>>>>> +        break;
>>>>> +    }
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>  static int
>>>>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>>>                                   struct tc_action *action,
>>>>> @@ -1522,6 +1547,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>>>          return EOPNOTSUPP;
>>>>>      }
>>>>>
>>>>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>>>> +        && tc_will_add_l4_checksum(flower, type)) {
>>>>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>>>>> +                    "due to checksum limitation", type);
>>>>> +        ofpbuf_uninit(&set_buf);
>>>>> +        return EOPNOTSUPP;
>>>>> +    }
>>>>> +
>>>>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>>>>          struct netlink_field *f = &set_flower_map[type][i];
>>>>>
>>>>> @@ -2447,6 +2480,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>>>>              }
>>>>>
>>>>>              mask->nw_frag = 0;
>>>>> +        } else {
>>>>> +            /* This scenario should not occur. Currently, all installed IP DP
>>>>> +             * flows perform a fully masked match on the fragmentation bits.
>>>>> +             * However, since TC depends on this behavior, we return ENOTSUPP
>>>>> +             * for now in case this behavior changes in the future. */
>>>>> +             return EOPNOTSUPP;
>>>>
>>>> Nit: Double spaces between sentences.
>>>>
>>>> Otherwise:
>>>>
>>>> Acked-by: Ilya Maximets <i.maximets@ovn.org>
>>>
>>> Applied to main with nit fixed.
>>
>> Should this also be backported at least to 3.3 ?
>
> I’m fine with backporting this, was just not sure as it is a behavior change (which should fix the behavior as it should be failing as is).
>
> If no one objects, I’ll backport it later this week.

FYI, backported down to 3.3.

//Eelco
diff mbox series

Patch

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 3be1c08d2..3d917e7d9 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1490,6 +1490,31 @@  parse_put_flow_ct_action(struct tc_flower *flower,
         return 0;
 }
 
+/* This function returns true if the tc layer will add a l4 checksum action
+ * for this set action.  Refer to the csum_update_flag() function for
+ * detailed logic.  Note that even the kernel only supports updating TCP,
+ * UDP and ICMPv6.
+ */
+static bool
+tc_will_add_l4_checksum(struct tc_flower *flower, int type)
+{
+    switch (type) {
+    case OVS_KEY_ATTR_IPV4:
+    case OVS_KEY_ATTR_IPV6:
+    case OVS_KEY_ATTR_TCP:
+    case OVS_KEY_ATTR_UDP:
+        switch (flower->key.ip_proto) {
+        case IPPROTO_TCP:
+        case IPPROTO_UDP:
+        case IPPROTO_ICMPV6:
+        case IPPROTO_UDPLITE:
+            return true;
+        }
+        break;
+    }
+    return false;
+}
+
 static int
 parse_put_flow_set_masked_action(struct tc_flower *flower,
                                  struct tc_action *action,
@@ -1522,6 +1547,14 @@  parse_put_flow_set_masked_action(struct tc_flower *flower,
         return EOPNOTSUPP;
     }
 
+    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
+        && tc_will_add_l4_checksum(flower, type)) {
+        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
+                    "due to checksum limitation", type);
+        ofpbuf_uninit(&set_buf);
+        return EOPNOTSUPP;
+    }
+
     for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
         struct netlink_field *f = &set_flower_map[type][i];
 
@@ -2447,6 +2480,12 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
             }
 
             mask->nw_frag = 0;
+        } else {
+            /* This scenario should not occur. Currently, all installed IP DP
+             * flows perform a fully masked match on the fragmentation bits.
+             * However, since TC depends on this behavior, we return ENOTSUPP
+             * for now in case this behavior changes in the future. */
+             return EOPNOTSUPP;
         }
 
         if (key->nw_proto == IPPROTO_TCP) {
diff --git a/lib/tc.c b/lib/tc.c
index e55ba3b1b..2510d0a3f 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2958,7 +2958,10 @@  csum_update_flag(struct tc_flower *flower,
      * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
      * we need to force a more specific flow as this can, for example,
      * need a recalculation of icmp checksum if the packet that passes
-     * is ICMPv6 and tcp checksum if its tcp. */
+     * is ICMPv6 and tcp checksum if its tcp.
+     *
+     * This section of the code must be kept in sync with the pre-check
+     * function in netdev-offload-tc.c, tc_will_add_l4_checksum(). */
 
     switch (htype) {
     case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..16cf9370d 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2436,6 +2436,98 @@  recirc_id(<recirc>),in_port(2),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(ty
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
+
+AT_DATA([flows.txt], [dnl
+  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
+  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1
+])
+
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
+
+NETNS_DAEMONIZE([at_ns1],
+                [tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
+                [tcpdump.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
+
+dnl IPv4 Packet content:
+dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
+dnl       Type: IPv4 (0x0800)
+dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
+dnl       0100 .... = Version: 4
+dnl       .... 0101 = Header Length: 20 bytes (5)
+dnl       Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
+dnl       Total Length: 38
+dnl       Identification: 0x0001 (1)
+dnl       001. .... = Flags: 0x1, More fragments
+dnl           0... .... = Reserved bit: Not set
+dnl           .0.. .... = Don't fragment: Not set
+dnl           ..1. .... = More fragments: Set
+dnl       ...0 0000 0000 0000 = Fragment Offset: 0
+dnl       Time to Live: 64
+dnl       Protocol: UDP (17)
+dnl       Header Checksum: 0x44c2
+dnl   Data (18 bytes)
+eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00"
+ip="45 00 00 26 00 01 20 00 40 11 44 c2 0a 01 01 01 0a 01 01 02"
+data="0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 07 08 09 0a"
+packet="${eth} ${ip} ${data}"
+
+dnl We send each packet multiple times, one for learning, which will go to
+dnl ovs-vswitchd, and the others will go through the actual datapath.
+for i in 1 2 3 4 5; do
+  NS_CHECK_EXEC([at_ns0],
+                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
+done
+
+dnl Update source address and checksums in original packet for comparison.
+packet=$(echo "$packet" | sed -e 's/ //g' \
+         -e 's/0a010101/0b010101/g' -e 's/44c2/43c2/g' -e 's/e964/e864/g')
+OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
+
+dnl Repeat similar test with IPv6.
+dnl Packet content:
+dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
+dnl       Type: IPv6 (0x86dd)
+dnl   Internet Protocol Version 6, Src: fc00::1, Dst: fc00::2
+dnl       Payload Length: 24
+dnl       Next Header: Fragment Header for IPv6 (44)
+dnl       Hop Limit: 64
+dnl       Fragment Header for IPv6
+dnl           Next header: UDP (17)
+dnl           Reserved octet: 0x00
+dnl           0000 0000 0000 0... = Offset: 0 (0 bytes)
+dnl           .... .... .... .00. = Reserved bits: 0
+dnl           .... .... .... ...1 = More Fragments: Yes
+dnl           Identification: 0x2316ab36
+dnl   Data (16 bytes)
+eth="36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd"
+ip="60 00 00 00 00 18 2c 40 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 \
+    fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 11 00 00 01 23 16 ab 36"
+data="0b c4 08 84 00 26 07 65 01 02 03 04 05 06 07 08"
+packet="${eth} ${ip} ${data}"
+
+for i in 1 2 3 4 5; do
+  NS_CHECK_EXEC([at_ns0],
+                [$PYTHON3 $srcdir/sendpkt.py p0 ${packet} > /dev/null])
+done
+
+dnl Update checksum and source address in original packet for comparison.
+packet=$(echo "$packet" | sed -e 's/ //g' -e 's/0765/0666/g' -e \
+       's/fc000000000000000000000000000001/fc000000000000000000000000000100/g')
+OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_BANNER([MPLS])
 
 AT_SETUP([mpls - encap header dp-support])