diff mbox series

[net] net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames

Message ID 1548939763-20074-1-git-send-email-tariqt@mellanox.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [net] net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames | expand

Commit Message

Tariq Toukan Jan. 31, 2019, 1:02 p.m. UTC
From: Saeed Mahameed <saeedm@mellanox.com>

When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.

Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Hi Dave,
Please queue for -stable >= v4.18.

Thanks,
Tariq

Comments

Eric Dumazet Jan. 31, 2019, 3:02 p.m. UTC | #1
On Thu, Jan 31, 2019 at 5:04 AM Tariq Toukan <tariqt@mellanox.com> wrote:
>
> From: Saeed Mahameed <saeedm@mellanox.com>
>
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are usually zero's, which don't affect
> checksum. However, it is not guaranteed. For example, switches might
> choose to make other use of these octets.
> This repeatedly causes kernel hardware checksum fault.
>
> Prior to the cited commit below, skb checksum was forced to be
> CHECKSUM_NONE when padding is detected. After it, we need to keep
> skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
> verify and parse IP headers, it does not worth the effort as the packets
> are so small that CHECKSUM_COMPLETE has no significant advantage.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> Hi Dave,
> Please queue for -stable >= v4.18.
>
> Thanks,
> Tariq
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9a0881cb7f51..fc8a11444e1a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -617,6 +617,8 @@ static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
>  }
>  #endif
>
> +#define short_frame(size) ((size) <= ETH_ZLEN + ETH_FCS_LEN)
> +
>  /* We reach this function only after checking that any of
>   * the (IPv4 | IPv6) bits are set in cqe->status.
>   */
> @@ -624,9 +626,22 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
>                       netdev_features_t dev_features)
>  {
>         __wsum hw_checksum = 0;
> +       void *hdr;
> +
> +       /* CQE csum doesn't cover padding octets in short ethernet
> +        * frames. And the pad field is appended prior to calculating
> +        * and appending the FCS field.
> +        *
> +        * Detecting these padded frames requires to verify and parse
> +        * IP headers, so we simply force all those small frames to be
> +        * CHECKSUM_NONE even if they are not padded.
> +        * TODO: better if we report CHECKSUM_UNNECESSARY but this
> +        * demands some refactroing.

This TODO: looks bogus to me.

mlx4 driver first tries to use CHECKSUM_UNNECESSARY.

if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
                                                  MLX4_CQE_STATUS_UDP)) &&
     (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
      cqe->checksum == cpu_to_be16(0xffff)) {
     ...
      ip_summed = CHECKSUM_UNNECESSARY;
      ...
}

Then if this attempt failed, it tries to use CHECKSUM_COMPLETE (and
calls this check_sum() function)

So there is no refactoring to be done for mlx4 : short
IPv[46]+{UDP|TCP} frames get CHECKSUM_UNNECESSARY already

> +        */
> +       if (short_frame(skb->len))
> +               return -EINVAL;
>
> -       void *hdr = (u8 *)va + sizeof(struct ethhdr);
> -
> +       hdr = (u8 *)va + sizeof(struct ethhdr);
>         hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
>
>         if (cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_CVLAN_PRESENT_MASK) &&
> --
> 1.8.3.1
>
David Miller Jan. 31, 2019, 5:38 p.m. UTC | #2
From: Tariq Toukan <tariqt@mellanox.com>
Date: Thu, 31 Jan 2019 15:02:43 +0200

> From: Saeed Mahameed <saeedm@mellanox.com>
> 
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are usually zero's, which don't affect
> checksum. However, it is not guaranteed. For example, switches might
> choose to make other use of these octets.
> This repeatedly causes kernel hardware checksum fault.
> 
> Prior to the cited commit below, skb checksum was forced to be
> CHECKSUM_NONE when padding is detected. After it, we need to keep
> skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
> verify and parse IP headers, it does not worth the effort as the packets
> are so small that CHECKSUM_COMPLETE has no significant advantage.
> 
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> Hi Dave,
> Please queue for -stable >= v4.18.

Please look into Eric's feedback and update the comment as needed.

Thank you.
Saeed Mahameed Jan. 31, 2019, 7:04 p.m. UTC | #3
On Thu, 2019-01-31 at 07:02 -0800, Eric Dumazet wrote:
> On Thu, Jan 31, 2019 at 5:04 AM Tariq Toukan <tariqt@mellanox.com>
> wrote:
> > From: Saeed Mahameed <saeedm@mellanox.com>
> > 
> > When an ethernet frame is padded to meet the minimum ethernet frame
> > size, the padding octets are not covered by the hardware checksum.
> > Fortunately the padding octets are usually zero's, which don't
> > affect
> > checksum. However, it is not guaranteed. For example, switches
> > might
> > choose to make other use of these octets.
> > This repeatedly causes kernel hardware checksum fault.
> > 
> > Prior to the cited commit below, skb checksum was forced to be
> > CHECKSUM_NONE when padding is detected. After it, we need to keep
> > skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
> > verify and parse IP headers, it does not worth the effort as the
> > packets
> > are so small that CHECKSUM_COMPLETE has no significant advantage.
> > 
> > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
> > are friends")
> > Cc: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 19 +++++++++++++++++-
> > -
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > Hi Dave,
> > Please queue for -stable >= v4.18.
> > 
> > Thanks,
> > Tariq
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 9a0881cb7f51..fc8a11444e1a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -617,6 +617,8 @@ static int get_fixed_ipv6_csum(__wsum
> > hw_checksum, struct sk_buff *skb,
> >  }
> >  #endif
> > 
> > +#define short_frame(size) ((size) <= ETH_ZLEN + ETH_FCS_LEN)
> > +
> >  /* We reach this function only after checking that any of
> >   * the (IPv4 | IPv6) bits are set in cqe->status.
> >   */
> > @@ -624,9 +626,22 @@ static int check_csum(struct mlx4_cqe *cqe,
> > struct sk_buff *skb, void *va,
> >                       netdev_features_t dev_features)
> >  {
> >         __wsum hw_checksum = 0;
> > +       void *hdr;
> > +
> > +       /* CQE csum doesn't cover padding octets in short ethernet
> > +        * frames. And the pad field is appended prior to
> > calculating
> > +        * and appending the FCS field.
> > +        *
> > +        * Detecting these padded frames requires to verify and
> > parse
> > +        * IP headers, so we simply force all those small frames to
> > be
> > +        * CHECKSUM_NONE even if they are not padded.
> > +        * TODO: better if we report CHECKSUM_UNNECESSARY but this
> > +        * demands some refactroing.
> 
> This TODO: looks bogus to me.
> 
> mlx4 driver first tries to use CHECKSUM_UNNECESSARY.
> 
> if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
>                                                   MLX4_CQE_STATUS_UDP
> )) &&
>      (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
>       cqe->checksum == cpu_to_be16(0xffff)) {
>      ...
>       ip_summed = CHECKSUM_UNNECESSARY;
>       ...
> }
> 
> Then if this attempt failed, it tries to use CHECKSUM_COMPLETE (and
> calls this check_sum() function)
> 
> So there is no refactoring to be done for mlx4 : short
> IPv[46]+{UDP|TCP} frames get CHECKSUM_UNNECESSARY already

not exactly, considering mlx4 weird condition to decide to do
CHECKSUM_UNNECESSARY:

if ((cqe csum fields are valid) && cqe->checksum ==
cpu_to_be16(0xffff)) 
     { do CHECKSUM_UNNECESSARY }
else { do CHECKSUM_COMPLETE }

CHECKSUM_UNNECESSARY will be skipped if csum complete field is valid 
i.e (cqe->checksum != 0xffff)

but if checksum complete is to be skipped due to short_frame we want to
go back to CHECKSUM_UNNECESSARY, this is the refactoring i am talking
about :).


I hope now it is clear.. 


> 
> > +        */
> > +       if (short_frame(skb->len))
> > +               return -EINVAL;
> > 
> > -       void *hdr = (u8 *)va + sizeof(struct ethhdr);
> > -
> > +       hdr = (u8 *)va + sizeof(struct ethhdr);
> >         hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
> > 
> >         if (cqe->vlan_my_qpn &
> > cpu_to_be32(MLX4_CQE_CVLAN_PRESENT_MASK) &&
> > --
> > 1.8.3.1
> >
Eric Dumazet Jan. 31, 2019, 7:07 p.m. UTC | #4
On Thu, Jan 31, 2019 at 11:04 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> On Thu, 2019-01-31 at 07:02 -0800, Eric Dumazet wrote:
> > On Thu, Jan 31, 2019 at 5:04 AM Tariq Toukan <tariqt@mellanox.com>
> > wrote:
> > > From: Saeed Mahameed <saeedm@mellanox.com>
> > >
> > > When an ethernet frame is padded to meet the minimum ethernet frame
> > > size, the padding octets are not covered by the hardware checksum.
> > > Fortunately the padding octets are usually zero's, which don't
> > > affect
> > > checksum. However, it is not guaranteed. For example, switches
> > > might
> > > choose to make other use of these octets.
> > > This repeatedly causes kernel hardware checksum fault.
> > >
> > > Prior to the cited commit below, skb checksum was forced to be
> > > CHECKSUM_NONE when padding is detected. After it, we need to keep
> > > skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
> > > verify and parse IP headers, it does not worth the effort as the
> > > packets
> > > are so small that CHECKSUM_COMPLETE has no significant advantage.
> > >
> > > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
> > > are friends")
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 19 +++++++++++++++++-
> > > -
> > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > >
> > > Hi Dave,
> > > Please queue for -stable >= v4.18.
> > >
> > > Thanks,
> > > Tariq
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > index 9a0881cb7f51..fc8a11444e1a 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > @@ -617,6 +617,8 @@ static int get_fixed_ipv6_csum(__wsum
> > > hw_checksum, struct sk_buff *skb,
> > >  }
> > >  #endif
> > >
> > > +#define short_frame(size) ((size) <= ETH_ZLEN + ETH_FCS_LEN)
> > > +
> > >  /* We reach this function only after checking that any of
> > >   * the (IPv4 | IPv6) bits are set in cqe->status.
> > >   */
> > > @@ -624,9 +626,22 @@ static int check_csum(struct mlx4_cqe *cqe,
> > > struct sk_buff *skb, void *va,
> > >                       netdev_features_t dev_features)
> > >  {
> > >         __wsum hw_checksum = 0;
> > > +       void *hdr;
> > > +
> > > +       /* CQE csum doesn't cover padding octets in short ethernet
> > > +        * frames. And the pad field is appended prior to
> > > calculating
> > > +        * and appending the FCS field.
> > > +        *
> > > +        * Detecting these padded frames requires to verify and
> > > parse
> > > +        * IP headers, so we simply force all those small frames to
> > > be
> > > +        * CHECKSUM_NONE even if they are not padded.
> > > +        * TODO: better if we report CHECKSUM_UNNECESSARY but this
> > > +        * demands some refactroing.
> >
> > This TODO: looks bogus to me.
> >
> > mlx4 driver first tries to use CHECKSUM_UNNECESSARY.
> >
> > if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
> >                                                   MLX4_CQE_STATUS_UDP
> > )) &&
> >      (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
> >       cqe->checksum == cpu_to_be16(0xffff)) {
> >      ...
> >       ip_summed = CHECKSUM_UNNECESSARY;
> >       ...
> > }
> >
> > Then if this attempt failed, it tries to use CHECKSUM_COMPLETE (and
> > calls this check_sum() function)
> >
> > So there is no refactoring to be done for mlx4 : short
> > IPv[46]+{UDP|TCP} frames get CHECKSUM_UNNECESSARY already
>
> not exactly, considering mlx4 weird condition to decide to do
> CHECKSUM_UNNECESSARY:
>
> if ((cqe csum fields are valid) && cqe->checksum ==
> cpu_to_be16(0xffff))
>      { do CHECKSUM_UNNECESSARY }
> else { do CHECKSUM_COMPLETE }
>
> CHECKSUM_UNNECESSARY will be skipped if csum complete field is valid
> i.e (cqe->checksum != 0xffff)
>
> but if checksum complete is to be skipped due to short_frame we want to
> go back to CHECKSUM_UNNECESSARY, this is the refactoring i am talking
> about :).
>
>
> I hope now it is clear..

Not really, since in case of short frame, cqe->checksum will be
0xffff, since mlx4 does not include the padding bytes in the checksum
in the first place.

(For native IPv4/IPV6 TCP/UDP frames that is)

>
>
> >
> > > +        */
> > > +       if (short_frame(skb->len))
> > > +               return -EINVAL;
> > >
> > > -       void *hdr = (u8 *)va + sizeof(struct ethhdr);
> > > -
> > > +       hdr = (u8 *)va + sizeof(struct ethhdr);
> > >         hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
> > >
> > >         if (cqe->vlan_my_qpn &
> > > cpu_to_be32(MLX4_CQE_CVLAN_PRESENT_MASK) &&
> > > --
> > > 1.8.3.1
> > >
Saeed Mahameed Jan. 31, 2019, 7:27 p.m. UTC | #5
On Thu, 2019-01-31 at 11:07 -0800, Eric Dumazet wrote:
> On Thu, Jan 31, 2019 at 11:04 AM Saeed Mahameed <saeedm@mellanox.com>
> wrote:
> > On Thu, 2019-01-31 at 07:02 -0800, Eric Dumazet wrote:
> > > On Thu, Jan 31, 2019 at 5:04 AM Tariq Toukan <tariqt@mellanox.com
> > > >
> > > wrote:
> > > > From: Saeed Mahameed <saeedm@mellanox.com>
> > > > 
> > > > When an ethernet frame is padded to meet the minimum ethernet
> > > > frame
> > > > size, the padding octets are not covered by the hardware
> > > > checksum.
> > > > Fortunately the padding octets are usually zero's, which don't
> > > > affect
> > > > checksum. However, it is not guaranteed. For example, switches
> > > > might
> > > > choose to make other use of these octets.
> > > > This repeatedly causes kernel hardware checksum fault.
> > > > 
> > > > Prior to the cited commit below, skb checksum was forced to be
> > > > CHECKSUM_NONE when padding is detected. After it, we need to
> > > > keep
> > > > skb->csum updated. However, fixing up CHECKSUM_COMPLETE
> > > > requires to
> > > > verify and parse IP headers, it does not worth the effort as
> > > > the
> > > > packets
> > > > are so small that CHECKSUM_COMPLETE has no significant
> > > > advantage.
> > > > 
> > > > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and
> > > > CHECKSUM_COMPLETE
> > > > are friends")
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > > ---
> > > >  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 19
> > > > +++++++++++++++++-
> > > > -
> > > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > > 
> > > > Hi Dave,
> > > > Please queue for -stable >= v4.18.
> > > > 
> > > > Thanks,
> > > > Tariq
> > > > 
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > > b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > > index 9a0881cb7f51..fc8a11444e1a 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > > @@ -617,6 +617,8 @@ static int get_fixed_ipv6_csum(__wsum
> > > > hw_checksum, struct sk_buff *skb,
> > > >  }
> > > >  #endif
> > > > 
> > > > +#define short_frame(size) ((size) <= ETH_ZLEN + ETH_FCS_LEN)
> > > > +
> > > >  /* We reach this function only after checking that any of
> > > >   * the (IPv4 | IPv6) bits are set in cqe->status.
> > > >   */
> > > > @@ -624,9 +626,22 @@ static int check_csum(struct mlx4_cqe
> > > > *cqe,
> > > > struct sk_buff *skb, void *va,
> > > >                       netdev_features_t dev_features)
> > > >  {
> > > >         __wsum hw_checksum = 0;
> > > > +       void *hdr;
> > > > +
> > > > +       /* CQE csum doesn't cover padding octets in short
> > > > ethernet
> > > > +        * frames. And the pad field is appended prior to
> > > > calculating
> > > > +        * and appending the FCS field.
> > > > +        *
> > > > +        * Detecting these padded frames requires to verify and
> > > > parse
> > > > +        * IP headers, so we simply force all those small
> > > > frames to
> > > > be
> > > > +        * CHECKSUM_NONE even if they are not padded.
> > > > +        * TODO: better if we report CHECKSUM_UNNECESSARY but
> > > > this
> > > > +        * demands some refactroing.
> > > 
> > > This TODO: looks bogus to me.
> > > 
> > > mlx4 driver first tries to use CHECKSUM_UNNECESSARY.
> > > 
> > > if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
> > >                                                   MLX4_CQE_STATUS
> > > _UDP
> > > )) &&
> > >      (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
> > >       cqe->checksum == cpu_to_be16(0xffff)) {
> > >      ...
> > >       ip_summed = CHECKSUM_UNNECESSARY;
> > >       ...
> > > }
> > > 
> > > Then if this attempt failed, it tries to use CHECKSUM_COMPLETE
> > > (and
> > > calls this check_sum() function)
> > > 
> > > So there is no refactoring to be done for mlx4 : short
> > > IPv[46]+{UDP|TCP} frames get CHECKSUM_UNNECESSARY already
> > 
> > not exactly, considering mlx4 weird condition to decide to do
> > CHECKSUM_UNNECESSARY:
> > 
> > if ((cqe csum fields are valid) && cqe->checksum ==
> > cpu_to_be16(0xffff))
> >      { do CHECKSUM_UNNECESSARY }
> > else { do CHECKSUM_COMPLETE }
> > 
> > CHECKSUM_UNNECESSARY will be skipped if csum complete field is
> > valid
> > i.e (cqe->checksum != 0xffff)
> > 
> > but if checksum complete is to be skipped due to short_frame we
> > want to
> > go back to CHECKSUM_UNNECESSARY, this is the refactoring i am
> > talking
> > about :).
> > 
> > 
> > I hope now it is clear..
> 
> Not really, since in case of short frame, cqe->checksum will be
> 0xffff, since mlx4 does not include the padding bytes in the checksum
> in the first place.
> 

Are you sure ? you are claiming that the hardware will skip csum
complete i.e cqe->checksum will be 0xffff for padded short IP frames.
i don't think this is the case, the whole bug is that the hw does
provide a partial cqe->checksum (i.e doesn't included the padding
bytes) even for short eth frames.


> (For native IPv4/IPV6 TCP/UDP frames that is)
> 
> > 
> > > > +        */
> > > > +       if (short_frame(skb->len))
> > > > +               return -EINVAL;
> > > > 
> > > > -       void *hdr = (u8 *)va + sizeof(struct ethhdr);
> > > > -
> > > > +       hdr = (u8 *)va + sizeof(struct ethhdr);
> > > >         hw_checksum = csum_unfold((__force __sum16)cqe-
> > > > >checksum);
> > > > 
> > > >         if (cqe->vlan_my_qpn &
> > > > cpu_to_be32(MLX4_CQE_CVLAN_PRESENT_MASK) &&
> > > > --
> > > > 1.8.3.1
> > > >
Eric Dumazet Jan. 31, 2019, 7:42 p.m. UTC | #6
On Thu, Jan 31, 2019 at 11:27 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> Are you sure ? you are claiming that the hardware will skip csum
> complete i.e cqe->checksum will be 0xffff for padded short IP frames.
> i don't think this is the case, the whole bug is that the hw does
> provide a partial cqe->checksum (i.e doesn't included the padding
> bytes) even for short eth frames.

If the padding is not included, then cqe->checksum is 0xFFFF for
correctly received frames.

Otherwise, what would be cqe->checksum in this case ? A random value ?
Saeed Mahameed Feb. 1, 2019, 12:05 a.m. UTC | #7
On Thu, 2019-01-31 at 11:42 -0800, Eric Dumazet wrote:
> On Thu, Jan 31, 2019 at 11:27 AM Saeed Mahameed <saeedm@mellanox.com>
> wrote:
> > Are you sure ? you are claiming that the hardware will skip csum
> > complete i.e cqe->checksum will be 0xffff for padded short IP
> > frames.
> > i don't think this is the case, the whole bug is that the hw does
> > provide a partial cqe->checksum (i.e doesn't included the padding
> > bytes) even for short eth frames.
> 
> If the padding is not included, then cqe->checksum is 0xFFFF for
> correctly received frames.
> 
> Otherwise, what would be cqe->checksum in this case ? A random value
> ?

the actual checksum of IP headers+IP payload, while ignoring the
padding bytes, which is the bug, let me double check..
Saeed Mahameed Feb. 6, 2019, 8:15 p.m. UTC | #8
On Thu, Jan 31, 2019 at 4:06 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> On Thu, 2019-01-31 at 11:42 -0800, Eric Dumazet wrote:
> > On Thu, Jan 31, 2019 at 11:27 AM Saeed Mahameed <saeedm@mellanox.com>
> > wrote:
> > > Are you sure ? you are claiming that the hardware will skip csum
> > > complete i.e cqe->checksum will be 0xffff for padded short IP
> > > frames.
> > > i don't think this is the case, the whole bug is that the hw does
> > > provide a partial cqe->checksum (i.e doesn't included the padding
> > > bytes) even for short eth frames.
> >
> > If the padding is not included, then cqe->checksum is 0xFFFF for
> > correctly received frames.
> >
> > Otherwise, what would be cqe->checksum in this case ? A random value
> > ?
>
> the actual checksum of IP headers+IP payload, while ignoring the
> padding bytes, which is the bug, let me double check..
>
>

Ok, just verified, csum complete (cqe->checksum) is provided and valid
for non-TCP/UDP ip packets, (on specific ConnectX3 HWs)
e.g. ICMP packets or IP fragments go through csum complete,  that
being said, looking at the code before my patch.
when cqe->checksum complete is not valid a IP non-TCP/UDP packet will
report csum NONE, which means my
TODO comment is valid even without my patch :).

We can remove the TODO, although i am fine with it if it kept there,
since it is valid,
but we must add a future optimization task (to tariq's backlog ;)) for
IP non-TCP/UDP traffic to check for
csum unnecessary when csum complete is not an option.

Thanks,
Saeed.
Eric Dumazet Feb. 6, 2019, 8:17 p.m. UTC | #9
On Wed, Feb 6, 2019 at 12:15 PM Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
>
> Ok, just verified, csum complete (cqe->checksum) is provided and valid
> for non-TCP/UDP ip packets, (on specific ConnectX3 HWs)
> e.g. ICMP packets or IP fragments go through csum complete,  that
> being said, looking at the code before my patch.
> when cqe->checksum complete is not valid a IP non-TCP/UDP packet will
> report csum NONE, which means my
> TODO comment is valid even without my patch :).
>
> We can remove the TODO, although i am fine with it if it kept there,
> since it is valid,
> but we must add a future optimization task (to tariq's backlog ;)) for
> IP non-TCP/UDP traffic to check for
> csum unnecessary when csum complete is not an option.
>
>

Thanks for double checking.
You might add more details in the changelog for future generations ;)
Saeed Mahameed Feb. 6, 2019, 8:23 p.m. UTC | #10
On Wed, Feb 6, 2019 at 12:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Feb 6, 2019 at 12:15 PM Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
> >
> > Ok, just verified, csum complete (cqe->checksum) is provided and valid
> > for non-TCP/UDP ip packets, (on specific ConnectX3 HWs)
> > e.g. ICMP packets or IP fragments go through csum complete,  that
> > being said, looking at the code before my patch.
> > when cqe->checksum complete is not valid a IP non-TCP/UDP packet will
> > report csum NONE, which means my
> > TODO comment is valid even without my patch :).
> >
> > We can remove the TODO, although i am fine with it if it kept there,
> > since it is valid,
> > but we must add a future optimization task (to tariq's backlog ;)) for
> > IP non-TCP/UDP traffic to check for
> > csum unnecessary when csum complete is not an option.
> >
> >
>
> Thanks for double checking.
> You might add more details in the changelog for future generations ;)

Great, I will do that, we will post V2,

Thank you Eric.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9a0881cb7f51..fc8a11444e1a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -617,6 +617,8 @@  static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
 }
 #endif
 
+#define short_frame(size) ((size) <= ETH_ZLEN + ETH_FCS_LEN)
+
 /* We reach this function only after checking that any of
  * the (IPv4 | IPv6) bits are set in cqe->status.
  */
@@ -624,9 +626,22 @@  static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 		      netdev_features_t dev_features)
 {
 	__wsum hw_checksum = 0;
+	void *hdr;
+
+	/* CQE csum doesn't cover padding octets in short ethernet
+	 * frames. And the pad field is appended prior to calculating
+	 * and appending the FCS field.
+	 *
+	 * Detecting these padded frames requires to verify and parse
+	 * IP headers, so we simply force all those small frames to be
+	 * CHECKSUM_NONE even if they are not padded.
+	 * TODO: better if we report CHECKSUM_UNNECESSARY but this
+	 * demands some refactroing.
+	 */
+	if (short_frame(skb->len))
+		return -EINVAL;
 
-	void *hdr = (u8 *)va + sizeof(struct ethhdr);
-
+	hdr = (u8 *)va + sizeof(struct ethhdr);
 	hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
 
 	if (cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_CVLAN_PRESENT_MASK) &&