diff mbox series

[net] net: ethernet: cadence: fix socket buffer corruption problem

Message ID 1540417883-8476-1-git-send-email-Tristram.Ha@microchip.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net] net: ethernet: cadence: fix socket buffer corruption problem | expand

Commit Message

Tristram.Ha@microchip.com Oct. 24, 2018, 9:51 p.m. UTC
From: Tristram Ha <Tristram.Ha@microchip.com>

Socket buffer is not re-created when headroom is 2 and tailroom is 1.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Miller Oct. 25, 2018, 6:32 p.m. UTC | #1
From: <Tristram.Ha@microchip.com>
Date: Wed, 24 Oct 2018 14:51:23 -0700

> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Socket buffer is not re-created when headroom is 2 and tailroom is 1.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>

Applied.
Florian Fainelli Oct. 25, 2018, 6:41 p.m. UTC | #2
On 10/25/18 11:32 AM, David Miller wrote:
> From: <Tristram.Ha@microchip.com>
> Date: Wed, 24 Oct 2018 14:51:23 -0700
> 
>> From: Tristram Ha <Tristram.Ha@microchip.com>
>>
>> Socket buffer is not re-created when headroom is 2 and tailroom is 1.
>>
>> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Applied.

No fixes tag?
Claudiu Beznea Oct. 29, 2018, 3:04 p.m. UTC | #3
Hi Tristram,

Could you, please, tell me if the above variable was false in your case?

bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);

If yes, then, the proper fix would be as follows:

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 8f5bf9166c11..492a8e1a34cd 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1690,7 +1690,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
                padlen += ETH_FCS_LEN;
        }
 
-       if (!cloned && headroom + tailroom >= padlen) {
+       if (!cloned && headroom + tailroom >= ETH_FCS_LEN) {
                (*skb)->data = memmove((*skb)->head, (*skb)->data, (*skb)->len);
                skb_set_tail_pointer(*skb, (*skb)->len);
        } else {

Could you please check if it works in your case (and without your patch)?

Thank you,
Claudiu Beznea

On 25.10.2018 00:51, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Socket buffer is not re-created when headroom is 2 and tailroom is 1.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 8f5bf91..1d86b4d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1684,7 +1684,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
>  			padlen = 0;
>  		/* No room for FCS, need to reallocate skb. */
>  		else
> -			padlen = ETH_FCS_LEN - tailroom;
> +			padlen = ETH_FCS_LEN;
>  	} else {
>  		/* Add room for FCS. */
>  		padlen += ETH_FCS_LEN;
>
Tristram.Ha@microchip.com Oct. 29, 2018, 9:40 p.m. UTC | #4
> Could you, please, tell me if the above variable was false in your case?
> 
> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);
> 
> If yes, then, the proper fix would be as follows:
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c
> b/drivers/net/ethernet/cadence/macb_main.c
> index 8f5bf9166c11..492a8e1a34cd 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1690,7 +1690,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
> struct net_device *ndev)
>                 padlen += ETH_FCS_LEN;
>         }
> 
> -       if (!cloned && headroom + tailroom >= padlen) {
> +       if (!cloned && headroom + tailroom >= ETH_FCS_LEN) {
>                 (*skb)->data = memmove((*skb)->head, (*skb)->data, (*skb)->len);
>                 skb_set_tail_pointer(*skb, (*skb)->len);
>         } else {
> 
> Could you please check if it works in your case (and without your patch)?
> 

Actually doing that reveals another bug:

	if (padlen) {
		if (padlen >= ETH_FCS_LEN)
			skb_put_zero(*skb, padlen - ETH_FCS_LEN);
		else
			skb_trim(*skb, ETH_FCS_LEN - padlen);
	}

My fix calls skb_put_zero with zero length.  Your change calls skb_trim which
actually sets the socket buffer length to 1!

When this problem happens headroom is 2, tailroom is 1, skb->len is 61, and
padlen is 3.

DSA driver is being used.  That is why the length is already padded to 60 bytes
and 1-byte tail tag is added.

BTW, I am not sure while this macb_pad_and_fcs function was added.  Is it to
workaround some hardware bugs?  The code is executed only when
NETIF_IF_HW_CSUM is used.  But if hardware tx checksumming is enabled why
not also use the hardware to calculate CRC?

NETIF_F_SG is not enabled in the MAC I used, so enabling NETIF_IF_HW_CSUM
is rather pointless.  With the padding code the transmit throughput cannot get
higher than 100Mbps in a gigabit connection.

I would recommend to add this option to disable manual padding in one of those
macb_config structures.
Claudiu Beznea Oct. 30, 2018, 1:23 p.m. UTC | #5
On 29.10.2018 23:40, Tristram Ha - C24268 wrote:
>> Could you, please, tell me if the above variable was false in your case?
>>
>> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);
>>
>> If yes, then, the proper fix would be as follows:
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 8f5bf9166c11..492a8e1a34cd 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1690,7 +1690,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
>> struct net_device *ndev)
>>                 padlen += ETH_FCS_LEN;
>>         }
>>
>> -       if (!cloned && headroom + tailroom >= padlen) {
>> +       if (!cloned && headroom + tailroom >= ETH_FCS_LEN) {
>>                 (*skb)->data = memmove((*skb)->head, (*skb)->data, (*skb)->len);
>>                 skb_set_tail_pointer(*skb, (*skb)->len);
>>         } else {
>>
>> Could you please check if it works in your case (and without your patch)?
>>
> 
> Actually doing that reveals another bug:
> 
>         if (padlen) {
>                 if (padlen >= ETH_FCS_LEN)
>                         skb_put_zero(*skb, padlen - ETH_FCS_LEN);
>                 else
>                         skb_trim(*skb, ETH_FCS_LEN - padlen);
>         }
> 
> My fix calls skb_put_zero with zero length.  Your change calls skb_trim which
> actually sets the socket buffer length to 1!
> 
> When this problem happens headroom is 2, tailroom is 1, skb->len is 61, and
> padlen is 3.
> 

Ok, I see now. Looking again, your fix is good. But, as you said, there is
the skb_trim() in this function that is wrong from the beginning (my bad).
I propose to remove it since, with your fix is not even reached anymore.

Could you check on your side that applying this on top of your patch, your
scenario is still working?

diff --git a/drivers/net/ethernet/cadence/macb_main.c
b/drivers/net/ethernet/cadence/macb_main.c
index 1d86b4d5645a..e1347d6d1b50 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
struct net_device *ndev)
                *skb = nskb;
        }

-       if (padlen) {
-               if (padlen >= ETH_FCS_LEN)
-                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
-               else
-                       skb_trim(*skb, ETH_FCS_LEN - padlen);
-       }
+       if (padlen > ETH_FCS_LEN)
+               skb_put_zero(*skb, padlen - ETH_FCS_LEN);


> DSA driver is being used.  That is why the length is already padded to 60 bytes
> and 1-byte tail tag is added.

Ok, I see, I didn't test with such configurations.

> 
> BTW, I am not sure while this macb_pad_and_fcs function was added.  Is it to
> workaround some hardware bugs?  The code is executed only when
> NETIF_IF_HW_CSUM is used.  But if hardware tx checksumming is enabled why
> not also use the hardware to calculate CRC?

It was reported in [1] that UDP checksum is offloaded to hardware no matter
the application previously computed it.

The code should be executed only for packets that has checksum computed by
applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to not
recompute checksum for packets with checksum already computed. To do so,
while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit should
be set on buffer descriptor. But to do so, packets must have a minimum size
of 64 and FCS to be computed.

The NETIF_F_HW_CSUM check was placed there because the issue described in
[1] is reproducible because hardware checksum is enabled and overrides the
checksum provided by applications.

[1] https://www.spinics.net/lists/netdev/msg505065.html
> 
> NETIF_F_SG is not enabled in the MAC I used, so enabling NETIF_IF_HW_CSUM
> is rather pointless.  With the padding code the transmit throughput cannot get
> higher than 100Mbps in a gigabit connection.
> 
> I would recommend to add this option to disable manual padding in one of those
> macb_config structures.

In this way the user would have to know from the beginning what kind of
packets are used.

Thank you,
Claudiu Beznea

>
Tristram.Ha@microchip.com Oct. 30, 2018, 7:36 p.m. UTC | #6
> Could you check on your side that applying this on top of your patch, your
> scenario is still working?
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c
> b/drivers/net/ethernet/cadence/macb_main.c
> index 1d86b4d5645a..e1347d6d1b50 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
> struct net_device *ndev)
>                 *skb = nskb;
>         }
> 
> -       if (padlen) {
> -               if (padlen >= ETH_FCS_LEN)
> -                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
> -               else
> -                       skb_trim(*skb, ETH_FCS_LEN - padlen);
> -       }
> +       if (padlen > ETH_FCS_LEN)
> +               skb_put_zero(*skb, padlen - ETH_FCS_LEN);

I think it is okay but I need to check all paths are covered.
 
> It was reported in [1] that UDP checksum is offloaded to hardware no matter
> the application previously computed it.
> 
> The code should be executed only for packets that has checksum computed
> by
> applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to
> not
> recompute checksum for packets with checksum already computed. To do
> so,
> while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit
> should
> be set on buffer descriptor. But to do so, packets must have a minimum size
> of 64 and FCS to be computed.
> 
> The NETIF_F_HW_CSUM check was placed there because the issue
> described in
> [1] is reproducible because hardware checksum is enabled and overrides the
> checksum provided by applications.
> 
> [1] https://www.spinics.net/lists/netdev/msg505065.html

I understand the issue now.  It is weird that the transmit descriptor does not
have direct control over turning on checksum generation or not, but it wastes
3 bits returning the error codes of such generation.  What can the driver do
with such information?

In my opinion then hardware transmit checksumming cannot be supported
In Linux.

> > NETIF_F_SG is not enabled in the MAC I used, so enabling
> NETIF_IF_HW_CSUM
> > is rather pointless.  With the padding code the transmit throughput cannot
> get
> > higher than 100Mbps in a gigabit connection.
> >
> > I would recommend to add this option to disable manual padding in one of
> those
> > macb_config structures.
> 
> In this way the user would have to know from the beginning what kind of
> packets are used.
> 

The kernel already does a good job of calculating checksum.  Using hardware to
do that does not improve performance much.

Alternative is to use ethtool to disable hardware tx checksum so that software can
intentionally send wrong checksums.
Claudiu Beznea Oct. 31, 2018, 7:45 a.m. UTC | #7
On 30.10.2018 21:36, Tristram Ha - C24268 wrote:
>> Could you check on your side that applying this on top of your patch, your
>> scenario is still working?
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 1d86b4d5645a..e1347d6d1b50 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
>> struct net_device *ndev)
>>                 *skb = nskb;
>>         }
>>
>> -       if (padlen) {
>> -               if (padlen >= ETH_FCS_LEN)
>> -                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
>> -               else
>> -                       skb_trim(*skb, ETH_FCS_LEN - padlen);
>> -       }
>> +       if (padlen > ETH_FCS_LEN)
>> +               skb_put_zero(*skb, padlen - ETH_FCS_LEN);
> 
> I think it is okay but I need to check all paths are covered.

On my side I checked with pktgen generating packets with sizes starting
from 1-MTU. Same scripts I also used when I first implemented this but it
seems that your scenario wasn't covered. Please have a look and let me know
how it works on your side.

> 
>> It was reported in [1] that UDP checksum is offloaded to hardware no matter
>> the application previously computed it.
>>
>> The code should be executed only for packets that has checksum computed
>> by
>> applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to
>> not
>> recompute checksum for packets with checksum already computed. To do
>> so,
>> while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit
>> should
>> be set on buffer descriptor. But to do so, packets must have a minimum size
>> of 64 and FCS to be computed.
>>
>> The NETIF_F_HW_CSUM check was placed there because the issue
>> described in
>> [1] is reproducible because hardware checksum is enabled and overrides the
>> checksum provided by applications.
>>
>> [1] https://www.spinics.net/lists/netdev/msg505065.html
> 
> I understand the issue now.  It is weird that the transmit descriptor does not
> have direct control over turning on checksum generation or not, but it wastes
> 3 bits returning the error codes of such generation.  What can the driver do
> with such information?

Yep, from my POV it would have been nice if it could have been able to do
the pad an FCS even when hardware checksum is enabled and TX_NOCRC bit is
set in descriptor. For cases when hardware checksum is disable it seems
that it could handle it.

> 
> In my opinion then hardware transmit checksumming cannot be supported
> In Linux.
> 
>>> NETIF_F_SG is not enabled in the MAC I used, so enabling
>> NETIF_IF_HW_CSUM
>>> is rather pointless.  With the padding code the transmit throughput cannot
>> get
>>> higher than 100Mbps in a gigabit connection.
>>>
>>> I would recommend to add this option to disable manual padding in one of
>> those
>>> macb_config structures.
>>
>> In this way the user would have to know from the beginning what kind of
>> packets are used.
>>
> 
> The kernel already does a good job of calculating checksum.  Using hardware to
> do that does not improve performance much.
> 
> Alternative is to use ethtool to disable hardware tx checksum so that software can
> intentionally send wrong checksums.
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 8f5bf91..1d86b4d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1684,7 +1684,7 @@  static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
 			padlen = 0;
 		/* No room for FCS, need to reallocate skb. */
 		else
-			padlen = ETH_FCS_LEN - tailroom;
+			padlen = ETH_FCS_LEN;
 	} else {
 		/* Add room for FCS. */
 		padlen += ETH_FCS_LEN;