Patchwork [2/2] mv643xx_eth: fix race in trasmit path.

login
register
mail settings
Submitter Richard Cochran
Date June 20, 2011, 7:48 a.m.
Message ID <25a379578b71bf01f3c77ac76a193d26554f9e0c.1308555865.git.richard.cochran@omicron.at>
Download mbox | patch
Permalink /patch/101042/
State Accepted
Delegated to: David Miller
Headers show

Comments

Richard Cochran - June 20, 2011, 7:48 a.m.
Because the socket buffer is freed in the completion interrupt, it is not
safe to access it after submitting it to the hardware.

Cc: stable@kernel.org
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 drivers/net/mv643xx_eth.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)
Maxime Bizon - June 20, 2011, 2:07 p.m.
On Mon, 2011-06-20 at 09:48 +0200, Richard Cochran wrote:

Hi Richard,

> Because the socket buffer is freed in the completion interrupt, it is not
> safe to access it after submitting it to the hardware.

I don't see why.

skb is freed from txq_reclaim() which grabs the tx queue lock before,
(hence the lockless __skb_queue_xxx() in both functions)

What am I missing ?

> Cc: stable@kernel.org
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
>  drivers/net/mv643xx_eth.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
> index a5d9b1c..1b7d2c1 100644
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -859,7 +859,7 @@ no_csum:
>  static netdev_tx_t mv643xx_eth_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct mv643xx_eth_private *mp = netdev_priv(dev);
> -	int queue;
> +	int length, queue;
>  	struct tx_queue *txq;
>  	struct netdev_queue *nq;
>  
> @@ -881,10 +881,12 @@ static netdev_tx_t mv643xx_eth_xmit(struct sk_buff *skb, struct net_device *dev)
>  		return NETDEV_TX_OK;
>  	}
>  
> +	length = skb->len;
> +
>  	if (!txq_submit_skb(txq, skb)) {
>  		int entries_left;
>  
> -		txq->tx_bytes += skb->len;
> +		txq->tx_bytes += length;
>  		txq->tx_packets++;
>  
>  		entries_left = txq->tx_ring_size - txq->tx_desc_count;
Eric Dumazet - June 20, 2011, 3:49 p.m.
Le lundi 20 juin 2011 à 09:48 +0200, Richard Cochran a écrit :
> Because the socket buffer is freed in the completion interrupt, it is not
> safe to access it after submitting it to the hardware.
> 
> Cc: stable@kernel.org
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
>  drivers/net/mv643xx_eth.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - June 20, 2011, 4:31 p.m.
Le lundi 20 juin 2011 à 16:07 +0200, Maxime Bizon a écrit :
> On Mon, 2011-06-20 at 09:48 +0200, Richard Cochran wrote:
> 
> Hi Richard,
> 
> > Because the socket buffer is freed in the completion interrupt, it is not
> > safe to access it after submitting it to the hardware.
> 
> I don't see why.
> 
> skb is freed from txq_reclaim() which grabs the tx queue lock before,
> (hence the lockless __skb_queue_xxx() in both functions)
> 
> What am I missing ?

You are right, this driver still takes tx queue lock in its TX
completion path.

txq_reclaim() can currently run for a very long time, blocking other
cpus to make progress if blocked on tx queue lock.

So consider this patch as a cleanup :

Its better to make all drivers behave the same way

1) Either increment tx stats in their start_xmit(), and making sure they
dont access skb->len too late.

2) increment tx stats in their TX completion path.

Then we can work on making TX completion path not taking tx queue lock
in its fast path.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lennert Buytenhek - June 20, 2011, 4:33 p.m.
On Mon, Jun 20, 2011 at 09:48:07AM +0200, Richard Cochran wrote:

> Because the socket buffer is freed in the completion interrupt, it
> is not safe to access it after submitting it to the hardware.

Maybe I'm missing something here, but mv643xx_eth TX reclaim is done
from NAPI poll, under __netif_tx_lock(), while mv643xx_eth_xmit() also
runs under __netif_tx_lock().
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - June 20, 2011, 5:19 p.m.
Le lundi 20 juin 2011 à 18:33 +0200, Lennert Buytenhek a écrit :
> On Mon, Jun 20, 2011 at 09:48:07AM +0200, Richard Cochran wrote:
> 
> > Because the socket buffer is freed in the completion interrupt, it
> > is not safe to access it after submitting it to the hardware.
> 
> Maybe I'm missing something here, but mv643xx_eth TX reclaim is done
> from NAPI poll, under __netif_tx_lock(), while mv643xx_eth_xmit() also
> runs under __netif_tx_lock().

See my previous answer. Its true this driver _currently_ holds tx queue
lock in its TX completion. But that might/should change.

Goal is to make tx completion not use tx queue lock in fast path, like
its done in tg3, bnx2, bnx2x ... and other recent drivers.

Its obviously correct to move skb->len access in start_xmit() before
starting the IO, even if not a bug fix, it makes all drivers behave the
same : When reviewing them, its easier not to worry about these possible
use after free.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - June 20, 2011, 5:21 p.m.
Le lundi 20 juin 2011 à 19:19 +0200, Eric Dumazet a écrit :
> Le lundi 20 juin 2011 à 18:33 +0200, Lennert Buytenhek a écrit :
> > On Mon, Jun 20, 2011 at 09:48:07AM +0200, Richard Cochran wrote:
> > 
> > > Because the socket buffer is freed in the completion interrupt, it
> > > is not safe to access it after submitting it to the hardware.
> > 
> > Maybe I'm missing something here, but mv643xx_eth TX reclaim is done
> > from NAPI poll, under __netif_tx_lock(), while mv643xx_eth_xmit() also
> > runs under __netif_tx_lock().
> 
> See my previous answer. Its true this driver _currently_ holds tx queue
> lock in its TX completion. But that might/should change.
> 
> Goal is to make tx completion not use tx queue lock in fast path, like
> its done in tg3, bnx2, bnx2x ... and other recent drivers.
> 
> Its obviously correct to move skb->len access in start_xmit() before
> starting the IO, even if not a bug fix, it makes all drivers behave the
> same : When reviewing them, its easier not to worry about these possible
> use after free.
> 
> 

One random example found in drivers/staging/hv/netvsc_drv.c

Is the "net->stats.tx_bytes += skb->len;" found in line 188 is correct ?

Who knows ?

I believe its not correct ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 21, 2011, 11 p.m.
From: Richard Cochran <richardcochran@gmail.com>
Date: Mon, 20 Jun 2011 09:48:07 +0200

> Because the socket buffer is freed in the completion interrupt, it is not
> safe to access it after submitting it to the hardware.
> 
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>

Since this isn't actually a bonafide bug fix I've applied this to net-next-2.6
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index a5d9b1c..1b7d2c1 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -859,7 +859,7 @@  no_csum:
 static netdev_tx_t mv643xx_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct mv643xx_eth_private *mp = netdev_priv(dev);
-	int queue;
+	int length, queue;
 	struct tx_queue *txq;
 	struct netdev_queue *nq;
 
@@ -881,10 +881,12 @@  static netdev_tx_t mv643xx_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
+	length = skb->len;
+
 	if (!txq_submit_skb(txq, skb)) {
 		int entries_left;
 
-		txq->tx_bytes += skb->len;
+		txq->tx_bytes += length;
 		txq->tx_packets++;
 
 		entries_left = txq->tx_ring_size - txq->tx_desc_count;