Patchwork gianfar: Fix TX ring processing on SMP machines

Submitter Anton Vorontsov
Date March 3, 2010, 6:18 p.m.
Message ID <20100303181858.GA458@oksana.dev.rtsoft.ru>
Permalink /patch/46853/
State Accepted, archived
Delegated to: Kumar Gala

Comments

Anton Vorontsov - March 3, 2010, 6:18 p.m.
Starting with commit a3bc1f11e9b867a4f49505 ("gianfar: Revive SKB
recycling"), the gianfar driver sooner or later stops transmitting any
packets on SMP machines.

start_xmit() prepares a new skb for transmission. Generally it does
three things:

1. sets up all BDs (marks them ready to send), except the first one.
2. stores the skb into tx_queue->tx_skbuff so that clean_tx_ring()
   can clean it up later.
3. sets up the first BD, i.e. marks it ready.

Here is what clean_tx_ring() does:

1. reads skbs from tx_queue->tx_skbuff
2. checks if the *last* BD is ready. If it's still ready [to send]
   then it hasn't been transmitted yet, so clean_tx_ring() returns.
   Otherwise it actually cleans up the BDs. All is OK.
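
Why "all is OK" for a multi-BD frame can be checked with a small userspace
model. This is only a sketch: struct sim_frame, SIM_BD_READY, sim_clean() and
sim_multi_bd_premature() are hypothetical names, not driver code, and the
cleaner is simply polled after each start_xmit() step instead of running on
another CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BD_READY 0x80000000u /* hypothetical "ready to send" bit */
#define SIM_MAX_BDS  4

struct sim_frame {
	uint32_t lstatus[SIM_MAX_BDS]; /* one status word per BD */
	size_t nr_bds;                 /* number of BDs used by this frame */
	const void *skb;               /* stands in for the tx_skbuff slot */
};

/* clean_tx_ring() analogue: reclaim only if the last BD is not ready. */
static bool sim_clean(struct sim_frame *f)
{
	if (f->skb == NULL)
		return false;          /* nothing queued in this slot */
	if (f->lstatus[f->nr_bds - 1] & SIM_BD_READY)
		return false;          /* last BD not transmitted yet */
	f->skb = NULL;                 /* treat as sent: reclaim the slot */
	return true;
}

/*
 * Replays the three start_xmit() steps for a frame of 2..SIM_MAX_BDS BDs,
 * polling the cleaner after each step. Returns the number of premature
 * cleanups, or -1 if nr_bds is out of range.
 */
static int sim_multi_bd_premature(size_t nr_bds)
{
	static const int packet = 0;   /* dummy object standing in for an skb */
	struct sim_frame f = { {0}, nr_bds, NULL };
	int premature = 0;
	size_t i;

	if (nr_bds < 2 || nr_bds > SIM_MAX_BDS)
		return -1;

	for (i = 1; i < nr_bds; i++)   /* step 1: all BDs except the first */
		f.lstatus[i] = SIM_BD_READY;
	premature += sim_clean(&f);

	f.skb = &packet;               /* step 2: store the skb */
	premature += sim_clean(&f);

	f.lstatus[0] = SIM_BD_READY;   /* step 3: first BD */
	premature += sim_clean(&f);

	return premature;
}
```

In this model the last BD is already marked ready by the time the skb becomes
visible, so sim_multi_bd_premature() reports no premature cleanup for any
frame that uses two or more BDs.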

Now, if there is just one BD, the code flow is:

- start_xmit(): stores skb into tx_skbuff. Note that the first BD
  (which is also the last one) isn't marked as ready, yet.
- clean_tx_ring(): sees that skb is not null, *and* its lstatus
  says that it is NOT ready (as if the BD had been sent), so it
  cleans it up (bad!)
- start_xmit(): marks BD as ready [to send], but it's too late.
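
The interleaving above can be replayed deterministically in userspace. The
following is a minimal sketch under stated assumptions: struct sim_ring,
SIM_BD_READY and both functions are made-up names, and the second CPU is
modelled by calling the cleaner between the two start_xmit() writes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BD_READY 0x80000000u /* hypothetical "ready to send" bit */

struct sim_ring {
	uint32_t lstatus;  /* status word of the single BD */
	const void *skb;   /* stands in for the tx_skbuff slot */
};

/* clean_tx_ring() analogue: returns true if it reclaimed the slot. */
static bool sim_clean(struct sim_ring *r)
{
	if (r->skb == NULL)
		return false;      /* nothing queued */
	if (r->lstatus & SIM_BD_READY)
		return false;      /* BD still waiting to be sent */
	r->skb = NULL;             /* looks transmitted: reclaim */
	return true;
}

/*
 * Replays the buggy order for a one-BD frame. Returns true if the skb was
 * reclaimed before the BD was ever marked ready, leaving a ready BD whose
 * tx_skbuff slot is already empty.
 */
static bool sim_buggy_interleaving(void)
{
	static const int packet = 0; /* dummy object standing in for an skb */
	struct sim_ring r = { 0, NULL };
	bool reclaimed;

	r.skb = &packet;             /* start_xmit(): store skb first */
	reclaimed = sim_clean(&r);   /* clean_tx_ring() on the other CPU */
	r.lstatus |= SIM_BD_READY;   /* start_xmit(): mark ready, too late */

	return reclaimed && r.skb == NULL && (r.lstatus & SIM_BD_READY);
}
```

sim_buggy_interleaving() returns true in this model: the slot has been
reclaimed while the BD is only now being handed to the hardware.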

We can fix this simply by reordering the lstatus/tx_skbuff writes, so
that lstatus is written first and tx_skbuff only after a barrier.
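
The reordered writes can be sketched in portable C11 as follows. This is a
userspace analogue, not the driver code: a release store stands in for the
eieio() plus tx_skbuff write, the acquire load in the cleaner is an
assumption of this model, and all sim_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BD_READY 0x80000000u /* hypothetical "ready to send" bit */

struct sim_slot {
	_Atomic uint32_t lstatus;  /* status word of the single BD */
	_Atomic(const void *) skb; /* stands in for the tx_skbuff slot */
};

/* clean_tx_ring() analogue: returns true if it reclaimed the slot. */
static bool sim_clean(struct sim_slot *s)
{
	/* Acquire pairs with the release store that publishes the skb. */
	const void *skb = atomic_load_explicit(&s->skb, memory_order_acquire);

	if (skb == NULL)
		return false;
	if (atomic_load_explicit(&s->lstatus, memory_order_relaxed) &
	    SIM_BD_READY)
		return false;
	atomic_store_explicit(&s->skb, NULL, memory_order_relaxed);
	return true;
}

/* Fixed order, cleaner polled at every point; counts premature cleanups. */
static int sim_fixed_premature_cleanups(void)
{
	static const int packet = 0; /* dummy object standing in for an skb */
	struct sim_slot s;
	int premature = 0;

	atomic_init(&s.lstatus, 0);
	atomic_init(&s.skb, NULL);

	premature += sim_clean(&s);  /* before any write */
	atomic_store_explicit(&s.lstatus, SIM_BD_READY, memory_order_relaxed);
	premature += sim_clean(&s);  /* between the two writes */
	/* Release: the lstatus write is visible before the skb pointer. */
	atomic_store_explicit(&s.skb, &packet, memory_order_release);
	premature += sim_clean(&s);  /* after both writes, BD still ready */

	return premature;
}

/* After the hardware clears the ready bit, cleanup must still succeed. */
static bool sim_fixed_cleans_after_tx(void)
{
	static const int packet = 0;
	struct sim_slot s;

	atomic_init(&s.lstatus, SIM_BD_READY);
	atomic_init(&s.skb, &packet);
	atomic_store_explicit(&s.lstatus, 0, memory_order_relaxed); /* sent */
	return sim_clean(&s);
}
```

With this order the cleaner either sees no skb at all or sees an skb whose BD
is already marked ready, so the window for a premature cleanup is closed in
the model.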

Reported-by: Martyn Welch <martyn.welch@ge.com>
Bisected-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tested-by: Martyn Welch <martyn.welch@ge.com>
Cc: Sandeep Gopalpet <Sandeep.Kumar@freescale.com>
Cc: Stable <stable@vger.kernel.org> [2.6.33]
---
 drivers/net/gianfar.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)
David Miller - March 4, 2010, 8:41 a.m.
From: Anton Vorontsov <avorontsov@ru.mvista.com>
Date: Wed, 3 Mar 2010 21:18:58 +0300

Applied.
Kumar Gala - March 4, 2010, 4:34 p.m.
On Mar 4, 2010, at 2:41 AM, David Miller wrote:

> From: Anton Vorontsov <avorontsov@ru.mvista.com>
> Date: Wed, 3 Mar 2010 21:18:58 +0300
> 
> Applied.

Anton,

Once this makes it into Linus's tree, can you make sure it gets added to -stable?

- k
Esben Haabendal - June 11, 2010, 8:45 a.m.
This patch also makes gianfar run stably on mpc8313 with 2.6.33/RT_PREEMPT.
Without it, I see exactly the same problems as reported by Anton on SMP.

/Esben

Patch

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..cccb409 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2021,7 +2021,6 @@  static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* setup the TxBD length and buffer pointer for the first BD */
-	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
 	txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
 			skb_headlen(skb), DMA_TO_DEVICE);
 
@@ -2053,6 +2052,10 @@  static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	txbdp_start->lstatus = lstatus;
 
+	eieio(); /* force lstatus write before tx_skbuff */
+
+	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
+
 	/* Update the current skb pointer to the next entry we will use
 	 * (wrapping if necessary) */
 	tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &